[PATCH] notmuch: Add "maildir:" search option

Peter Zijlstra peterz at infradead.org
Tue Nov 12 11:47:27 PST 2013


On Tue, Nov 12, 2013 at 02:31:25PM -0500, Austin Clements wrote:
> > XXX: now I need to go figure out how to do searches like:
> >
> >   subject:PATCH/0
> >
> > which would mandate that PATCH is the first word occurring in the
> > subject. I think the position index holds enough information but I
> > need to look into that and obviously the query parser needs work for
> > this.
> 
> This is a frequently requested feature.  Unfortunately, there are two
> technical hurdles.  One is that the position information actually
> *isn't* enough as it is both because the subject will start at some
> arbitrary term offset that depends on the from and to (and maybe other
> things) and because the Xapian Query structure doesn't expose a way to
> search for a specific term offset (only relative offsets).  The other is
> that we currently use Xapian's query parser, which isn't extensible,
> though I took a stab at a custom query parser long ago and swear that
> one of these days I'll return to it.

Bah, I knew that would end up being more complex :-)

> >  lib/database.cc |  7 ++++---
> >  lib/message.cc  | 40 ++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 44 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/database.cc b/lib/database.cc
> > index a021bf17253c..53aeaa68954d 100644
> > --- a/lib/database.cc
> > +++ b/lib/database.cc
> > @@ -208,15 +208,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
> >      { "thread",			"G" },
> >      { "tag",			"K" },
> >      { "is",			"K" },
> > -    { "id",			"Q" }
> > +    { "id",			"Q" },
> > +    { "maildir",		"XMAILDIR:" },
> 
> No colon.  That is, the term prefix should just be "XMAILDIR".

No, that colon is crucial; see http://xapian.org/docs/omega/termprefixes.html

"X starts a multi-capital letter user-defined prefix. If you want a
prefix for something without a standard prefix, you create your own
starting with an X (e.g. XSHOESIZE). The prefix ends with the first
non-capital. If the term you're prefixing starts with a capital, add a
":" between prefix and term to resolve ambiguity about where the prefix
ends and the term begins."

Since maildir folder names typically start with a capital we need that
':' in between the prefix and value. I tried, my initial versions didn't
have the ':' and reliably didn't work.

> >  };
> >  
> >  static prefix_t PROBABILISTIC_PREFIX[]= {
> >      { "from",			"XFROM" },
> >      { "to",			"XTO" },
> >      { "attachment",		"XATTACHMENT" },
> > -    { "subject",		"XSUBJECT"},
> > -    { "folder",			"XFOLDER"}
> > +    { "subject",		"XSUBJECT" },
> > +    { "folder",			"XFOLDER" },
> 
> Unintentional whitespace change?

Oops yes.

> > +    /* Convert the directory into a maildir path */
> > +    maildir = talloc_strdup(local, directory);
> 
> Add a space before the "(".  (Same thing throughout the rest of the
> patch.)

Urgh, weird coding style but yes you're right, I should've kept it
consistent with the rest of the file. Will fix.

> > +
> > +    /* Strip the maildir "cur", "new" directory entries. */
> > +    i = strlen(maildir);
> > +    if (strncmp(maildir + i - 3, "cur", 3) == 0 ||
> > +	strncmp(maildir + i - 3, "new", 3) == 0) {
> 
> This is unsafe if directory is less than three characters, which I
> believe could happen if the message is in the root mail directory (which
> shouldn't happen with a well-formed maildir, but notmuch doesn't require
> maildir, and, regardless, we should be defensive).
> 
> Also, we have a STRNCMP_LITERAL macro that we often use for comparisons
> with string literals, but I'm good with this, too.

Quite so, I haven't actually seen that, but you're quite right.

> > +	    maildir[i - 3] = '\0';
> > +	    i -= 3;
> > +    }
> > +
> > +    /* Strip trailing '/' */
> > +    while (maildir[i-1] == '/') {
> 
> This is also unsafe if maildir is the empty string.
> 
> Also, add spaces around the "-" (likewise on the next line).

Will fix.

> > +	    maildir[i-1] = '\0';
> > +	    i--;
> > +    }
> > +
> > +    /* Strip leading '/' */
> > +    while (maildir[0] == '/')
> > +	    maildir++;
> > +
> > +    /* Strip leading '.' */
> > +    while (maildir[0] == '.')
> 
> Why strip multiple dots?  (I'm not saying you shouldn't, I'm just
> curious; and it may be worth explaining in the comment.)

No reason, copy paste damage from above mostly.

> > +	    maildir++;
> > +
> > +    /* Replace all remaining '.' with '/' */
> 
> I think this should only happen if there was a leading '.', indicating a
> Maildir++ folder hierarchy.  A lot of people use the "file system"
> Maildir layout, which just consists of nested directories where the
> leaves are Maildirs (e.g., Dovecot's LAYOUT=fs,
> http://wiki2.dovecot.org/MailboxFormat/Maildir#Directory_Structure).  In
> this case, it's perfectly legitimate to have '.'s in folder names and it
> would be confusing and unexpected to translate them to '/'s.  This would
> also make this compatible with MH folders (which notmuch supports,
> though admittedly it's unclear if anyone actually uses them).

Ah, I wasn't aware Dovecot actually supported this Maildir variant --
although I've recently come across this variant someplace and thought it
was odd.

OK, I can make it conditional on the initial leading dot.

I'll respin the patch and try to come up with manpage and test additions
to cover this new functionality.


More information about the notmuch mailing list