[PATCH] notmuch: Add "maildir:" search option

Austin Clements aclements at csail.mit.edu
Tue Nov 12 11:31:25 PST 2013


I think this is a great idea.  Personally I think this is how folder:
should work.  I find the semantics of folder: to be useless except where
they happen to coincide with the boolean semantics used here.
Unfortunately, changing folder: would require versioning the database,
which we have only primordial support for right now.

Various comments below, though nothing major.  Of course, we'd also need
some tests and man page updates for this.

On Tue, 12 Nov 2013, Peter Zijlstra <peterz at infradead.org> wrote:
> Subject: notmuch: Add "maildir:" search option
>
> The current "folder:" search terms are near useless when you have
> recursive folders, introduce a boolean "maildir:" search term to
> exactly match the maildir folder.
>
> Given a Maildir++ layout like:
>
> 	~/Maildir/
> 	~/Maildir/cur
> 	~/Maildir/new
> 	~/Maildir/tmp
> 	~/Maildir/.Sent
> 	~/Maildir/.Sent/cur
> 	~/Maildir/.Sent/new
> 	~/Maildir/.Sent/tmp
> 	~/Maildir/.INBOX.LKML
> 	~/Maildir/.INBOX.LKML/cur
> 	~/Maildir/.INBOX.LKML/new
> 	~/Maildir/.INBOX.LKML/tmp
> 	~/Maildir/.INBOX.LKML.netdev
> 	~/Maildir/.INBOX.LKML.netdev/cur
> 	~/Maildir/.INBOX.LKML.netdev/new
> 	~/Maildir/.INBOX.LKML.netdev/tmp
> 	~/Maildir/.INBOX.LKML.arch
> 	~/Maildir/.INBOX.LKML.arch/cur
> 	~/Maildir/.INBOX.LKML.arch/new
> 	~/Maildir/.INBOX.LKML.arch/tmp
>
> This patch generates the following search index:
>
> 	$ delve -a Maildir/.notmuch/xapian/ | ~/s XMAILDIR
> 	XMAILDIR:INBOX
> 	XMAILDIR:INBOX/LKML
> 	XMAILDIR:INBOX/LKML/arch
> 	XMAILDIR:INBOX/LKML/netdev
> 	XMAILDIR:Sent
>
> Which allows one (me!!1) to pose queries like:
>
> 	maildir:INBOX and not tag:list
>
> to more easily find offlist mail (from people like my family who don't
> actually send their stuff over LKML :-).
>
> Signed-off-by: Peter Zijlstra <peterz at infradead.org>
> ---
>
> XXX: now I need to go figure out how to do searches like:
>
>   subject:PATCH/0
>
> which would mandate that PATCH is the first word occurring in the
> subject. I think the position index holds enough information but I
> need to look into that and obviously the query parser needs work for
> this.

This is a frequently requested feature.  Unfortunately, there are two
technical hurdles.  One is that the position information actually
*isn't* enough as it is both because the subject will start at some
arbitrary term offset that depends on the from and to (and maybe other
things) and because the Xapian Query structure doesn't expose a way to
search for a specific term offset (only relative offsets).  The other is
that we currently use Xapian's query parser, which isn't extensible,
though I took a stab at a custom query parser long ago and swear that
one of these days I'll return to it.

>
>
>  lib/database.cc |  7 ++++---
>  lib/message.cc  | 40 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 3 deletions(-)
>
> diff --git a/lib/database.cc b/lib/database.cc
> index a021bf17253c..53aeaa68954d 100644
> --- a/lib/database.cc
> +++ b/lib/database.cc
> @@ -208,15 +208,16 @@ static prefix_t BOOLEAN_PREFIX_EXTERNAL[] = {
>      { "thread",			"G" },
>      { "tag",			"K" },
>      { "is",			"K" },
> -    { "id",			"Q" }
> +    { "id",			"Q" },
> +    { "maildir",		"XMAILDIR:" },

No colon.  That is, the term prefix should just be "XMAILDIR".

>  };
>  
>  static prefix_t PROBABILISTIC_PREFIX[]= {
>      { "from",			"XFROM" },
>      { "to",			"XTO" },
>      { "attachment",		"XATTACHMENT" },
> -    { "subject",		"XSUBJECT"},
> -    { "folder",			"XFOLDER"}
> +    { "subject",		"XSUBJECT" },
> +    { "folder",			"XFOLDER" },

Unintentional whitespace change?

>  };
>  
>  const char *
> diff --git a/lib/message.cc b/lib/message.cc
> index 1b4637950f8e..45a727a6208f 100644
> --- a/lib/message.cc
> +++ b/lib/message.cc
> @@ -22,6 +22,7 @@
>  #include "database-private.h"
>  
>  #include <stdint.h>
> +#include <string.h>
>  
>  #include <gmime/gmime.h>
>  
> @@ -485,6 +486,8 @@ _notmuch_message_add_filename (notmuch_message_t *message,
>      notmuch_status_t status;
>      void *local = talloc_new (message);
>      char *direntry;
> +    char *maildir;
> +    int i;
>  
>      if (filename == NULL)
>  	INTERNAL_ERROR ("Message filename cannot be NULL.");
> @@ -507,6 +510,43 @@ _notmuch_message_add_filename (notmuch_message_t *message,
>      /* New terms allow user to search with folder: specification. */
>      _notmuch_message_gen_terms (message, "folder", directory);
>  
> +    /* Convert the directory into a maildir path */
> +    maildir = talloc_strdup(local, directory);

Add a space before the "(".  (Same thing throughout the rest of the
patch.)

> +
> +    /* Strip the maildir "cur", "new" directory entries. */
> +    i = strlen(maildir);
> +    if (strncmp(maildir + i - 3, "cur", 3) == 0 ||
> +	strncmp(maildir + i - 3, "new", 3) == 0) {

This is unsafe if directory is less than three characters, which I
believe could happen if the message is in the root mail directory (which
shouldn't happen with a well-formed maildir, but notmuch doesn't require
maildir, and, regardless, we should be defensive).

Also, we have a STRNCMP_LITERAL macro that we often use for comparisons
with string literals, but I'm good with this, too.

> +	    maildir[i - 3] = '\0';
> +	    i -= 3;
> +    }
> +
> +    /* Strip trailing '/' */
> +    while (maildir[i-1] == '/') {

This is also unsafe if maildir is the empty string.

Also, add spaces around the "-" (likewise on the next line).

> +	    maildir[i-1] = '\0';
> +	    i--;
> +    }
> +
> +    /* Strip leading '/' */
> +    while (maildir[0] == '/')
> +	    maildir++;
> +
> +    /* Strip leading '.' */
> +    while (maildir[0] == '.')

Why strip multiple dots?  (I'm not saying you shouldn't, I'm just
curious; and it may be worth explaining in the comment.)

> +	    maildir++;
> +
> +    /* Replace all remaining '.' with '/' */

I think this should only happen if there was a leading '.', indicating a
Maildir++ folder hierarchy.  A lot of people use the "file system"
Maildir layout, which just consists of nested directories where the
leaves are Maildirs (e.g., Dovecot's LAYOUT=fs,
http://wiki2.dovecot.org/MailboxFormat/Maildir#Directory_Structure).  In
this case, it's perfectly legitimate to have '.'s in folder names and it
would be confusing and unexpected to translate them to '/'s.  This would
also make this compatible with MH folders (which notmuch supports,
though admittedly it's unclear if anyone actually uses them).

> +    for (i = 0; maildir[i]; i++) {
> +	    if (maildir[i] == '.')
> +		    maildir[i] = '/';
> +    }
> +
> +    /* If there's no string left, we're the "INBOX" */
> +    if (maildir[0] == '\0')
> +	    maildir = talloc_strdup(local, "INBOX");
> +
> +    _notmuch_message_add_term (message, "maildir", maildir);
> +
>      talloc_free (local);
>  
>      return NOTMUCH_STATUS_SUCCESS;
> _______________________________________________
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch


More information about the notmuch mailing list