Folder search semantics (was Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more)

Austin Clements amdragon at MIT.EDU
Wed Feb 2 22:14:29 PST 2011


Quoth Carl Worth on Feb 02 at  2:48 pm:
> Restricting my reply to one tiny bit of your mail:
> 
> You wrote:
> > non-recursive is the only thing that makes sense for Maildir++ folders
> 
> Either I'm not understanding Maildir++ folders, or I don't agree with
> you.
> 
> I might have an email archive that looks like this:
> 
>   Maildir
>     .work
>       .project1
>       .project2
>       .etc...
>     .family
>       .dad
>       .mom
>       .brother
>       .etc...
> 
> With the above setup, what would be unreasonable about wanting to search
> for all work-related messages (across all projects, say) with a string
> like "folder:work" ?
> 
> Now, a person might definitely want to search for messages in the
> ".work" folder directly, (not including the sub-folders), so we should
> provide support for users to get at that behavior as well, (such as a
> proposed "folder:work$" or so).
> 
> To me, both cases are perfectly legitimate, and I don't understand an
> argument that claims that only one makes sense. (Or again, I may be
> misunderstanding something.)

(Somebody with more first-hand Maildir++ experience should jump in here.
I stopped using Maildir++ a long time ago, so I may have no idea what
I'm talking about.)

Both cases are perfectly legitimate.

However, the issue with Maildir++ is that the inbox is stored in the
top-level directory:

  Maildir
    cur
    new
    tmp
    .work
    .work.project1

As a consequence, all folders are subfolders of the inbox.  With
recursive search, a search for your inbox folder returns *all* of your
messages.  I wasn't trying to say that we shouldn't support recursive
search (I'm all for flexibility), but it's a confusing default for
Maildir++ because of this.

Maildir++ has the added twist that the inbox folder has no name.  As a
result, currently notmuch can't search for a Maildir++ inbox folder,
which needs to be addressed somehow.  The least surprising approach
would be compatibility with the Maildir++ convention of calling the
top-level folder INBOX, the subfolder INBOX.work, etc.


Maildir++ issues aside, I submit that rooted, non-recursive folder
searches are a more natural default, and one that extends to non-rooted
and recursive searches with more conventional syntax.  In
id:87aaiy3u65.fsf at yoom.home.cworth.org, you mentioned that you
implemented non-rooted folder search to mimic subject search.  But file
system paths are not natural language like subject lines.  File system
paths are hierarchical and rooted.

Of course, special query operators like ^ and $ can mitigate this, but
these queries *aren't* regexps and, furthermore, people don't usually
apply regexps to file names.  They apply globs.  Glob syntax has the
added benefit of congruity with Xapian wildcard syntax.  This naturally
leads to a rooted, non-recursive syntax by default (like globs), where a
* at the end means recursive and a * at the beginning means non-rooted.
In fact, we could easily generalize this to arbitrary shell globs.
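
As a rough illustration of those four combinations, here is a minimal
sketch using Python's fnmatch module as a stand-in for whatever matcher
notmuch would actually use.  The folder names and the match_folders
helper are made up for the example; note that fnmatch lets * match
across "/", which is what makes a trailing * behave recursively here.

```python
from fnmatch import fnmatchcase

# Hypothetical folder names, relative to the mail root.
FOLDERS = ["work", "work/project1", "work/project2", "family", "archive/work"]


def match_folders(pattern, folders=FOLDERS):
    """Return the folders whose *full* name matches the glob pattern.

    Matching the whole name is what makes a bare pattern rooted and
    non-recursive; wildcards have to be asked for explicitly.
    """
    return [f for f in folders if fnmatchcase(f, pattern)]


print(match_folders("work"))    # rooted, non-recursive
print(match_folders("work*"))   # rooted, recursive
print(match_folders("*work"))   # non-rooted, non-recursive
print(match_folders("*work*"))  # non-rooted, recursive
```

One wrinkle this sketch does not address: "work*" would also match a
sibling folder such as "workshop", so a stricter recursive form might
want "work" plus "work/*" instead.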


Here's a proposal that, I think, addresses Maildir++ inboxes and
subfolders; rooted, non-rooted, recursive, and non-recursive queries;
and then some.  Plus, it wouldn't require many code changes; you've
already done the hard work.

Switch XFOLDER from a probabilistic prefix with word-splitting to a
boolean prefix without word-splitting.  When indexing, strip off the cur
or new and examine the resulting directory name.  If it's the mail root,
this is a Maildir++ inbox, so add the term XFOLDERINBOX.  If it starts
with a dot, it's a Maildir++ subfolder, so add the term
XFOLDERINBOX<.dirname>.  Otherwise, add the term XFOLDER<dirname>.
Then, using a custom query transform for the "folder:" prefix, enumerate
XFOLDER terms and form a synonym query out of those that fnmatch the
user's folder query.
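
To make the steps above concrete, here is a minimal sketch in Python
rather than in notmuch's own C/C++.  The function names, the example
paths, and the use of a plain list in place of a Xapian synonym query
are all assumptions made for illustration; only the XFOLDER and
XFOLDERINBOX term shapes come from the proposal.

```python
import posixpath
from fnmatch import fnmatchcase

PREFIX = "XFOLDER"


def folder_term(mail_root, message_path):
    """Map a message file path to the folder term described above.

    Assumes message_path lies somewhere under mail_root.
    """
    # Drop the file name, then a trailing cur/new component if present.
    directory = posixpath.dirname(message_path)
    if posixpath.basename(directory) in ("cur", "new"):
        directory = posixpath.dirname(directory)
    rel = posixpath.relpath(directory, mail_root)
    if rel == ".":
        return PREFIX + "INBOX"        # mail root: Maildir++ inbox
    if rel.startswith("."):
        return PREFIX + "INBOX" + rel  # Maildir++ subfolder, e.g. INBOX.work
    return PREFIX + rel                # ordinary directory


def expand_folder_query(pattern, all_terms):
    """Return the indexed folder terms whose name fnmatches the pattern.

    A real implementation would enumerate the terms from the database
    and OR them into a synonym query; a sorted list stands in for that.
    """
    return sorted(
        t for t in all_terms
        if t.startswith(PREFIX) and fnmatchcase(t[len(PREFIX):], pattern)
    )


root = "/home/user/Maildir"  # hypothetical mail root
terms = {
    folder_term(root, root + "/cur/msg1"),
    folder_term(root, root + "/.work/new/msg2"),
    folder_term(root, root + "/.work.project1/cur/msg3"),
    folder_term(root, root + "/lists/notmuch/cur/msg4"),
}
print(expand_folder_query("INBOX", terms))
print(expand_folder_query("INBOX.work*", terms))
```

With these terms, "folder:INBOX" picks out only the inbox, while
"folder:INBOX.work*" also pulls in the project subfolder.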

