Partial words on notmuch search?

Austin Clements amdragon at MIT.EDU
Tue Jan 17 11:47:15 PST 2012


Quoth Jani Nikula on Jan 17 at  7:43 pm:
> On Mon, 16 Jan 2012 21:34:31 -0500, Austin Clements <amdragon at MIT.EDU> wrote:
> > Quoth Andrei Popescu on Jan 16 at 10:21 pm:
> > > This is also interesting:
> > > $ notmuch count 'debian'
> > > 65888
> > > $ notmuch count 'dEbian'
> > > 65888
> > > $ notmuch count 'Debian'
> > > 65887
> > 
> > The first two will match stemmed versions of "debian" such as
> > "debian's" and "debianed".  However, starting a term with a capital
> > letter suppresses stemming (because it suggests that it's a name,
> > which you wouldn't want to modify), so your last query matches only
> > the term "debian".  This is probably documented somewhere, though I
> > don't know where.
> 
> Interesting. Is this done when adding the terms to the database, or when
> searching? I presume the latter. How much control does notmuch have over
> this?

This is getting a bit out of my depth, but I believe indexing is done
with both stemmed and unstemmed versions of all terms (if stemming is
enabled) so that search can use either.
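
As a rough illustration of that idea (a toy model, not notmuch's or
Xapian's actual code), the Python sketch below indexes every word twice:
once lowercased as-is and once in stemmed form. The `toy_stem` function is
a hypothetical stand-in for a real stemmer, and the "Z" prefix on stemmed
terms follows what I understand Xapian's convention to be; treat both as
assumptions.

```python
from collections import defaultdict

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("'s", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    """Map each term to the set of document ids that contain it.

    Every word is indexed twice: lowercased as-is, and as "Z" + its toy stem.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
            index["Z" + toy_stem(word)].add(doc_id)
    return index

docs = {
    1: "debian rocks",
    2: "debian's installer",
    3: "freshly debianed laptop",
}
index = build_index(docs)
unstemmed_hits = index["debian"]   # only documents with the exact word
stemmed_hits = index["Zdebian"]    # documents with any word whose toy stem is "debian"
```

With both forms stored, a search can pick the exact term or the stemmed
one at query time without reindexing.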

For indexing, Notmuch can choose which stemmer to use (or none at all).
Xapian provides stemmers for a variety of languages:
  http://xapian.org/docs/apidoc/html/classXapian_1_1Stem.html#6c46cedf2047b159a7e4c9d4468242b1

For query parsing, Notmuch can set both the stemmer and a "stemming
strategy" that controls when it stems or doesn't stem terms:
  http://xapian.org/docs/apidoc/html/classXapian_1_1QueryParser.html#c7dc3b55b6083bd3ff98fc8b2726c8fd
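
To make the "stemming strategy" idea concrete, here is a small Python
sketch of how a query parser might decide whether to stem a term. The
strategy names "none", "some" and "all" are loosely modeled on Xapian's
STEM_NONE, STEM_SOME and STEM_ALL; `toy_stem` and `query_term` are
hypothetical helpers invented for this example, and the real rules cover
more cases than the capital-letter check shown here.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("'s", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def query_term(word, strategy="some"):
    """Return the term a toy query parser would look up for `word`.

    "none" never stems, "all" always stems, and "some" stems unless the
    word starts with a capital letter (treated as a hint that it is a name).
    """
    if strategy not in ("none", "some", "all"):
        raise ValueError("unknown strategy: " + strategy)
    lowered = word.lower()
    if strategy == "none":
        return lowered
    if strategy == "some" and word[:1].isupper():
        return lowered
    return toy_stem(lowered)

lower_case = query_term("debian's")           # stemmed
mixed_case = query_term("dEbian's")           # stemmed: first letter is lowercase
capitalized = query_term("Debian's")          # left unstemmed
never = query_term("debian's", strategy="none")
```

Under this model 'debian' and 'dEbian' go through the stemmer while
'Debian' does not, which is consistent with the differing counts quoted
above.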

