how to search for hyphenated words? (was: how to search for Morse code?)

Carl Worth cworth at cworth.org
Fri Mar 8 16:03:02 PST 2019


Hi Gregor,

The trick here is that when notmuch indexes body text, it feeds the
text into a Xapian function that splits it into "terms". That parser
treats both punctuation and whitespace as separators between terms.

So your messages are not indexed in a way that lets you distinguish
between "org notmuch" and "org-notmuch".

(Of note, the query parser applies the same parsing to your query, so
even when you think you're typing an exact phrase like "org-notmuch",
it gets parsed into the separate terms "org" and "notmuch" for
searching.)
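
As a rough illustration of the effect described above, here is a minimal
Python sketch. The split_terms function is a hypothetical stand-in, not
Xapian's actual parser: it only assumes that any run of non-word
characters (punctuation or whitespace) separates terms.

```python
import re

def split_terms(text):
    """Simplified model of term splitting: lowercase runs of word characters.

    This is NOT Xapian's real term generator, only an approximation of the
    "punctuation and whitespace are separators" behavior.
    """
    return [t.lower() for t in re.findall(r"\w+", text)]

# Indexing side: the hyphen disappears, leaving two adjacent terms.
indexed = split_terms("see the org-notmuch package")

# Query side: both spellings reduce to the same list of terms.
query_hyphen = split_terms("org-notmuch")
query_space = split_terms("org notmuch")

print(indexed)
print(query_hyphen == query_space)
```

Under this model the two queries are indistinguishable, which is why a
search for the hyphenated form matches far more than expected.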

> all these resulted in very many hits most or all of which do not
> contain the string "org-notmuch", one found email was e.g.
>
> id:20180904105723.15564-3-david at tethera.net

That message does contain the following:

   +test_emacs '(notmuch-tree "id:000-real-root at example.org")
   +           (notmuch-test-wait)

Notice that there is a term "org" followed (after some punctuation and
whitespace separators) by a term "notmuch".
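
To make that concrete, the following Python sketch runs the two quoted
lines through a simplified splitter and checks for the adjacent pair.
Both split_terms and has_adjacent are hypothetical helpers written for
this illustration; the real Xapian parser has more rules than this.

```python
import re

def split_terms(text):
    """Simplified model: lowercase runs of word characters are terms."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def has_adjacent(terms, first, second):
    """True if `second` immediately follows `first` in the term list."""
    return any(a == first and b == second for a, b in zip(terms, terms[1:]))

# The two lines quoted from the message, joined as they appear in the body.
body = "\n".join([
    "+test_emacs '(notmuch-tree \"id:000-real-root at example.org\")",
    "+           (notmuch-test-wait)",
])

terms = split_terms(body)
print(terms)
print(has_adjacent(terms, "org", "notmuch"))
```

The closing quote, parenthesis, newline, plus sign, spaces, and opening
parenthesis between "org" and "notmuch" are all discarded as separators,
so the two terms end up next to each other.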

> How would one search for hyphenated words with notmuch?

You would need to arrange for the indexer to treat the hyphen as a
letter-like character that becomes part of a term. Or be extra clever
and index something like "notmuch-test-wait" in multiple ways (as a
single term "notmuch-test-wait" as well as the three adjacent terms
"notmuch", "test", and "wait", which is what notmuch does currently).
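
The "extra clever" variant could look something like the Python sketch
below. It is only an illustration of the idea: index_terms is a
hypothetical function, not anything notmuch or Xapian provides, and it
ignores term positions, which a real indexer would also need to record.

```python
import re

def index_terms(text):
    """Hypothetical indexer: emit each hyphenated compound whole, then its parts."""
    terms = []
    # Match runs of word characters, optionally joined by single hyphens.
    for chunk in re.findall(r"\w+(?:-\w+)*", text):
        chunk = chunk.lower()
        parts = chunk.split("-")
        if len(parts) > 1:
            terms.append(chunk)   # the compound as one term
        terms.extend(parts)       # the individual terms, as indexed today
    return terms

print(index_terms("(notmuch-test-wait)"))
# prints ['notmuch-test-wait', 'notmuch', 'test', 'wait']
```

With terms like these, a query for the whole compound would only match
text where the hyphens were actually present, while queries for the
individual words would keep working as they do now.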

-Carl