how to search for hyphenated words? (was: how to search for Morse code?)

Matt Armstrong marmstrong at google.com
Wed Mar 13 11:23:34 PDT 2019


David Bremner <david at tethera.net> writes:

> Matt Armstrong <marmstrong at google.com> writes:
>
>> Carl Worth <cworth at cworth.org> writes:
>>
>>> Hi Gregor,
>>>
>>> The trick here is that when notmuch is indexing body text it feeds it
>>> into a Xapian function that parses the text by finding "terms" in the
>>> text. And this parser considers both punctuation and whitespace as
>>> separators between terms.
>>
>> I notice that Xapian supports something called "phrase searches",
>> documented as:
>>
>>   "A phrase surrounded with double quotes ("") matches documents
>>   containing that exact phrase. Hyphenated words are also treated as
>>   phrases, as are cases such as filenames and email addresses
>>   (e.g. /etc/passwd or president@whitehouse.gov)."
>>
>> I assume that this particular Xapian feature is unavailable in notmuch?
>> If so, I wonder if enabling has ever been considered?
>
> It is enabled, and documented in notmuch-search-terms(7). Unfortunately
> I don't think it's related to the original request. The mention of
> hyphenated words is about the input to the query parser, not
> (necessarily) the retrieved text.

Ah, so it boils down to the Xapian definition of "exact phrase."
Notably, "exact phrase" is not "identical sequence of characters" as
some people might expect.

Quick tests with various search engines show that their phrase searches
operate the same way.  E.g. searching for "org notmuch" finds all sorts
of results:

  org-notmuch.el
  notmuchmail.org/notmuch-emacs/
  to:devicetree@vger.kernel.org notmuch tag +inbox +unread -new
  (require 'org-notmuch nil t)
  https://notmuchmail.org/notmuch-emacs/. *
  imaps://mail.example.org/Notmuch/search
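
To illustrate why strings like these all count as a match, here is a
minimal Python sketch of term-based phrase matching.  It is not Xapian's
actual term generator: the helper names `terms` and `phrase_matches` are
made up for this example, and it simply assumes that any run of
non-alphanumeric characters separates terms and that case is ignored.

```python
import re


def terms(text):
    """Split text into lowercase terms (assumed rule: any run of
    non-alphanumeric characters is a separator)."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def phrase_matches(phrase, text):
    """Return True if the phrase's terms occur consecutively, in order,
    somewhere in the text's term list."""
    want = terms(phrase)
    have = terms(text)
    n = len(want)
    if n == 0:
        return False
    return any(have[i:i + n] == want for i in range(len(have) - n + 1))


if __name__ == "__main__":
    samples = [
        "org-notmuch.el",
        "notmuchmail.org/notmuch-emacs/",
        "imaps://mail.example.org/Notmuch/search",
    ]
    for sample in samples:
        # Each sample contains the terms "org" and "notmuch" back to back.
        print(sample, phrase_matches("org notmuch", sample))
```

Under those assumptions the punctuation between "org" and "notmuch" is
irrelevant; only the order and adjacency of the terms matter, so the
reversed phrase "notmuch org" would not match "org-notmuch.el".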

For what it is worth, one thing I've taken to doing is using period
separators in the notmuch phrase searches I use in scripts and even
interactively.  A period-separated phrase avoids the confusing issues of
quoting double-quoted things in the shell, and it always remains a
single shell "word."  The periods are also, most often, clearly not the
exact content I'm searching for, so they make it clear that the match
algorithm is inexact.  E.g.

  subject:notmuch.is.wonderful

instead of:

  subject:"notmuch is wonderful"
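
For scripts, the conversion can be sketched in a few lines of Python.
The helper name `dotted_phrase` is hypothetical (it is not part of
notmuch or its bindings), and the sketch assumes plain ASCII words; it
only builds the query string and checks with the standard `shlex` module
that the result is a single shell word needing no quoting.

```python
import re
import shlex


def dotted_phrase(phrase, prefix=None):
    """Join the words of a phrase with periods, optionally adding a
    search prefix such as "subject" (hypothetical helper)."""
    words = re.findall(r"\w+", phrase)
    term = ".".join(words)
    return f"{prefix}:{term}" if prefix else term


if __name__ == "__main__":
    query = dotted_phrase("notmuch is wonderful", prefix="subject")
    print(query)
    # The dotted form survives shell word splitting as one word, and
    # shlex.quote() sees nothing in it that needs quoting.
    print(shlex.split(query) == [query])
    print(shlex.quote(query) == query)
```

The resulting string can then be passed as a single argument to
`notmuch search` without any nested quoting.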

