[Patch v4] lib: regexp matching in 'subject' and 'from'

Jani Nikula jani at nikula.org
Sun Jan 29 03:23:36 PST 2017

On Wed, 25 Jan 2017, David Bremner <david at tethera.net> wrote:
> Tomi Ollila <tomi.ollila at iki.fi> writes:
>> Why would message_id not be useful to regex match? I can come up with
>> quite a few use cases... but if there are technical difficulties... then
>> that should be mentioned instead.
> I'll have a look. Since the first version of this patch (when that
> message was written), people have actually asked for some kind of
> wildcard matching of message-ids.

Theoretically "/" is an acceptable character in message-ids [1]. Rare,
unlikely, but acceptable. Searching for message-ids beginning with "/"
would have to use regexps, which would break in all sorts of ways
throughout the stack. I don't think there are handy alternatives to
"/<regex>/", given the characters that are acceptable in message-ids,
but this is something to think about.

For example, could the regexp matcher for message-ids first check if the
"regexp" is a strict match with "/" and all, and accept those? This
might be a reasonable workaround if it can be made to work.
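
A rough sketch of that fallback, just to illustrate the idea. It assumes
the message-ids are available as a plain list of strings, which is not how
the real lookup works, and match_message_ids is a hypothetical name, not
anything in notmuch:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not part of notmuch: returns the ids selected by term.
std::vector<std::string>
match_message_ids (const std::vector<std::string> &ids, const std::string &term)
{
    std::vector<std::string> hits;

    // Step 1: literal comparison, slashes and all. A message-id that really
    // is "/foo/" wins over the regex interpretation.
    for (const auto &id : ids)
        if (id == term)
            hits.push_back (id);
    if (! hits.empty ())
        return hits;

    // Step 2: only if nothing matched literally, treat /.../ as a regex.
    if (term.size () >= 2 && term.front () == '/' && term.back () == '/') {
        std::regex re (term.substr (1, term.size () - 2));
        for (const auto &id : ids)
            if (std::regex_search (id, re))
                hits.push_back (id);
    }
    return hits;
}
```

The obvious downside is that the meaning of a query then depends on what
happens to be in the database, so this may or may not be acceptable.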

[1] https://tools.ietf.org/html/rfc2822#section-3.2.4

>> Maybe this commit message should mention that xapian with field processors
>> (1.4.x) is required for this feature -- and emphasize it a bit better in
>> the manual page?
>> Probably '//' is used to escape '/' -- should such a character ever be
>> needed in a regex search.
> Currently no escaping is needed because it only looks at the first and
> last characters of the string (the usual xapian/shell rules mean that "" might
> be needed).
> The following seem to work as hoped
> # match a / with a space before it
> % notmuch search 'subject:"/ //"'
> # just a slash
> % notmuch search subject:///
> # anchored slash
> % notmuch search subject:/^//
> The trailing slash is actually decorative, we could drop it. Actually
> *blush* I just noticed the current code is missing something from this line
>          if (str.at (0) == '/' && str.at (str.size () - 1)){
> _if_ that line is fixed, then it will have the slightly odd behaviour of
> subject:/blah
> doing a non-regex search
> We could also throw an error for that case, maybe that's the best option.

I'd go with an error. It's easy to loosen the rules later on if we
decide that's a good idea. Much harder to accept loose rules now, let
users get used to it, and try to tighten the rules if we realize we'd
need that for some reason.
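
For illustration only, the strict variant could look something like the
sketch below. parse_regex_term is a hypothetical helper and the exception
type is just an assumption; the actual patch would report the error
however the field processor normally does:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true and sets pattern for /.../ terms,
// returns false for plain terms, and throws when a term starts with '/'
// but has no closing '/'.
bool
parse_regex_term (const std::string &str, std::string &pattern)
{
    if (str.empty () || str.front () != '/')
        return false;           // ordinary, non-regex search term

    if (str.size () < 2 || str.back () != '/')
        throw std::invalid_argument ("unterminated regex: " + str);

    // Only the first and last characters are delimiters, so slashes
    // inside the pattern need no escaping.
    pattern = str.substr (1, str.size () - 2);
    return true;
}
```

With this, subject:/^// still yields the pattern "^/", while subject:/blah
is rejected instead of silently turning into a non-regex search.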

