`notmuch-escape-boolean-term': Broken for non-ascii characters

Moritz Ulrich moritz at tarn-vedra.de
Tue Aug 12 14:47:42 PDT 2014


"Austin T. Clements" <aclements at csail.mit.edu> writes:

> Quoting Moritz Ulrich <moritz at tarn-vedra.de>:
>> Hello,
>>
>> I recently adopted notmuch as my primary way to read mail, so thank you
>> for this great tool!
>>
>> Unfortunately, I ran into a problem on the Emacs side of the project
>> when using it in a non-ASCII environment:
>>
>> Having a tag named 'uni-köln', the tag:-completion doesn't work.
>>
>> This is caused by `notmuch-escape-boolean-term' erroneously escaping the
>> above string:
>>
>> (notmuch-escape-boolean-term "uni-köln") => "\"uni-köln\""
>>
>> The cause is that `string-match' with the following regexp erroneously
>> matches my tag:
>>
>> (string-match "[^!#-'*-~]" "uni-köln") => 5
>> (string-match "[^!#-'*-~]" "uni-koln") => nil
>>
>> I'm not exactly sure how to tackle this - the regexp was crafted to match
>> (, ), " if I understand it correctly. A simple way would be just adding
>> more characters as a sort-of whitelist. A nicer solution would be
>> converting it from [^...] to [...] to explicitly mark characters that
>> need to be escaped.
>
> notmuch-escape-boolean-term used to use a blacklist, but we switched
> to a whitelist because Xapian's own parser has changed over the years
> in its handling of non-ASCII characters and invalidated our blacklist.
> Ultimately it seemed much safer to go with a whitelist.  Quoting
> "uni-köln" isn't erroneous, it's just conservative.
>
> Could you explain in more detail what's broken?  I tried adding the
> tag uni-köln to a message in Emacs, then hitting "s" to start a search
> then "tag:<TAB>" and that tag (surrounded by quotes) was one of the
> completion options.  Upon completing to that tag, the search worked
> fine.
>
> Are you objecting to the unnecessary (but legal) quotes in the
> completion?  We might be able to include Unicode word characters in
> the quoting whitelist, though that seems like a spot fix (probably a
> fairly broad one, so maybe that's fine) and might be tricky because of
> Emacs' somewhat weird Unicode regexp support (using [[:alpha:]] might
> Just Work, but we'd have to be careful of the active syntax table).
> Or tab completion could recognize that, say, tag:uni doesn't require
> quoting, but still expand it to tag:"uni-köln".

Thanks for explaining the reason for the whitelist-approach. Knowing
this is quite helpful.

I can't really explain why, but I just didn't notice tag:"uni-köln" in
the tag-completion - I think my expectations for finding it as
tag:uni-köln must have blinded me.

While it isn't erroneous, it's highly unintuitive to quote tags like
this. I can understand that a much more permissive whitelist could cause
other problems which are harder to track down, so maybe it's possible to
make the behavior configurable (e.g. by using a `defvar' for the regex).
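
To make the `defvar' idea concrete, here is a minimal sketch. The names
`my-notmuch-term-needs-quoting-regexp' and `my-notmuch-escape-boolean-term'
are hypothetical and do not exist in notmuch; the default regexp is the one
from the `string-match' calls quoted above, and doubling embedded quotes
inside the quoted term is my assumption about how the escaping should work.

```elisp
;; Hypothetical sketch; these names are not part of notmuch.
(defvar my-notmuch-term-needs-quoting-regexp "[^!#-'*-~]"
  "Regexp matching any character that forces a boolean term to be quoted.
The default only lets through printable ASCII other than \", ( and ).")

(defun my-notmuch-escape-boolean-term (term)
  "Return TERM, wrapped in double quotes if it needs quoting.
TERM needs quoting when it is empty or contains a character matched
by `my-notmuch-term-needs-quoting-regexp'."
  (save-match-data
    (if (or (equal term "")
            (string-match my-notmuch-term-needs-quoting-regexp term))
        ;; Assumption: embedded double quotes are escaped by doubling.
        (concat "\"" (replace-regexp-in-string "\"" "\"\"" term t t) "\"")
      term)))

;; Locally widen the whitelist with [:alpha:].  Whether this is safe
;; depends on Xapian's parser and on the active syntax table.
(let ((my-notmuch-term-needs-quoting-regexp "[^!#-'*-~[:alpha:]]"))
  (my-notmuch-escape-boolean-term "uni-köln"))
```

With the default value this keeps the conservative behavior described
above; a user who trusts their Xapian version with non-ASCII letters
could set the variable to something more permissive, at their own risk.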

-- 
Moritz Ulrich