SpamAssassin or: why can't I search for »[«?

Carl Worth cworth at cworth.org
Fri Oct 29 16:18:31 PDT 2010


On Wed, 16 Jun 2010 13:55:10 +0200, Albin Stjerna <albin at eval.nu> wrote:
> I've been trying to get notmuch to apply the tag »spam« to these mails,
> but it seems I can neither make it search for upper-case letters nor
> »[«/»]«. My current solution is to tag everything with »spam« somewhere
> in the subject header as spam, which leads to lots of false positives.

Hi Albin,

I'm sorry that nobody answered this fairly simple question of yours
earlier.

What's happening here is that Xapian (the indexer used by notmuch)
splits text into words by distinguishing "word characters" from the
"non-word characters" that separate them[*]. The words are then indexed
(with numeric information indicating their position) and the separators
are thrown away. So there's no way to search for separators such as
»[«/»]«.
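
To make the idea concrete, here is a rough sketch in Python. It is not
Xapian's actual tokenizer; the function name index_terms, the use of
the \w character class, and the lowercasing step are all assumptions
chosen only to illustrate why separators and case cannot be searched
for once the text has been indexed.

```python
import re


def index_terms(text):
    """Return (position, term) pairs, discarding separator characters.

    Illustration only: a real indexer such as Xapian has its own rules
    for what counts as a word character.
    """
    # \w+ matches runs of word characters; everything else ("[", "]",
    # spaces, punctuation) acts as a separator and is dropped.
    words = re.findall(r"\w+", text)
    # Lowercasing (an assumption here) models a case-insensitive index.
    return [(pos, word.lower()) for pos, word in enumerate(words, start=1)]


terms = index_terms("[SPAM] Cheap watches")
```

In this sketch a subject of "[SPAM] Cheap watches" and a subject of
"spam cheap watches" produce exactly the same terms, which is why
searching for the brackets or the upper-case letters finds nothing
extra.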

As for case-sensitivity, Xapian does provide capabilities such that
notmuch could offer optional case-sensitive searching. But that might
require more storage space than notmuch is currently using. It would
also require us to add some syntax to the search terms so that a user
can request case-sensitive searches.

As a long-term fix for your problem, we plan to let you use notmuch to
search for a header such as "X-Spam-Flag: YES". This isn't currently
possible, but once we implement it, it should be much more reliable
than finding flagged spam by looking for words in the subject.
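
Until then, the header test itself is easy to sketch outside notmuch.
The following Python is only an illustration built on the standard
library's email module; the function name is_flagged_spam is made up
for this example and is not part of notmuch.

```python
import email


def is_flagged_spam(raw_message):
    """Return True if the message carries an "X-Spam-Flag: YES" header.

    raw_message is the full text of one mail file (headers and body).
    """
    msg = email.message_from_string(raw_message)
    # Header names are matched case-insensitively by the email module;
    # the value is normalized here so "yes" and " YES " also match.
    flag = msg.get("X-Spam-Flag", "")
    return str(flag).strip().upper() == "YES"
```

A script could run a check like this over new mail files and then tag
the matching messages by hand, but how to wire that up depends on your
own mail setup.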

> Also, and much less importantly, is there any way to have notmuch
> harvest email addresses for BBDB?

We haven't written code to "insinuate" notmuch into bbdb by default,
but someone could do that. Early in my use of notmuch I wrote some
scripts that ran notmuch commands, grepped out addresses, and stuffed
them into bbdb. That was nice at first, but not usable in the long term,
since the database didn't grow as new addresses appeared in emails.
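
For anyone who wants to try something similar, the address-extraction
step might look like the sketch below. This is not the script described
above; harvest_addresses is a hypothetical name, the choice of the
From, To, and Cc headers is an assumption, and feeding the result into
bbdb is left out entirely.

```python
import email
from email.utils import getaddresses


def harvest_addresses(raw_message):
    """Return a set of (display name, address) pairs from one message.

    raw_message is the full text of one mail file (headers and body).
    """
    msg = email.message_from_string(raw_message)
    fields = []
    for name in ("From", "To", "Cc"):
        # get_all returns every occurrence of the header, or the
        # supplied default when the header is absent.
        fields.extend(msg.get_all(name, []))
    # getaddresses parses comma-separated address lists into
    # (realname, address) tuples; entries with no address are skipped.
    return {(real, addr.lower()) for real, addr in getaddresses(fields) if addr}
```

Running this over each message as it arrives, rather than once, would
avoid the stale-database problem.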

More recently, we've added support for tab-based completion of
addresses based on automatic searching through your notmuch mail store
(rather than something external like bbdb). This is quite nice, but
currently takes a bit of effort to set up. See the "how to get email
address completion" instructions here:

	http://notmuchmail.org/emacstips/

In the future, I'd like to get this address completion working by
default without requiring the download of an additional tool (like the
current notmuch_addresses.py or addrlookup programs). Having more direct
support for address completion within the notmuch database itself will
make it faster as well (the current tools are grubbing through actual
mail files to find complete addresses).

-Carl

[*] I'm sure I'm using the wrong terminology for Xapian, and I might
have some details wrong, but the basic idea is hopefully correct.

-- 
carl.d.worth at intel.com