notmuch_addresses.py

Wed Feb 22 05:07:35 PST 2012

On Tue, 21 Feb 2012 11:33:38 -0500, Jesse Rosenthal <jrosenthal at jhu.edu> wrote:
> On Tue, 21 Feb 2012 14:53:06 +0100, Daniel Schoepe <daniel at schoepe.org> wrote:
> > On Tue, 21 Feb 2012 09:15:09 -0000, Justus Winter <4winter at informatik.uni-hamburg.de> wrote:
> > The reason I mentioned nottoomuch-addresses at all, is that completion
> > itself is _a lot_ faster (at least for me), compared to
> > addrlookup. According to the wiki, notmuch-addresses.py is even slower
> > than addrlookup, so I thought (and still think) that it was worth
> > mentioning. Of course, one could rewrite the database-generation part in
> > python using the bindings, but I personally don't think it's that
> > necessary.
> 
> I'm not sure what speed comparisons were being used -- I think it was
> Sebastian comparing vala to python. In any case, using
> notmuch_addresses.py to look up a common prefix ("Jes") on a slowish
> computer takes 0.2 seconds. So I'm not sure if the speed is all that
> much of an issue. It might be a question of cache temperature, though --
> it'll probably take longer the first time you run it. Still, even trying
> something out on a cold cache, it seems to be about a second.

Speed comparisons between vanilla notmuch_addresses.py and
nottoomuch-addresses.sh are flawed because the two tools do different
things. It's comparing apples and oranges.

notmuch_addresses.py looks for matches in the recipients of mails the
user has sent. Nothing else. notmuch_addresses.py filters out multiple
names for one email address using a popularity contest.
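
To make the "popularity contest" concrete, here is a minimal sketch of the
idea. It is not the actual notmuch_addresses.py code: the function name
dedupe_by_popularity and the example.org addresses are made up for
illustration, and it works on plain header strings rather than on the
notmuch database. For each email address it keeps only the display name
seen most often.

```python
from collections import Counter, defaultdict
from email.utils import formataddr, getaddresses


def dedupe_by_popularity(header_values):
    """Return one "Name <addr>" entry per address, using the most common name.

    header_values is a list of raw To/Cc header strings (hypothetical input;
    the real script gets its data from notmuch instead).
    """
    names_by_addr = defaultdict(Counter)
    for name, addr in getaddresses(header_values):
        if addr:
            # Compare addresses case-insensitively, count each name variant.
            names_by_addr[addr.lower()][name] += 1

    result = []
    for addr, name_counts in names_by_addr.items():
        best_name, _count = name_counts.most_common(1)[0]
        result.append(formataddr((best_name, addr)))
    return result


if __name__ == "__main__":
    headers = [
        "Alice Example <alice@example.org>, Alice <alice@example.org>",
        "Alice Example <ALICE@example.org>",
        "bob@example.org",
    ]
    for entry in dedupe_by_popularity(headers):
        print(entry)
```

Skipping this step, as a tool that just returns all matches does, means
every name variant of the same address shows up as a separate completion
candidate.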

AFAICT nottoomuch-addresses.sh scans all the addresses in all the
mails. It has no logic for filtering out multiple names for one email
address, and just returns all matches.

Personally I would like to have the best of both worlds, and I'm using a
modified notmuch_addresses.py that matches against all the mails I have
and cleans up the duplicate results. Unfortunately that takes a toll on
performance: a typical search takes about a second on my system with a
hot cache, while nottoomuch-addresses.sh takes less than a tenth of a
second. That is enough to be annoying, I'm afraid. Even so, it's not a
fair comparison, because notmuch_addresses.py wasn't designed with this
in mind, and nottoomuch-addresses.sh maintains its own database and does
less.
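
The speed difference is easier to see with a rough sketch of the "own
database" approach. This is only an illustration under assumptions, not
how nottoomuch-addresses.sh is actually implemented: the one-address-per-line
cache format and the names build_cache and lookup are hypothetical. The
point is that the expensive scan happens once, up front, and each
completion afterwards is just a linear pass over a small text file.

```python
import os
import tempfile


def build_cache(addresses, cache_path):
    """Write the unique addresses to a flat file, one per line.

    In a real tool this would be the slow step that scans the mail store;
    here the addresses are simply passed in (assumption for the sketch).
    """
    unique = sorted(set(addresses), key=str.lower)
    with open(cache_path, "w", encoding="utf-8") as fh:
        for entry in unique:
            fh.write(entry + "\n")


def lookup(fragment, cache_path):
    """Return every cached line containing fragment, case-insensitively."""
    needle = fragment.lower()
    with open(cache_path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if needle in line.lower()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "addresses.cache")
        build_cache(
            [
                "Alice Example <alice@example.org>",
                "Alice <alice@example.org>",
                "Bob <bob@example.org>",
            ],
            cache,
        )
        print(lookup("ali", cache))
```

Note that lookup returns both name variants for the same address; the
cache buys speed, not deduplication.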

One just needs to pick the tool that best fits one's needs.

BR,
Jani.