Notmuch indexing 21 million emails

Austin Clements amdragon at MIT.EDU
Tue Nov 22 19:20:03 PST 2011


Quoth Tom Bulli on Nov 21 at  7:02 pm:
> I have a project where I need to search about 21 million emails - and
> decided to use "notmuch" for it.  The system is a Debian Squeeze,
> the notmuch version is "0.8-1~bpo60+1" from "kyria's" private
> repository.
> 
> I am running the "notmuch new" for approx. 4 days now - and
> according to "notmuch count" it has indexed about 4.5 million
> emails.
> 
> Is this expected performance?  Is there any way to speed that up?

Currently, notmuch is much better optimized for search than it is for
indexing.  This is unfortunate for the initial indexing pass and
seems to be becoming more of a problem over time.
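
For a rough sense of scale, here is a back-of-the-envelope projection
from the numbers in your message (4.5 million of 21 million messages
after about 4 days).  It assumes the indexing rate stays constant,
which is probably optimistic, since indexing tends to slow down as the
index grows.  The function name is just for illustration.

```python
def projected_days_remaining(indexed, total, elapsed_days):
    """Days left if the average rate so far held steady (an assumption)."""
    rate_per_day = indexed / elapsed_days
    return (total - indexed) / rate_per_day


# Figures taken from the quoted message.
INDEXED = 4_500_000
TOTAL = 21_000_000
ELAPSED_DAYS = 4

rate_per_second = INDEXED / (ELAPSED_DAYS * 86_400)
remaining = projected_days_remaining(INDEXED, TOTAL, ELAPSED_DAYS)

print(f"{rate_per_second:.1f} messages/s, about {remaining:.1f} more days")
```

Treat the result as a lower bound on the remaining time rather than a
prediction.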

There are some things you can try.  One is to use an SSD if you aren't
already, since constructing the index requires a lot of random I/O.
You can also try libeatmydata to disable fsync calls, which may improve
your I/O performance, with the obvious crash-safety caveats.  However,
unless you have a lot of RAM, I suspect your index has long outgrown
your buffer cache, so this may have limited impact.
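
If you want to try the libeatmydata route, a minimal sketch of a
wrapper is below.  It assumes the "eatmydata" wrapper command that
libeatmydata packages usually ship is on your PATH, and falls back to
plain "notmuch new" otherwise.  The helper name is hypothetical, and
the sketch only builds and prints the command instead of running it.

```python
import shutil


def build_index_command(use_eatmydata=True, which=shutil.which):
    """Return the argv list for an indexing run.

    Prefixes the command with "eatmydata" (assumed to be the wrapper
    installed by libeatmydata) only if that program can be found.
    """
    cmd = ["notmuch", "new"]
    if use_eatmydata and which("eatmydata"):
        # fsync becomes a no-op for this run: faster, but not crash-safe.
        cmd = ["eatmydata"] + cmd
    return cmd


if __name__ == "__main__":
    print(" ".join(build_index_command()))
```

To actually start the run, pass the returned list to subprocess.run.
Keep in mind that a crash or power loss during such a run may leave
the index in a bad state.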

Since you're going to the trouble of indexing 21 million emails, you
might want to try 0.10 (under freeze right now, to be released very,
very soon).  It won't improve your indexing time, but if you're doing
searches with non-trivial numbers of results, emails indexed with 0.10
will search much faster.

Sorry I don't have better news, but I hope this helps.

