Notmuch new speed degradation

Thu Jul 24 15:31:45 PDT 2014

Quoth Dmitry Bogatov on Jul 24 at 11:49 pm:
> * Austin Clements <amdragon at MIT.EDU> [2014-07-24 10:32:14-0400]
> > Hi Dmitry.  My guess is that you've exceeded your OS buffer cache
> > size by enough that most B-tree reads are going to disk at least once.
> > How big is your database (du -h $MAIL/.notmuch/xapian) and what does
> > free -h report on that computer?  Also, is this on an SSD or an HDD?
> 
> 13Gb on HDD, 9G after compact. Compact did not improve indexing speed,
> unfortunately. Maybe it is possible to somehow merge databases?

Unfortunately, there's no support for merging databases.  Beyond
technical difficulties such as identifying messages that should belong
to the same thread during a merge, the schema wasn't designed with this
in mind and uses various features that are incompatible with merging.

There are some known problems with Xapian slowing down as the database
gets larger, but four seconds per message still sounds extreme.

Another thing to try is to raise Xapian's flush threshold by setting
the environment variable XAPIAN_FLUSH_THRESHOLD.  The default is
10000.  Try increasing it by, say, an order of magnitude (you can
probably go much higher than that, though you don't want to go too
high or you'll start eating into the memory for your page cache).
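For concreteness, here is a minimal Python sketch of that experiment, assuming `notmuch` is on your PATH.  It sets the variable only for a single `notmuch new` run, which is the same as typing `XAPIAN_FLUSH_THRESHOLD=100000 notmuch new` in a shell.  The helper name `env_with_flush_threshold` and its `factor` parameter are hypothetical, chosen for illustration; they are not part of notmuch or Xapian.

```python
import os
import subprocess

# Xapian's default flush threshold, as stated in the message above.
DEFAULT_FLUSH_THRESHOLD = 10000


def env_with_flush_threshold(base_env, factor=10):
    """Return a copy of base_env with XAPIAN_FLUSH_THRESHOLD raised by `factor`.

    `factor` is a hypothetical knob for this sketch; the message suggests
    starting with about one order of magnitude (factor=10).  The caller's
    environment mapping is not modified.
    """
    if factor < 1:
        raise ValueError("factor should raise the threshold, not lower it")
    env = dict(base_env)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(int(DEFAULT_FLUSH_THRESHOLD * factor))
    return env


if __name__ == "__main__":
    # Equivalent to: XAPIAN_FLUSH_THRESHOLD=100000 notmuch new
    env = env_with_flush_threshold(os.environ)
    subprocess.run(["notmuch", "new"], env=env, check=True)
```

Setting the variable per invocation, rather than exporting it globally, keeps the experiment easy to undo if the larger threshold starts crowding out the page cache.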

>              total       used       free     shared    buffers     cached
> Mem:          7,7G       6,5G       1,2G       240M       826M       3,6G
> -/+ buffers/cache:       2,1G       5,6G
> Swap:         1,9G        66M       1,8G

Hmm.  Was this after the compact or after notmuch new had run for a
while?  1.2GB of free memory suggests that it's not a page cache
problem, but that conclusion only holds if you took this snapshot after
notmuch new had been running, not right after compact.

We should confirm that this is an IO problem.  If you run
/usr/bin/time notmuch new for a few minutes, is the %CPU significantly
below 100%?  If it's above 90%ish, then this is a CPU problem and we
might be able to track it down using CPU profiling.  If it is an IO
problem (which it almost certainly is), I'm afraid it's much harder to
track down.
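To make the 90% rule of thumb concrete: GNU time's %CPU figure is (user + system) / elapsed, expressed as a percentage.  The sketch below is a hypothetical helper, not part of notmuch, that either recomputes that figure or pulls it out of GNU time's default one-line report, then applies the ~90% cutoff.  Parsing assumes GNU time's default output format (BSD `time` and shell-builtin `time` print something different), and the sample report is made up for illustration, not a real measurement.

```python
import re

# Rule of thumb from the message: at or above roughly 90% CPU the run is
# CPU-bound; well below 100%, the process is mostly waiting on IO.
CPU_BOUND_THRESHOLD = 90.0


def cpu_percent(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Percentage of wall-clock time spent on the CPU, as GNU time computes %CPU."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return 100.0 * (user_s + system_s) / elapsed_s


def parse_gnu_time_cpu(report: str) -> float:
    """Extract the NN%CPU field from GNU time's default report (assumed format)."""
    match = re.search(r"(\d+)%CPU", report)
    if match is None:
        raise ValueError("no %CPU field found; is this GNU time's default format?")
    return float(match.group(1))


def diagnose(percent: float) -> str:
    """Classify a run as CPU-bound or IO-bound using the ~90% cutoff."""
    return "cpu-bound" if percent >= CPU_BOUND_THRESHOLD else "io-bound"


if __name__ == "__main__":
    # Hypothetical report: 18 CPU-seconds over a 3-minute run is 10% CPU.
    sample = "12.34user 5.66system 3:00.00elapsed 10%CPU (0avgtext+0avgdata 0maxresident)k"
    print(diagnose(parse_gnu_time_cpu(sample)))
```

A run that spends most of its wall-clock time blocked on disk reads shows a low %CPU even though the process is busy, which is why this number separates the two cases.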

Also, what file system are you using?