storing From and Subject in xapian

Istvan Marko notmuch at kismala.com
Tue May 3 20:40:45 PDT 2011


I have been looking at the I/O patterns of "notmuch search" with the
default output format and noticed that it has to parse the maildir file
of every matched message to get the From and Subject headers. I figured
that this must be slowing things down, especially when the files are not
in the filesystem cache.
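
As a rough illustration of the per-message work described above, here is a minimal Python sketch using the standard library email parser. It is not notmuch's own C code, and the helper names `read_from_and_subject` and `search_summary` are hypothetical; the point is only that every matched message costs one file open and one header parse.

```python
from email.parser import BytesHeaderParser


def read_from_and_subject(path):
    """Open one message file and extract its From and Subject headers."""
    with open(path, "rb") as f:
        # The header parser does not parse the body as MIME,
        # but the file still has to be opened and read from disk.
        headers = BytesHeaderParser().parse(f)
    return headers.get("From"), headers.get("Subject")


def search_summary(paths):
    """Build (From, Subject) pairs: one file access per matched message."""
    return [read_from_and_subject(p) for p in paths]
```

With thousands of matches this loop touches thousands of files, which is where the uncached case would be expected to suffer most.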

So I wanted to see how much difference it would make to have the From
and Subject stored in xapian to avoid this parsing.

With the attached patch I get a speedup of about 2x when the files are
cached and almost 10x when they are not, for searches with many matches.

The attached patch is only intended as a proof of concept. I am not
familiar with xapian, so I wasn't sure whether this kind of data should
be stored as terms, values or data. I went with values simply because I
saw that message-id and timestamp were already stored that way. Perhaps
storing it as data would be more appropriate, since the fields are not
used for searching or sorting. Also, for some reason I get a blank
Subject for about 1% of the matches.
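
The idea of storing the headers as values can be modelled roughly as below. This is a plain-Python stand-in, not the Xapian API and not the code in the patch: `StoredDocument`, `index_message`, `summary_from_index` and the slot numbers are all hypothetical names chosen for illustration. It only shows the shape of the approach: write the two headers into numbered slots at index time, then read them back at search time without opening the message file.

```python
# Hypothetical slot numbers, chosen only for this sketch.
VALUE_FROM = 2
VALUE_SUBJECT = 3


class StoredDocument:
    """Stand-in for an indexed document: maps slot number to a string."""

    def __init__(self):
        self._values = {}

    def add_value(self, slot, value):
        self._values[slot] = value

    def get_value(self, slot):
        # A slot that was never set reads back as an empty string.
        return self._values.get(slot, "")


def index_message(headers):
    """At index time, copy From and Subject into value slots."""
    doc = StoredDocument()
    doc.add_value(VALUE_FROM, headers.get("From", ""))
    doc.add_value(VALUE_SUBJECT, headers.get("Subject", ""))
    return doc


def summary_from_index(doc):
    """At search time, read the headers back without touching the file."""
    return doc.get_value(VALUE_FROM), doc.get_value(VALUE_SUBJECT)
```

The trade-off is the one raised in this message: a small amount of extra data per document in exchange for skipping the file parse on every match.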


Is there a downside to this approach? The only one I see is that the
xapian db size increases by about 1%, but to me the speed increase would
be well worth it.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: notmuch-xapian-headers.patch
Type: text/x-patch
Size: 4003 bytes
Desc: not available
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20110503/999cf11b/attachment.bin>
-------------- next part --------------

-- 
	Istvan

