storing From and Subject in xapian
Istvan Marko
notmuch at kismala.com
Tue May 3 20:40:45 PDT 2011
I have been looking at the I/O patterns of "notmuch search" with the
default output format and noticed that it has to parse the maildir file
of every matched message to get the From and Subject headers. I figured
that this must be slowing things down, especially when the files are not
in the filesystem cache.
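To make the per-message cost concrete, here is a rough sketch of what reading From and Subject out of a message file involves. It uses Python's standard `email` package rather than the parser notmuch actually uses, and the function names `parse_from_and_subject` and `read_from_and_subject` are made up for this illustration. The point is only that every matched message costs one file open and one header parse.

```python
from email.parser import BytesParser


def parse_from_and_subject(raw):
    """Return (From, Subject) from the raw bytes of one message.

    headersonly=True stops parsing at the end of the header block,
    so the body is never interpreted.
    """
    msg = BytesParser().parsebytes(raw, headersonly=True)
    # Missing headers come back as empty strings.
    return str(msg.get("From", "")), str(msg.get("Subject", ""))


def read_from_and_subject(path):
    """Hypothetical helper: one file open and read per matched message."""
    with open(path, "rb") as fh:
        return parse_from_and_subject(fh.read())
```

Calling `read_from_and_subject` once per match means one disk access per result when the file is not already in the filesystem cache.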
So I wanted to see how much difference it would make to have the From
and Subject stored in xapian to avoid this parsing.
With the attached patch I get a speedup of 2x with cached and almost 10x
with uncached files for searches with many matches.
The attached patch is only intended as a proof of concept. I am not
familiar with xapian, so I wasn't sure whether this kind of data should be
stored as terms, values or document data. I went with values simply
because I saw that message-id and timestamp were already stored that way.
Perhaps document data would be more appropriate, since the fields are not
used for searching or sorting. Also, for some reason I get a blank Subject
for about 1% of the matches.
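If the headers went into the document data instead of values, they would have to share it, since xapian keeps document data as a single opaque string per document whereas values are kept one per numbered slot. Below is a minimal sketch of one way two fields could be packed into such a string. It does not call xapian at all, it is not what the attached patch does, and the JSON encoding and the names `pack_headers` and `unpack_headers` are assumptions made for this example.

```python
import json


def pack_headers(from_header, subject):
    """Serialize both headers into one string suitable for a data blob."""
    return json.dumps({"from": from_header, "subject": subject})


def unpack_headers(blob):
    """Recover (From, Subject); an empty blob yields empty strings."""
    fields = json.loads(blob) if blob else {}
    return fields.get("from", ""), fields.get("subject", "")
```

The trade-off is that anything else wanting to use the document data would have to agree on the same encoding, which is not an issue with separate value slots.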
Is there a downside to this approach? The only one I see is that the
xapian db size increases by about 1%, but to me the speed increase would
be well worth it.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: notmuch-xapian-headers.patch
Type: text/x-patch
Size: 4003 bytes
Desc: not available
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20110503/999cf11b/attachment.bin>
--
Istvan