Improved search speed by 2.5X

Tue Nov 16 08:18:43 PST 2010

'Lo.  I've been toying with switching to notmuch for a while, but the
speed of search has been holding me back.  Perhaps I should upgrade
the archaic machine I'm trying to run notmuch on, but I figured
optimizing search would be a good way for me to cut my teeth on
notmuch's code.  I've implemented two general optimizations that,
together, make thread search 2.5X faster, reducing my test "tag:inbox"
query from 4.523 seconds to 1.779 seconds.

Before cleaning up and posting the patches, I wanted to make sure my
general approach is both correct and pedagogically sound.

I reduced thread search's 1 + 2t Xapian queries (where t is the number
of unique matched threads) to 1 + t queries and now construct exactly
one notmuch_message_t for each message instead of 2 to 3.  The query
performed by notmuch_query_search_threads to get all messages matching
the user's query now fetches only the docid set instead of
constructing message objects and decompressing the thread ID of every
message from its term list.  Threads are then constructed from these
docids using "thread:" queries much as before; however, now which
subset of these messages matched the user query is determined by
checking their docids against the docid set constructed earlier,
rather than requiring yet another Xapian query per thread to find
matched messages.

Before this optimization, my test "tag:inbox" query took 4.523
seconds.  With the optimization, it takes 3.072 seconds (1.5X faster).
It has the downside that it requires more RAM, though even with, say,
a 1 million message database, it's at most ~4.2 megs.  In principle it
also introduces latency before the first search result, but fetching
docid sets is what Xapian was born to do and I haven't noticed any
latency.

The second optimization is based on the observation that Xapian
decompresses a document's term list every time you iterate over it.
As a result, notmuch can decompress the beginning of a single term
list at least five times.  I combined all of the message metadata
fetching (message ID, thread ID, reply-to, filename list, and tag
list) into a single pass over the term list that fetches everything.

This optimization reduces my "tag:inbox" query to 3.521 seconds (1.3X
faster).  This one is a bit more of a balancing act, since it may
fetch metadata that is never needed; however, doing the single pass
takes only a little longer than running any of the individual metadata
requests.

These two optimizations complement each other.  With both, my query
takes 1.779 seconds (2.5X faster).  Because the first constructs only
one message object per message, it aggregates all metadata operations
on that one object instead of spreading them across multiple objects,
increasing the effectiveness of single-pass metadata retrieval.

Does this all seem reasonable?  My code passes the test suite [1], so
I believe it to be fairly sound.

[1] Except for 2 emacs tests that depend on author order.  What order
are matched authors *supposed* to be in?

-- 
Austin Clements                                      MIT/'06/PhD/CSAIL
amdragon at mit.edu                           http://web.mit.edu/amdragon
       Somewhere in the dream we call reality you will find me,
              searching for the reality we call dreams.