storing From and Subject in xapian

Austin Clements amdragon at mit.edu
Sat May 14 18:37:25 PDT 2011


I wonder if a better approach would be to use
notmuch_message_get_header everywhere, rather than introducing
_notmuch_message_get_header_value, and have it simply recognize
headers that can be retrieved directly from the database.  Then
library callers could take advantage of this optimization and it could
be trivially extended to other headers in the future.
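
To make the idea concrete, here is a rough sketch of the dispatch I have
in mind. It is a self-contained mock, not the actual notmuch code: the
sketch_message_t struct, its fields, and both sketch_* functions are
hypothetical names made up for illustration, and the "file parse" is
simulated by a counter rather than real I/O. It also assumes that a
missing database value should fall through to the file, which would
cover messages indexed before the headers were stored.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical stand-in for a message; not notmuch's real structure. */
typedef struct {
    const char *db_from;       /* value stored in the database, or NULL */
    const char *db_subject;    /* value stored in the database, or NULL */
    const char *file_from;     /* what parsing the mail file would yield */
    const char *file_subject;
    const char *file_to;
    int file_parses;           /* how many times the slow path was taken */
} sketch_message_t;

/* Hypothetical slow path: stands in for opening and parsing the file. */
static const char *
sketch_parse_file_header (sketch_message_t *m, const char *header)
{
    const char *value = NULL;

    m->file_parses++;
    if (strcasecmp (header, "from") == 0)
        value = m->file_from;
    else if (strcasecmp (header, "subject") == 0)
        value = m->file_subject;
    else if (strcasecmp (header, "to") == 0)
        value = m->file_to;

    /* In this sketch an absent header is reported as an empty string. */
    return value ? value : "";
}

/* Single entry point: answer from the database when the header is one
 * of those stored there, otherwise fall back to parsing the file. */
const char *
sketch_message_get_header (sketch_message_t *m, const char *header)
{
    const char *value = NULL;

    if (strcasecmp (header, "from") == 0)
        value = m->db_from;
    else if (strcasecmp (header, "subject") == 0)
        value = m->db_subject;

    if (value != NULL)
        return value;

    return sketch_parse_file_header (m, header);
}
```

With this shape, callers never need to know which headers live in the
database. Supporting another stored header means adding one more branch
to the lookup, and the fallback keeps older entries working.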

On Tue, May 3, 2011 at 11:40 PM, Istvan Marko <notmuch at kismala.com> wrote:
> I have been looking at the I/O patterns of "notmuch search" with the
> default output format and noticed that it has to parse the maildir file
> of every matched message to get the From and Subject headers. I figured
> that this must be slowing things down, especially when the files are not
> in the filesystem cache.
>
> So I wanted to see how much difference it would make to have the From
> and Subject stored in xapian to avoid this parsing.
>
> With the attached patch I get a speedup of 2x with cached and almost 10x
> with uncached files for searches with many matches.
>
> The attached patch is only intended as proof of concept. I am not
> familiar with xapian so I wasn't sure if this kind of data should be
> stored as terms, values or data. I went with values simply because I saw
> that message-id and timestamp were already stored that way. Perhaps the
> data type would be more appropriate since the fields are not used for
> searching or sorting. Oh and for some reason I get blank Subject for
> about 1% of the matches.
>
>
> Is there a downside to this approach? The only one I see is that the
> xapian db size increases by about 1% but to me the speed increase would
> be well worth it.

