storing From and Subject in xapian

Austin Clements amdragon at mit.edu
Wed May 4 18:48:39 PDT 2011


This is awesome.  What was your machine configuration?

As another data point, on what is probably a very different configuration
(8-year-old P4, new SSD), my test query was 1.9X faster uncached and 1.6X
faster cached.  It also produced 60% fewer disk reads.  I saw the same
1% increase in database size.

BTW, the reason you're missing some of the subjects is that the char*
returned from _notmuch_message_get_header_value points into a local
std::string that is destroyed as soon as that function returns, so the
caller is left with a dangling pointer.  A simple fix is to replace
    return value.c_str();
with
    return talloc_strdup (message, value.c_str ());
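To illustrate the lifetime problem outside of notmuch, here is a minimal
standalone sketch.  Everything in it is hypothetical: the headers map,
lookup_header, and get_header_copy are invented for the example, and a
plain malloc'd copy stands in for talloc_strdup (message, ...), so the
caller frees the result instead of it being released with the message.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <map>
#include <string>

// Hypothetical header store; not notmuch's real data structure.
static const std::map<std::string, std::string> headers = {
    {"subject", "storing From and Subject in xapian"},
};

// Returns the header by value, so each caller gets its own temporary
// std::string (an assumption standing in for the database lookup).
static std::string lookup_header(const std::string &name) {
    assert(!name.empty());
    auto it = headers.find(name);
    return it == headers.end() ? std::string() : it->second;
}

// BROKEN, shown only for contrast: 'value' is destroyed when the function
// returns, so the returned pointer dangles.
//
// const char *get_header_dangling(const std::string &name) {
//     std::string value = lookup_header(name);
//     return value.c_str();
// }

// Fixed: copy the bytes into storage that outlives the std::string.
// The malloc'd copy stands in for talloc_strdup; the caller must free() it.
char *get_header_copy(const std::string &name) {
    std::string value = lookup_header(name);
    char *copy = static_cast<char *>(std::malloc(value.size() + 1));
    if (copy != nullptr)
        std::memcpy(copy, value.c_str(), value.size() + 1);
    return copy;
}
```

A caller would use get_header_copy ("subject") and free () the result
when done.  With talloc the copy would instead be freed together with its
parent context, which is why the one-line change above is enough.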

Values are probably the right place to store this information (though
I've never been completely clear on the difference between document
data and values).  Terms would be indexed, which is unnecessary
(unless there's a reason to do *exact* matches on from and subject?)
and would also grow the database more.

On Tue, May 3, 2011 at 11:40 PM, Istvan Marko <notmuch at kismala.com> wrote:
>
> I have been looking at the I/O patterns of "notmuch search" with the
> default output format and noticed that it has to parse the maildir file
> of every matched message to get the From and Subject headers. I figured
> that this must be slowing things down, especially when the files are not
> in the filesystem cache.
>
> So I wanted to see how much difference it would make to have the From
> and Subject stored in xapian to avoid this parsing.
>
> With the attached patch I get a speedup of 2x with cached and almost 10x
> with uncached files for searches with many matches.
>
> The attached patch is only intended as proof of concept. I am not
> familiar with xapian so I wasn't sure if this kind of data should be
> stored as terms, values or data. I went with values simply because I saw
> that message-id and timestamp were already stored that way. Perhaps the
> data type would be more appropriate since the fields are not used for
> searching or sorting. Oh and for some reason I get blank Subject for
> about 1% of the matches.
>
>
> Is there a downside to this approach? The only one I see is that the
> xapian db size increases by about 1% but to me the speed increase would
> be well worth it.
>
>
>
>
> --
>        Istvan
>
