[PATCH 3/4] Optimize thread search using matched docid sets.

Wed Dec 8 13:58:44 PST 2010

Quoth Carl Worth on Dec 07 at  5:19 pm:
> On Wed, 17 Nov 2010 14:28:26 -0500, Austin Clements <amdragon at MIT.EDU> wrote:
> > This reduces thread search's 1+2t Xapian queries (where t is the
> > number of matched threads) to 1+t queries and constructs exactly one
> > notmuch_message_t for each message instead of 2 to 3.
> 
> Fantastic stuff, Austin!
> 
> I've merged this now, (sorry it took me a while to get to it).
> 
> One of the reasons I didn't merge it immediately is that I wanted to
> ensure that I understood the original author-ordering bug. Basically,
> I'm inherently uncomfortable with a performance optimization that fixes
> a bug as a side effect, (unless we understand that very well).
> 
> So what I pushed actually adds the bug fix first, so that the
> performance optimization makes no change at all to the test suite. That
> feels better to me, (even though it simply demonstrated conclusively
> that the bug was in a piece of code that was eliminated by the
> optimization).

Ah, good.  You are less lazy than I.

> Anyway, in a quick reading of the code, the only little thing I saw was:
> 
> > +    size_t count = (bound + sizeof (doc_ids->bitmap[0]) - 1) /
> > +	sizeof (doc_ids->bitmap[0]);
> 
> Which would look better to my eyes with a 1 factored out of the
> division:
> 
> 	size_t count = 1 + (bound - 1) / sizeof (doc_ids->bitmap[0]);
> 
> And the repeated use of "sizeof (doc_ids->bitmap[0])" could maybe do
> with a macro for better legibility. Though it would be an evil macro if
> it didn't accept an argument, and it wouldn't be much shorter if it
> did. So maybe it's fine as-is.

I found what I think is a cleaner way to write that bit of code.  A
small patch is forthcoming.

> Thanks for the optimization. Now all we need is a little notmuch
> benchmark so that I can be sure not to regress any performance work with
> my sloppy coding!

Now that this is in (and I have a temporary respite from TA duties),
I'm going to finish up and send out my other ~1.7X improvement, just
to get it out of my queue.  Then I'll look at making a performance
regression suite.  Were you thinking of some standard set of timed
operations wrapped in a little script that can tell you if you've made
things worse, or something more elaborate?

Thanks for pushing these patches!