[RFC PATCH 2/4] Add NOTMUCH_MESSAGE_FLAG_EXCLUDED flag

Mark Walters markwalters1009 at gmail.com
Tue Jan 24 03:20:26 PST 2012


On Mon, 23 Jan 2012 21:45:21 -0500, Austin Clements <amdragon at MIT.EDU> wrote:
> The overall structure of this series looks great.  There's obviously a
> lot of clean up to do, but I'll reply with a few high-level comments.
> 
> Quoth Mark Walters on Jan 24 at  1:18 am:
> > Form excluded doc_ids set and use that to exclude messages.
> > Should be no functional change.
> > 
> > ---
> >  lib/notmuch-private.h |    1 +
> >  lib/query.cc          |   28 ++++++++++++++++++++++++++--
> >  2 files changed, 27 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
> > index 7bf153e..e791bb0 100644
> > --- a/lib/notmuch-private.h
> > +++ b/lib/notmuch-private.h
> > @@ -401,6 +401,7 @@ typedef struct _notmuch_message_list {
> >   */
> >  struct visible _notmuch_messages {
> >      notmuch_bool_t is_of_list_type;
> > +    notmuch_doc_id_set_t *excluded_doc_ids;
> >      notmuch_message_node_t *iterator;
> >  };
> >  
> > diff --git a/lib/query.cc b/lib/query.cc
> > index c25b301..92fa834 100644
> > --- a/lib/query.cc
> > +++ b/lib/query.cc
> > @@ -57,6 +57,11 @@ struct visible _notmuch_threads {
> >      notmuch_doc_id_set_t match_set;
> >  };
> >  
> > +static notmuch_bool_t
> > +_notmuch_doc_id_set_init (void *ctx,
> > +			  notmuch_doc_id_set_t *doc_ids,
> > +			  GArray *arr);
> > +
> >  notmuch_query_t *
> >  notmuch_query_create (notmuch_database_t *notmuch,
> >  		      const char *query_string)
> > @@ -173,6 +178,7 @@ notmuch_query_search_messages (notmuch_query_t *query)
> >  						   "mail"));
> >  	Xapian::Query string_query, final_query, exclude_query;
> >  	Xapian::MSet mset;
> > +	Xapian::MSetIterator iterator;
> >  	unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
> >  			      Xapian::QueryParser::FLAG_PHRASE |
> >  			      Xapian::QueryParser::FLAG_LOVEHATE |
> > @@ -193,8 +199,21 @@ notmuch_query_search_messages (notmuch_query_t *query)
> >  
> >  	exclude_query = _notmuch_exclude_tags (query, final_query);
> >  
> > -	final_query = Xapian::Query (Xapian::Query::OP_AND_NOT,
> > -					 final_query, exclude_query);
> > +	enquire.set_weighting_scheme (Xapian::BoolWeight());
> > +	enquire.set_query (exclude_query);
> > +
> > +	mset = enquire.get_mset (0, notmuch->xapian_db->get_doccount ());
> > +
> > +	GArray *excluded_doc_ids = g_array_new (FALSE, FALSE, sizeof (unsigned int));
> > +
> > +	for (iterator = mset.begin (); iterator != mset.end (); iterator++)
> > +	{
> > +	    unsigned int doc_id = *iterator;
> > +	    g_array_append_val (excluded_doc_ids, doc_id);
> > +	}
> > +	messages->base.excluded_doc_ids = talloc (query, _notmuch_doc_id_set);
> > +	_notmuch_doc_id_set_init (query, messages->base.excluded_doc_ids,
> > +				  excluded_doc_ids);
> 
> This might be inefficient for message-only queries, since it will
> fetch *all* excluded docids.  This highlights a basic difference
> between message and thread search: thread search can return messages
> that don't match the original query and hence needs to know all
> potentially excluded messages, while message search can only return
> messages that match the original query.
> 
> It's entirely possible this doesn't matter because Xapian probably
> still needs to fetch the full posting lists of the excluded terms, but
> it would be worth doing a quick/hacky benchmark to verify this, with
> enough excluded messages to make the cost non-trivial.
> 
> If it does matter, you could pass in a flag indicating if the exclude
> query should be limited by the original query or not.  Or you could do
> the limited exclude query in notmuch_query_search_messages and a
> separate open-ended exclude query in notmuch_query_search_threads.

Yes I will benchmark that: I am just importing a large archive into
notmuch for testing. 

> >  	enquire.set_weighting_scheme (Xapian::BoolWeight());
> >  
> > @@ -294,6 +313,11 @@ _notmuch_mset_messages_move_to_next (notmuch_messages_t *messages)
> >      mset_messages = (notmuch_mset_messages_t *) messages;
> >  
> >      mset_messages->iterator++;
> > +
> > +    while ((mset_messages->iterator != mset_messages->iterator_end) &&
> > +	   (_notmuch_doc_id_set_contains (messages->excluded_doc_ids,
> > +					  *mset_messages->iterator)))
> > +	mset_messages->iterator++;
> 
> This seemed a little weird, since you remove it in the next patch.  Is
> this just to keep the tests happy?  (If so, it would be worth
> mentioning in the commit message; other reviewers will definitely have
> the same question.)

Essentially just to keep tests happy: or rather to try and make it easy
for a reviewer to see that the individual patch does not make any
functional change.

Best wishes 

Mark




More information about the notmuch mailing list