index multiple files per message-id, add reindex command

David Bremner david at tethera.net
Thu Apr 13 20:14:38 PDT 2017


WARNING: reindexing is an intrusive operation. I don't think this will
corrupt your database, but previous versions thrashed threading pretty
well. notmuch-dump is your friend.

[PATCH 01/10] lib: isolate n_d_add_message and helper functions into
[PATCH 02/10] lib/n_d_add_message: refactor test for new/ghost
[PATCH 03/10] lib: factor out message-id parsing to separate file.
[PATCH 04/10] lib: refactor notmuch_database_add_message header

The first 4 patches are just code movement. database.cc has gotten too
large to understand (for me), so this is mainly trying to group
functions together in some logical way.

[PATCH 06/10] lib: index message files with duplicate message-ids

The diff here has grown a bit, but the idea is still simple: add terms
and values for all files with a given message-id.
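
To make the idea concrete, here is a toy sketch of it, not the code in
the patch. ToyDoc, ToyFile and toy_add_file are made-up names, and a
std::map stands in for the Xapian database; the only point is that a
second file with the same message-id adds its terms to the existing
document rather than being ignored.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an indexed document; not the real Xapian or notmuch types.
struct ToyDoc {
    std::set<std::string> terms;     // searchable terms from every file
    std::vector<std::string> files;  // all filenames sharing this message-id
};

// Hypothetical parsed mail file: a message-id plus the terms found in it.
struct ToyFile {
    std::string path;
    std::string message_id;
    std::vector<std::string> terms;
};

// Add one file to the index. A second file with the same message-id
// contributes its terms to the existing document instead of being skipped.
void toy_add_file(std::map<std::string, ToyDoc> &index, const ToyFile &file)
{
    ToyDoc &doc = index[file.message_id];  // creates the doc on first sight
    doc.files.push_back(file.path);
    doc.terms.insert(file.terms.begin(), file.terms.end());
}
```

With this model, a term that occurs only in the second copy of a
message (say, a list footer) becomes searchable, while the index still
holds a single document per message-id.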

[PATCH 07/10] WIP: Add message count to summary output

This patch gives the user some hints about the existence of multiple
files per message-id.

[PATCH 08/10] lib: add _notmuch_message_remove_indexed_terms

This just iterates over the message's terms and removes any that are
recoverable, i.e. that reindexing can regenerate.
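
A rough sketch of that loop, under loud assumptions: the "tag:" and
"property:" prefixes below are invented for the example and are not
notmuch's real term prefixes, and a std::set stands in for the
document's term list. The actual function in the patch works on the
Xapian document.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical rule: terms with these prefixes are user data that cannot be
// regenerated from the mail files, so they must survive. The prefixes are
// illustrative only.
bool toy_is_recoverable(const std::string &term)
{
    return term.rfind("tag:", 0) != 0 && term.rfind("property:", 0) != 0;
}

// Walk the term set and erase every term that reindexing can regenerate.
// Returns the number of terms removed.
std::size_t toy_remove_indexed_terms(std::set<std::string> &terms)
{
    std::size_t removed = 0;
    for (auto it = terms.begin(); it != terms.end();) {
        if (toy_is_recoverable(*it)) {
            it = terms.erase(it);  // erase returns the next valid iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

Erasing through the iterator returned by erase() keeps the loop valid
while the container shrinks underneath it.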

[PATCH 09/10] lib: add notmuch_message_reindex

This is the trickiest code here, and it ends up using several of the
functions called by notmuch_database_add_message, rather than calling
it directly.
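
One way to picture that structure is the toy model below. Every name in
it is hypothetical and the "tag:" prefix is made up; it only
illustrates why sharing a helper can be preferable to calling the
top-level add function again, and makes no claim about how the patch is
actually organised.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy message: not the real notmuch_message_t.
struct ToyMessage {
    std::set<std::string> terms;                       // current index terms
    std::vector<std::vector<std::string>> file_terms;  // parsed terms, one entry per file
};

// Shared helper (hypothetical): index one file's terms into the message.
void toy_index_file(ToyMessage &msg, const std::vector<std::string> &terms)
{
    msg.terms.insert(terms.begin(), terms.end());
}

// "add" path: record the new file, then index it.
void toy_add_file(ToyMessage &msg, const std::vector<std::string> &terms)
{
    msg.file_terms.push_back(terms);
    toy_index_file(msg, terms);
}

// "reindex" path: drop everything except tags (made-up "tag:" prefix), then
// reuse the helper for each known file. Calling toy_add_file here would
// record every file a second time.
void toy_reindex(ToyMessage &msg)
{
    for (auto it = msg.terms.begin(); it != msg.terms.end();) {
        if (it->rfind("tag:", 0) == 0)
            ++it;
        else
            it = msg.terms.erase(it);
    }
    for (const auto &terms : msg.file_terms)
        toy_index_file(msg, terms);
}
```

In this model, reindexing discards stale terms, keeps tags, and leaves
the list of files unchanged.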

[PATCH 10/10] add "notmuch reindex" subcommand

This should probably have at least a few more tests: in particular,
preservation of message properties is not tested yet. Also, more tests
involving threading are needed, since it turned out to be surprisingly
hard to trigger some bugs (i.e. there were bugs triggered only by one
of the two corpora, and only by one of xapian 1.2 vs 1.4).

The good news is that there really seems to be a speed payoff for this
extra complication. Reindexing all messages went from about twice as
long as the initial notmuch new, to about 60% of that speed.

I'm a little skeptical about the peak memory use, but so far I haven't
seen any serious-looking memory leaks.

