[PATCH] dump: Don't sort the output by message id.

Tom Prince tom.prince at ualberta.net
Sun Nov 27 10:40:53 PST 2011


From: Thomas Schwinge <thomas at schwinge.name>

Asking Xapian to sort the messages for us causes suboptimal IO patterns. This
would be useful if we only wanted the first few results, but since we want
everything anyway, it is a pessimization.

On 2011-10-29, a measurement on an instance with 372981 messages showed that
wall time drops from 28 minutes (sorted by Message-ID) to 15 minutes
(unsorted).

Timings on 189605 messages:

$ time notmuch.old dump
19.48user 5.83system 12:10.42elapsed 3%CPU (0avgtext+0avgdata 110656maxresident)k
3629584inputs+22720outputs (33major+7073minor)pagefaults 0swaps
$ echo 3 > /proc/sys/vm/drop_caches
$ time notmuch.new dump
14.89user 1.20system 3:23.58elapsed 7%CPU (0avgtext+0avgdata 46032maxresident)k
1256264inputs+22464outputs (43major+1990minor)pagefaults 0swaps
---
 This just moves the motivation to the commit message and adds more detailed timing information.

 notmuch-dump.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/notmuch-dump.c b/notmuch-dump.c
index 126593d..0475eb9 100644
--- a/notmuch-dump.c
+++ b/notmuch-dump.c
@@ -73,7 +73,10 @@ notmuch_dump_command (unused (void *ctx), int argc, char *argv[])
 	fprintf (stderr, "Out of memory\n");
 	return 1;
     }
-    notmuch_query_set_sort (query, NOTMUCH_SORT_MESSAGE_ID);
+    /* Don't ask xapian to sort by Message-ID. Xapian optimizes returning the
+     * first results quickly at the expense of total time.
+     */
+    notmuch_query_set_sort (query, NOTMUCH_SORT_UNSORTED);
 
     for (messages = notmuch_query_search_messages (query);
 	 notmuch_messages_valid (messages);
-- 
1.7.6.1
