[PATCH] dump: Don't sort.
Thomas Schwinge
thomas at schwinge.name
Mon Nov 28 13:04:14 PST 2011
Hi!
First, thanks to David, Tomi, Tom for moving this forward.
On Sat, 19 Nov 2011 16:11:13 +0100, Petter Reinholdtsen <pere at hungry.com> wrote:
> [Thomas Schwinge]
> > + /* This used to use NOTMUCH_SORT_MESSAGE_ID. On 2011-10-29, a measurement
> > + * on a 372981 messages instance showed that wall time can be reduced from
> > + * 28 minutes (sorted by Message-ID) to 15 minutes (unsorted), the latter
> > + * being much more ``database-disk-layout-friendly''. Subsequently sorting
> > + * the 25 MiB of data is a no-brainer, if required. */
Here is the measurement redone -- I discovered that while I was doing the
former one, parallel work was going on in another Xen domU on
that system, disturbing the measurement.
Discard caches, every time before dumping:
$ sync; sleep 3; echo -n 3 | sudo dd of=/proc/sys/vm/drop_caches
Original (sorted by Message-ID):
$ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
26.41user 16.56system 14:34.81elapsed 4%CPU (0avgtext+0avgdata 167152maxresident)k
2994440inputs+55896outputs (41major+11627minor)pagefaults 0swaps
Unsorted:
$ \time notmuch dump | sort > ~/tmp/Mail-notmuch_dump/dump
24.79user 3.86system 12:00.22elapsed 3%CPU (0avgtext+0avgdata 57216maxresident)k
2929192inputs+0outputs (40major+4942minor)pagefaults 0swaps
The difference is no longer as big as before, but still better than
nothing.
> This sound like a great idea for my use case. Doing 'notmuch dump'
> with my 1.2 million emails take hours at the moment (not very fast
> encrypted file system), and result in a 90 MiB dump file.
... and you will gain most by putting the .notmuch directory onto an SSD,
as I have done by now:
Original (sorted by Message-ID), with .notmuch on SSD:
$ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
24.86user 13.40system 1:06.01elapsed 57%CPU (0avgtext+0avgdata 167200maxresident)k
2992184inputs+55920outputs (49major+11622minor)pagefaults 0swaps
Unsorted, with .notmuch on SSD:
$ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
21.90user 2.68system 0:51.70elapsed 47%CPU (0avgtext+0avgdata 57248maxresident)k
2926912inputs+55920outputs (50major+4934minor)pagefaults 0swaps
User and system time (roughly) remain the same as on the HDD, but the wall
time drops considerably -- an SSD at its best, obviously.
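To put the four wall-time figures side by side, here is a small sketch that converts the "elapsed" field printed by \time into seconds and computes how much the unsorted dump saves. The helper names to_seconds and reduction_pct are made up for this illustration, and the sketch assumes the elapsed field has the [h:]m:s.ss form seen in the output above.

```shell
# Convert an "elapsed" field ([h:]m:s.ss) into seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Percentage by which wall time drops when going from $1 to $2.
reduction_pct() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

reduction_pct 14:34.81 12:00.22   # HDD: sorted vs. unsorted
reduction_pct 1:06.01 0:51.70     # SSD: sorted vs. unsorted
```

With the numbers from this mail, this gives a reduction of roughly 18% on the HDD and roughly 22% on the SSD.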
Generally speaking, I decided it was enough to put just the .notmuch
directory onto the SSD, and not the whole mail store: when new messages are
added (notmuch new), they're still in the page cache anyway (having been
retrieved via POP3 or whatever just before), and for regular message read
access, an HDD's seek time shouldn't matter too much (and I've taken
notice of Austin's patches, which even retrieve Subject: etc. from the
DB), so what remains to be optimized is random access to the DB.
Regards,
Thomas