[PATCH] dump: Don't sort.

Thomas Schwinge thomas at schwinge.name
Mon Nov 28 13:04:14 PST 2011


Hi!

First, thanks to David, Tomi, Tom for moving this forward.


On Sat, 19 Nov 2011 16:11:13 +0100, Petter Reinholdtsen <pere at hungry.com> wrote:
> [Thomas Schwinge]
> > +    /* This used to use NOTMUCH_SORT_MESSAGE_ID.  On 2011-10-29, a measurement
> > +     * on a 372981 messages instance showed that wall time can be reduced from
> > +     * 28 minutes (sorted by Message-ID) to 15 minutes (unsorted), the latter
> > +     * being much more ``database-disk-layout-friendly''.  Subsequently sorting
> > +     * the 25 MiB of data is a no-brainer, if required.  */

Here is the measurement redone: I discovered that during the earlier
run, other work had been going on in parallel in another Xen domU on
that system, disturbing the measurement.

Discard caches, every time before dumping:

    $ sync; sleep 3; echo -n 3 | sudo dd of=/proc/sys/vm/drop_caches

Original (sorted by Message-ID):

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    26.41user 16.56system 14:34.81elapsed 4%CPU (0avgtext+0avgdata 167152maxresident)k
    2994440inputs+55896outputs (41major+11627minor)pagefaults 0swaps

Unsorted:

    $ \time notmuch dump | sort > ~/tmp/Mail-notmuch_dump/dump
    24.79user 3.86system 12:00.22elapsed 3%CPU (0avgtext+0avgdata 57216maxresident)k
    2929192inputs+0outputs (40major+4942minor)pagefaults 0swaps

The difference is no longer as big as before, but still better than
nothing.
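
To put a number on that difference, here is a minimal sketch that converts the two elapsed times reported above into seconds and prints the relative reduction in wall time. It was not part of the original measurement; the helper names `to_seconds` and `reduction_pct` are made up for illustration, and it assumes the `M:SS.ss` elapsed format that `time` printed above.

```shell
# Convert an elapsed time in "M:SS.ss" format (as printed above) to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 60 + $2 }'
}

# Print the percentage by which wall time drops going from $1 to $2
# (both given in M:SS.ss).
reduction_pct() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Sorted by Message-ID vs. unsorted, both with caches dropped beforehand.
reduction_pct 14:34.81 12:00.22
```

For the two runs above this prints 17.7, that is, the unsorted dump needs roughly a sixth less wall time.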

> This sound like a great idea for my use case.  Doing 'notmuch dump'
> with my 1.2 million emails take hours at the moment (not very fast
> encrypted file system), and result in a 90 MiB dump file.

... and you will gain most by putting the .notmuch directory onto an SSD,
as I have done by now:

Original (sorted by Message-ID), with .notmuch on SSD:

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    24.86user 13.40system 1:06.01elapsed 57%CPU (0avgtext+0avgdata 167200maxresident)k
    2992184inputs+55920outputs (49major+11622minor)pagefaults 0swaps

Unsorted, with .notmuch on SSD:

    $ \time notmuch dump > ~/tmp/Mail-notmuch_dump/dump
    21.90user 2.68system 0:51.70elapsed 47%CPU (0avgtext+0avgdata 57248maxresident)k
    2926912inputs+55920outputs (50major+4934minor)pagefaults 0swaps

User and system time (roughly) remain the same, but the wall time drops
considerably -- an SSD at its best, obviously.
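
One rough way to see this in the numbers is to subtract CPU time (user plus system) from the elapsed time; what is left is wall time in which the process was not running on the CPU. The sketch below does that for the two sorted runs. The helper name `wait_seconds` is hypothetical, and reading the remainder as mostly disk wait is an assumption, since `time` does not break it down further.

```shell
# Seconds of wall time not accounted for by CPU time (user + system).
# Arguments: user seconds, system seconds, elapsed seconds.
wait_seconds() {
    awk -v u="$1" -v s="$2" -v e="$3" \
        'BEGIN { printf "%.2f\n", e - (u + s) }'
}

wait_seconds 26.41 16.56 874.81   # sorted, HDD (14:34.81 elapsed)
wait_seconds 24.86 13.40 66.01    # sorted, SSD (1:06.01 elapsed)
```

With the figures quoted above, the remainder shrinks from about 832 seconds on the HDD to about 28 seconds on the SSD, while the CPU time itself barely changes.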


Generally speaking, I decided it was enough to just put the .notmuch
directory onto the SSD, and not the whole mail store: if new messages are
added (notmuch new), they're still in the page cache anyway (having been
retrieved via POP3 or whatever just before), and for regular message read
access, an HDD's seek time shouldn't matter too much (and I've taken
notice of Austin's patches which even retrieve Subject: etc. from the
DB), so what remains to be optimized is random access to the DB.


Regards,
 Thomas