[notmuch] Mail in git

Stewart Smith stewart at flamingspork.com
Wed Feb 17 02:07:28 PST 2010


On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stewart at flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
> 
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
> 
> I'm going to play with this....

And I did.

Good news: on my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'
takes), using the (attached) evenless.pl to create a single commit with
everything in it:

$ du -sh .git
3.4G	.git

Down from a whopping 14-15GB!!!

My previous effort (git-write-object, create pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.

This took only 108 minutes.
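
For anyone who wants the shape of the approach without opening the
attachment, here is a minimal Python sketch of a fast-import stream that
stores a set of messages as one commit. It is not evenless.pl: the
function name, the committer identity and the commit message are made up
for illustration, and it builds the whole stream in memory, which a real
import of ~800k messages would want to avoid by writing to the pipe as
it goes.

```python
import time


def build_import_stream(messages, ref="refs/heads/master",
                        committer="Importer <importer@example.invalid>",
                        when=None):
    """Return a git fast-import stream storing `messages` in one commit.

    `messages` maps a repository path (str) to the raw message bytes.
    The committer identity and log text are placeholders (assumptions).
    """
    when = int(time.time()) if when is None else when
    paths = sorted(messages)
    out = []
    # One blob per message, each tagged with a mark so that the commit
    # below can refer to it without knowing its object id.
    for mark, path in enumerate(paths, start=1):
        body = messages[path]
        out.append(b"blob\nmark :%d\ndata %d\n" % (mark, len(body)))
        out.append(body + b"\n")
    # A single commit that places every marked blob at its path.
    log = b"import maildir\n"
    out.append(b"commit %s\n" % ref.encode())
    out.append(b"committer %s %d +0000\n" % (committer.encode(), when))
    out.append(b"data %d\n%s" % (len(log), log))
    for mark, path in enumerate(paths, start=1):
        out.append(b"M 100644 :%d %s\n" % (mark, path.encode()))
    out.append(b"\n")
    return b"".join(out)
```

The resulting bytes can be piped into `git fast-import` run inside an
existing (for example bare) repository. fast-import writes a pack
directly and does not touch a working tree, which is the property I was
after.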

In both cases, I was creating the repository on another spindle (a USB 2.0
disk attached to my laptop).

git-ls-tree and git-cat-file both work for listing and getting objects.
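
As a rough illustration of what "listing and getting" looks like from a
script, the Python sketch below wraps `git ls-tree -r -z` and
`git cat-file blob`. The helper names are hypothetical, and the default
revision `HEAD` is an assumption: it only works if HEAD points at the
branch the import wrote to.

```python
import subprocess


def parse_ls_tree(raw):
    """Parse `git ls-tree -z` output into (path, type, object id) tuples.

    Each NUL-terminated record is "<mode> <type> <object>\t<path>".
    """
    entries = []
    for record in raw.split(b"\0"):
        if not record:
            continue
        meta, path = record.split(b"\t", 1)
        _mode, kind, sha = meta.split(b" ")
        entries.append((path.decode("utf-8", "replace"),
                        kind.decode(), sha.decode()))
    return entries


def list_messages(git_dir, rev="HEAD"):
    """Return (path, object id) for every blob reachable from `rev`."""
    raw = subprocess.run(
        ["git", "--git-dir=" + git_dir, "ls-tree", "-r", "-z", rev],
        check=True, capture_output=True).stdout
    return [(p, sha) for p, kind, sha in parse_ls_tree(raw) if kind == "blob"]


def read_message(git_dir, sha):
    """Return the raw bytes of one stored message."""
    return subprocess.run(
        ["git", "--git-dir=" + git_dir, "cat-file", "blob", sha],
        check=True, capture_output=True).stdout
```

Using `-z` keeps maildir file names intact, since they often contain
characters such as `:` and `,` and git would otherwise quote unusual
paths.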

The next thing to think about is adding objects as they come in.
Creating a new commit with just an added file should be pretty simple
and easy, but it means we keep a "revision history" of the mailstore,
which is *possibly* not ideal in terms of storage efficiency (I'll do a
trial with mine, adding one message at a time, and see what the end
size is).
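
A commit that only adds one file could look roughly like the Python
sketch below. It relies on two documented fast-import features: `from
<ref>^0` to continue an existing branch, and `inline` data for the file
contents. The function name, committer identity and log text are
placeholders, and this is not something I have benchmarked.

```python
import time


def build_append_stream(path, body, ref="refs/heads/master",
                        committer="Importer <importer@example.invalid>",
                        when=None):
    """Return a fast-import stream adding one message on top of `ref`.

    `path` is the repository path for the new message and `body` its raw
    bytes. The committer identity and log text are assumptions.
    """
    when = int(time.time()) if when is None else when
    log = b"add %s\n" % path.encode()
    return b"".join([
        b"commit %s\n" % ref.encode(),
        b"committer %s %d +0000\n" % (committer.encode(), when),
        b"data %d\n%s" % (len(log), log),
        # Build on the current tip of the branch instead of replacing it.
        b"from %s^0\n" % ref.encode(),
        # Supply the file contents inline rather than via a marked blob.
        b"M 100644 inline %s\n" % path.encode(),
        b"data %d\n" % len(body),
        body,
        b"\n",
    ])
```

Batching several `M` lines into one commit would reduce the number of
commit and tree objects created per delivered mail, at the cost of a
coarser history.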

However, a commit per added mail (or batch of mails) does give us the
advantage of a really well documented and tested backup system :)

Deleting could be hard if we actually want the objects to go away in a
"permanent" way (not just no longer be referenced).

For the stats nerds:

$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX

git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit      = 8589934592
pack_report: pack_used_ctr            =          1
pack_report: pack_mmap_calls          =          1
pack_report: pack_open_windows        =          1 /          1
pack_report: pack_mapped              =  388496447 /  388496447
---------------------------------------------------------------------


real	107m43.130s
user	45m25.430s
sys	2m49.440s
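
A few derived figures can be read off the numbers above. The Python
sketch below just does the arithmetic on values copied from this output;
the helper name is made up, and reading the CPU share as a sign that the
run was mostly waiting on disk is my guess, not something measured.

```python
def to_seconds(stamp):
    """Convert a `time`-style value such as '107m43.130s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# Values copied from the fast-import statistics and `time` output above.
real = to_seconds("107m43.130s")
user = to_seconds("45m25.430s")
sys_time = to_seconds("2m49.440s")
total_objects = 781813
blobs = 781363
blob_deltas = 708627

objects_per_second = total_objects / real   # overall import rate
cpu_share = (user + sys_time) / real        # fraction of wall time on CPU
delta_share = blob_deltas / blobs           # blobs stored as deltas

print(f"{objects_per_second:.0f} objects/s, "
      f"{cpu_share:.0%} CPU, {delta_share:.0%} of blobs as deltas")
```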


-------------- next part --------------
A non-text attachment was scrubbed...
Name: evenless.pl
Type: text/x-perl
Size: 1413 bytes
Desc: evenless.pl: maildir to git using fast-import
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20100217/bc1a3f34/attachment.pl>
-------------- next part --------------




-- 
Stewart Smith

