[notmuch] Mail in git
Stewart Smith
stewart at flamingspork.com
Wed Feb 17 02:07:28 PST 2010
On Wed, 17 Feb 2010 11:21:51 +1100, Stewart Smith <stewart at flamingspork.com> wrote:
> Using fast-import is interesting. Does it update the working tree? The
> big thing I wanted to avoid was creating a working tree (another million
> inodes being created is not ever what I need)
>
> Also interesting is the mention of creating packs on the fly... this
> could save the time in first writing the object and then packing it (as
> my script does).
>
> I'm going to play with this....
And I did.
Good news. On my mailstore (which, as I've previously mentioned, takes
about 10 minutes to run 'du' over, about the same time as 'notmuch new'
takes), using the (attached) evenless.pl to create a single commit with
everything in it:
$ du -sh .git
3.4G .git
Down from a whopping 14-15GB!!!
My previous effort (git-write-object, create pack every 1000 messages,
rinse, repeat) took all night and got to 3.7GB.
This took only 108 minutes.
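
For anyone reading without the attachment, here is a minimal Python sketch of the general fast-import approach: one blob per message, then a single commit that references them all. It is not evenless.pl (which is Perl and is not reproduced here). The function names, the committer identity and the refs/heads/master branch are assumptions for illustration, and it assumes message filenames contain no newlines and do not start with a double quote, so paths can be written unquoted.

```python
import os
import subprocess

# Hypothetical committer identity, for illustration only.
COMMITTER = b"Mail Import <import@example.invalid>"


def blob_record(mark, payload):
    """One 'blob' command: the message bytes, remembered under :mark."""
    return b"blob\nmark :%d\ndata %d\n%s\n" % (mark, len(payload), payload)


def commit_record(ref, message, entries, when):
    """One 'commit' command that adds every (mark, path) pair as a file."""
    msg = message.encode()
    out = [
        b"commit %s\n" % ref.encode(),
        b"committer %s %d +0000\n" % (COMMITTER, when),
        b"data %d\n%s\n" % (len(msg), msg),
    ]
    for mark, path in entries:
        out.append(b"M 100644 :%d %s\n" % (mark, os.fsencode(path)))
    out.append(b"\n")
    return b"".join(out)


def maildir_stream(root, when):
    """Yield fast-import commands for every file under root, then one commit."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            mark = len(entries) + 1
            with open(full, "rb") as fh:
                yield blob_record(mark, fh.read())
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((mark, rel))
    yield commit_record("refs/heads/master", "Import maildir", entries, when)


def import_maildir(root, repo, when):
    """Pipe the stream into 'git fast-import' run inside an existing repo."""
    proc = subprocess.Popen(["git", "fast-import", "--quiet"],
                            cwd=repo, stdin=subprocess.PIPE)
    for chunk in maildir_stream(root, when):
        proc.stdin.write(chunk)
    proc.stdin.close()
    return proc.wait()
```

Running import_maildir() against a repository created with 'git init --bare' means no working tree is involved at all; fast-import only writes the pack and the ref.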
In both cases, I was creating the repository on another spindle (a USB 2.0
disk attached to my laptop).
git-ls-tree and git-cat-file both work for listing and getting objects.
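
A rough Python sketch of what listing and fetching look like from a script, assuming the import landed on refs/heads/master (the branch name and the helper names here are my assumptions, not something evenless.pl is known to use):

```python
import subprocess


def parse_ls_tree(raw):
    """Parse 'git ls-tree -r -z' output into (mode, type, object_id, path)."""
    entries = []
    for record in raw.split(b"\0"):
        if not record:
            continue  # trailing NUL leaves an empty final record
        meta, path = record.split(b"\t", 1)
        mode, kind, object_id = meta.split(b" ")
        # Paths stay as bytes: mail filenames are not guaranteed to be UTF-8.
        entries.append((mode.decode(), kind.decode(), object_id.decode(), path))
    return entries


def list_messages(repo, rev="refs/heads/master"):
    """List every blob reachable from rev, recursing into subtrees."""
    result = subprocess.run(["git", "ls-tree", "-r", "-z", rev],
                            cwd=repo, check=True, stdout=subprocess.PIPE)
    return parse_ls_tree(result.stdout)


def read_message(repo, object_id):
    """Return the raw bytes of one stored message."""
    result = subprocess.run(["git", "cat-file", "blob", object_id],
                            cwd=repo, check=True, stdout=subprocess.PIPE)
    return result.stdout
```

The -z flag keeps odd characters in filenames from being quoted, which makes the output easier to parse.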
The next thing to think about is adding objects as they come
in. Creating a new commit with just an added file should be pretty
simple and easy, but this means we get to keep a "revision history" of
the mailstore, which is *possibly* not ideal in terms of storage
efficiency (I'll do a trial with mine, adding one message at a time and
seeing what the end size is).
However, a commit per added mail (or batch of mails) does give us the advantage
of a really well documented and tested backup system :)
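
One way the commit-per-mail idea could look with fast-import, as an untested-at-scale Python sketch: each new message becomes a commit whose parent is the current branch tip ("from <ref>^0" is the form the fast-import documentation gives for continuing an existing branch), with the message body passed inline. The function name, committer identity and branch name are assumptions for illustration.

```python
import os

# Hypothetical committer identity, for illustration only.
COMMITTER = b"Mail Import <import@example.invalid>"


def add_message_stream(path, payload, when, ref="refs/heads/master"):
    """Build a fast-import stream that adds one message on top of ref."""
    raw_path = os.fsencode(path)
    raw_ref = ref.encode()
    msg = b"add " + raw_path
    return b"".join([
        b"commit %s\n" % raw_ref,
        b"committer %s %d +0000\n" % (COMMITTER, when),
        b"data %d\n%s\n" % (len(msg), msg),
        # Continue from whatever the branch currently points at.
        b"from %s^0\n" % raw_ref,
        # 'inline' means the file content follows as a data command.
        b"M 100644 inline %s\n" % raw_path,
        b"data %d\n%s\n" % (len(payload), payload),
        b"\n",
    ])
```

The resulting bytes would be written to the stdin of 'git fast-import' run inside the repository. Batching several messages into one commit is just more M lines before the final blank line. Many small runs would probably still want a periodic repack to get back to the packed size reported above.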
Deleting could be hard if we actually want the objects to go away in a
"permanent" way (not just no longer be referenced).
For the stats nerds:
$ time perl /home/stewart/evenless/evenless.pl /home/stewart/Maildir/INBOX
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     785000
Total objects:       781813 (     79023 duplicates                  )
      blobs  :       781363 (     79023 duplicates     708627 deltas)
      trees  :          449 (         0 duplicates          0 deltas)
      commits:            1 (         0 duplicates          0 deltas)
      tags   :            0 (         0 duplicates          0 deltas)
Total branches:           1 (         1 loads     )
      marks:        1048576 (    860386 unique    )
      atoms:         860557
Memory total:        182780 KiB
       pools:        152116 KiB
     objects:         30664 KiB
---------------------------------------------------------------------
pack_report: getpagesize() = 4096
pack_report: core.packedGitWindowSize = 1073741824
pack_report: core.packedGitLimit = 8589934592
pack_report: pack_used_ctr = 1
pack_report: pack_mmap_calls = 1
pack_report: pack_open_windows = 1 / 1
pack_report: pack_mapped = 388496447 / 388496447
---------------------------------------------------------------------
real 107m43.130s
user 45m25.430s
sys 2m49.440s
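
A few derived numbers, as a quick Python calculation over the figures printed above. The one interpretive step is treating stored blobs plus duplicate blobs as the number of messages fed in; that is my reading of the output (it happens to equal the unique mark count), not something the script itself reports.

```python
# Figures copied from the fast-import statistics and 'time' output above.
blobs_stored = 781363
blob_duplicates = 79023
blob_deltas = 708627
unique_marks = 860386
elapsed_seconds = 107 * 60 + 43.130  # real 107m43.130s

# Assumption: every message got one blob command, duplicate or not.
messages_seen = blobs_stored + blob_duplicates
delta_fraction = blob_deltas / blobs_stored
messages_per_second = messages_seen / elapsed_seconds

print("messages fed to fast-import:", messages_seen)
print("matches unique marks:", messages_seen == unique_marks)
print("fraction of blobs stored as deltas: %.3f" % delta_fraction)
print("messages per second (wall clock): %.1f" % messages_per_second)
```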
-------------- next part --------------
A non-text attachment was scrubbed...
Name: evenless.pl
Type: text/x-perl
Size: 1413 bytes
Desc: evenless.pl: maildir to git using fast-import
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20100217/bc1a3f34/attachment.pl>
--
Stewart Smith