On disk tag storage format

David Bremner david at tethera.net
Wed Feb 20 17:29:30 PST 2013


David Bremner <david at tethera.net> writes:

> Austin outlined on IRC a way of representing tags on disk as hardlinks
> to messages. In order to make the discussion more concrete, I wrote a
> prototype in python to dump the notmuch database to this format. On my
> 250k messages, this creates 40k new hardlinks, and uses about 5M of
> diskspace. The dump process takes about 20s on
> my core i7 machine.  With symbolic links, the same database takes about
> 150M of disk space; this isn't great but it isn't unbearable either.
>

I've been playing a bit with this script and it seems more or less
usable as a way of mirroring the notmuch tag database to a link farm.

It's a bit faster than my current dump/restore based approach, although
if you want to keep the results in a git repository then it takes up
more space. Of course the bonus with this approach is that it creates
"virtual" maildirs for each tag that can be browsed with the maildir
client of choice.
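
To make the "virtual maildir per tag" idea concrete, here is a minimal
sketch of the layout. It is not the attached linksync.py: the function
name build_link_farm and its input (a plain dict mapping message file
paths to tag lists) are assumptions made for illustration, and it does
not talk to the notmuch database at all. It only uses hard links, so the
link farm has to live on the same filesystem as the mail store.

```python
import os


def build_link_farm(tagged_files, root):
    """Hard-link each message into root/<tag>/cur/ and return the link count.

    tagged_files is a hypothetical input: {message_path: [tag, ...]}.
    Tags containing '/' or other awkward characters are not handled here.
    """
    created = 0
    for path, tags in tagged_files.items():
        for tag in tags:
            tagdir = os.path.join(root, tag)
            # A maildir needs cur/, new/ and tmp/ to be browsable by clients.
            for sub in ("cur", "new", "tmp"):
                os.makedirs(os.path.join(tagdir, sub), exist_ok=True)
            dest = os.path.join(tagdir, "cur", os.path.basename(path))
            if not os.path.exists(dest):
                os.link(path, dest)  # hard link: no new inode, no extra data
                created += 1
    return created
```

Pointing a maildir client at root/inbox would then show every message
that carries the inbox tag.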

The current default is to use some mix of hard and symbolic links to try
to balance the space consumed in a git repo versus the inode
consumption/performance issues of using too many symlinks.
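
How the mix is chosen is not spelled out here, so the following is only
a sketch of one possible policy, not what linksync.py does: hard-link
small files and symlink large ones. The size threshold, the parameter
name max_hard_size and the function name link_message are all
hypothetical.

```python
import os


def link_message(src, dest, max_hard_size=4096):
    """Link dest to src and return "hard" or "sym".

    Hypothetical policy: files up to max_hard_size bytes get a hard link
    (no extra inode), larger files get a symlink (a repository stores
    only the target path, not the message body).
    """
    if os.path.getsize(src) <= max_hard_size:
        os.link(src, dest)
        return "hard"
    # Use an absolute target so the symlink works from any directory.
    os.symlink(os.path.abspath(src), dest)
    return "sym"
```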

It's still a prototype: there is not much error checking, and certain
issues are not dealt with at all (the ones I thought about are noted in
comments in the script).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: linksync.py
Type: text/x-python
Size: 5194 bytes
Desc: not available
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20130220/351bd585/attachment.py>

