On disk tag storage format

Eirik Byrkjeflot Anonsen eirik at eirikba.org
Thu Nov 29 11:34:50 PST 2012


David Bremner <david at tethera.net> writes:

> Austin outlined on IRC a way of representing tags on disk as hardlinks
> to messages. In order to make the discussion more concrete, I wrote a
> prototype in python to dump the notmuch database to this format. On my
> 250k messages, this creates 40k new hardlinks, and uses about 5M of
> diskspace. The dump process takes about 20s on my core i7 machine. With
> symbolic links, the same database takes about 150M of disk space; this
> isn't great but it isn't unbearable either.

And the symlink variant eats 40k inodes, I suppose, which may matter on
some systems. (Hardlinks do not use extra inodes, as they are just
directory entries pointing to already existing inodes.)
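
To make the inode point concrete, here is a minimal Python sketch, assuming
a POSIX file system that supports both link types. The file names
("message", "tag-hard", "tag-soft") and the helper link_inodes() are made
up for illustration and are not taken from the prototype.

```python
import os
import tempfile

def link_inodes():
    """Create one file, one hardlink and one symlink; report their inodes."""
    with tempfile.TemporaryDirectory() as d:
        msg = os.path.join(d, "message")      # hypothetical message file
        with open(msg, "w") as f:
            f.write("Subject: test\n")

        hard = os.path.join(d, "tag-hard")    # hypothetical tag entries
        soft = os.path.join(d, "tag-soft")
        os.link(msg, hard)      # new directory entry, same inode
        os.symlink(msg, soft)   # new inode whose content is the target path

        # lstat() does not follow symlinks, so it reports the link's own inode.
        return {
            "message": os.lstat(msg).st_ino,
            "hardlink": os.lstat(hard).st_ino,
            "symlink": os.lstat(soft).st_ino,
            "nlink": os.lstat(msg).st_nlink,
        }

info = link_inodes()
print(info)
```

The hardlink reports the same inode number as the message and only bumps
its link count, while the symlink gets an inode of its own.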

Of course, the space usage also depends on the file system. For example,
ext2 would use one complete block (typically 4 KiB) per symlink to store
the target path, at least when the path is too long to fit in the inode
itself (roughly 60 bytes or more). ReiserFS would probably use 5M for the
directory entries and another 5M for the symlink data (wild guess).
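
As a rough sanity check on the 150M figure, here is a back-of-the-envelope
sketch. It assumes that each of the 40k symlinks occupies exactly one 4 KiB
block and ignores directory entries and inode tables; the function name is
made up for illustration.

```python
SYMLINKS = 40_000        # number of links reported for the 250k-message dump
BLOCK_SIZE = 4 * 1024    # assumed block size in bytes (4 KiB)

def symlink_block_usage(count, block_size):
    """Bytes used if every symlink target occupies one full block."""
    return count * block_size

total = symlink_block_usage(SYMLINKS, BLOCK_SIZE)
print(total, "bytes =", total / 2**20, "MiB")
```

That comes out at about 156 MiB, which is in the same ballpark as the
reported 150M.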

eirik

