Alternative (raw) message store (i.e. instead of maildir)

Ciprian Dorin Craciun ciprian.craciun at gmail.com
Tue Aug 14 10:05:11 PDT 2012


On Tue, Aug 14, 2012 at 7:50 PM, Vladimir Marek
<Vladimir.Marek at oracle.com> wrote:
>>     On the other hand I strongly sustain having a more optimized
>> backend for emails, especially for such cases. For example a
>> BerkeleyDB would perfectly fit such a use case, especially if we store
>> the body and the headers in separate databases.
>>
>>     Just a small experiment, below are the R `summary(emails)` of the
>> sizes of my 700k emails:
>> ~~~~
>>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
>>        8     4364     5374    11510     7042 31090000
>> ~~~~
>>
>>     As seen 75% of the emails are below 7k, and this without any compression...
>>
>>     Moreover we could organize the keys so that in a B-Tree structure
>> the emails in the same thread are closer together...
>
> Now I'm not sure if you talk about some berkeley-db fuse filesystem or
> direct support in notmuch.

    No tricks. :)

    What I proposed (or, better said, asked whether it would be
possible) is an internal interface (SPI) that any mail store would
have to implement in order to be indexed and used by notmuch. I guess
the interface would be quite lightweight, and would need just the
following:
    * open store;
    * create a cursor iterating through all the emails, yielding only the keys;
    * read the envelope (as a byte blob) of a particular key (used
only for displaying thread lists, etc.);
    * read the body (as a byte blob) of a particular key;
    * maybe create a cursor iterating over all those emails that have
changed since a particular timestamp;
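
    To make the list above concrete, here is a minimal sketch of what
such an SPI could look like. It is written in Python only for brevity
(notmuch itself is C), and every name in it (`MailStore`,
`MemoryStore`, `put`, `changed_since`, the `thread/message` key layout)
is hypothetical; nothing like it exists in notmuch today. "Open store"
is modelled simply as constructing the object, and the in-memory
implementation only stands in for a real backend such as BerkeleyDB.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Tuple


class MailStore(ABC):
    """Hypothetical SPI that a mail store would implement for the indexer."""

    @abstractmethod
    def keys(self) -> Iterator[bytes]:
        """Iterate through all the emails, yielding only the keys."""

    @abstractmethod
    def read_envelope(self, key: bytes) -> bytes:
        """Return the envelope (headers) of one email as a byte blob."""

    @abstractmethod
    def read_body(self, key: bytes) -> bytes:
        """Return the body of one email as a byte blob."""

    @abstractmethod
    def changed_since(self, timestamp: float) -> Iterator[bytes]:
        """Yield the keys of emails modified after `timestamp`."""


class MemoryStore(MailStore):
    """Toy backend: a dict standing in for a real key-value database."""

    def __init__(self) -> None:
        # key -> (modification time, envelope, body)
        self._mails: Dict[bytes, Tuple[float, bytes, bytes]] = {}

    def put(self, key: bytes, envelope: bytes, body: bytes, mtime: float) -> None:
        self._mails[key] = (mtime, envelope, body)

    def keys(self) -> Iterator[bytes]:
        # Sorted iteration mimics a B-Tree cursor: keys sharing a
        # thread prefix come out next to each other.
        return iter(sorted(self._mails))

    def read_envelope(self, key: bytes) -> bytes:
        return self._mails[key][1]

    def read_body(self, key: bytes) -> bytes:
        return self._mails[key][2]

    def changed_since(self, timestamp: float) -> Iterator[bytes]:
        return (k for k in sorted(self._mails) if self._mails[k][0] > timestamp)


if __name__ == "__main__":
    store = MemoryStore()
    store.put(b"thread1/msg2", b"Subject: Re: hi", b"second", mtime=200.0)
    store.put(b"thread1/msg1", b"Subject: hi", b"first", mtime=100.0)
    for key in store.keys():
        print(key, store.read_envelope(key))
```

    Keeping the envelope and the body behind separate calls is what
would let a backend store them in separate databases, so that
displaying a thread list never has to touch the bodies.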


> I don't have enough cycles to modify notmuch,
> so I started to look at simpler (codewise) solution ...
>
> To summarize, what I personally want from the mail storage

    We need to make a distinction between current storage (like
maildir) and archival storage (like the Zip or my proposal).


> - ability to read and write mails

    It could be done through a small CLI over the proposed API.

> - should work with mutt (or mutt-kz)

    This would eliminate any proposal not involving a FUSE wrapper...

> - simple backup to windows drive (files can't contain double colon ':')

    This could be done via a dump-like facility. (BerkeleyDB supports
this natively through a tool.)
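
    As an illustration only, a dump to a Windows-friendly directory
could look roughly like the sketch below. This is not what
BerkeleyDB's own dump tool does; it is a hypothetical file-per-mail
export in Python, where the helper names (`safe_name`, `dump`, `load`)
and the sample key are made up. It percent-encodes the key so that the
resulting file name contains no ':' and decodes it again on load. It
does not deal with other Windows restrictions (reserved names, path
length).

```python
import os
from urllib.parse import quote, unquote


def safe_name(key: str) -> str:
    # Percent-encode everything except letters, digits and "_.-~",
    # so ":" (as in maildir's ":2,S" suffix) becomes "%3A".
    return quote(key, safe="")


def dump(mails: dict, directory: str) -> None:
    """Write each key -> raw-bytes pair to its own file."""
    os.makedirs(directory, exist_ok=True)
    for key, raw in mails.items():
        with open(os.path.join(directory, safe_name(key)), "wb") as f:
            f.write(raw)


def load(directory: str) -> dict:
    """Read a dump back, restoring the original keys."""
    mails = {}
    for name in os.listdir(directory):
        with open(os.path.join(directory, name), "rb") as f:
            mails[unquote(name)] = f.read()
    return mails
```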

