RFC: notmuch powered (personal) (end-to-end) e-mail system
Ciprian Dorin Craciun
ciprian.craciun at gmail.com
Sun Mar 20 07:07:50 PDT 2011
Hello all! (Sorry for the long email.)
I have been "struggling" for some time to get rid of the current
"de-facto" email solutions (e.g. GMail, Zimbra), and I have been
passively following the notmuch project and community for a while.
Although I've forwarded all my email to a single account, and I'm
currently mirroring my GMail account locally (by using `mbsync`),
indexing it with notmuch, and collecting spam mails for later filter
training, I'm unfortunately unable to "convert" because the current
notmuch-powered solutions have (some of) the following shortcomings (I
don't want to offend anyone, so please take these as observations):
* the most full-featured UI is the Emacs one -- thus remote access
is limited (I mean from an arbitrary computer with only a web-browser);
(and I'm not a very big fan of Emacs;)
* most still depend on external IMAP systems -- this is not
a problem with notmuch itself, but with the clients that integrate it;
* SPAM -- as above -- is not integrated;
* filtering (tag applying) is not automatic (as in integrated in
notmuch itself or the client), but triggered through external scripts;
As such I'm thinking of implementing a custom end-to-end email
system, and I would like to hear your feedback before embarking on such
a task.
I'm targeting the following features:
* (inbound) SMTP integration, so that once an email is received it is
automatically pushed through the system; (I'm primarily targeting
those users who can afford to run their own SMTP server; but the solution
could still be adapted for those who only want the other features;)
* automatic spam filtering, and tag applying;
* automatic email triggers based on tags (such as user
notifications, forwarding, etc.)
* remote RPC-like access to the whole system;
* remote Web user interface;
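To make the "automatic tag applying" and "triggers based on tags" features more concrete, here is a minimal sketch in Python. It is only an illustration: the rule format, the tag names, the `notify-user` action and the `apply_rules` function are all hypothetical, and the rules match on headers directly rather than through notmuch search queries.

```python
from email import message_from_string

# Hypothetical rules: (tag, predicate over a parsed message).
RULES = [
    ("notmuch", lambda m: "notmuch" in m.get("List-Id", "")),
    ("to-me", lambda m: "me@example.org" in m.get("To", "")),
]

# Hypothetical triggers: tag -> name of the action to fire.
TRIGGERS = {"to-me": "notify-user"}


def apply_rules(raw):
    """Return the tags that match a raw message and the actions they trigger."""
    msg = message_from_string(raw)
    tags = [tag for tag, predicate in RULES if predicate(msg)]
    actions = [TRIGGERS[tag] for tag in tags if tag in TRIGGERS]
    return tags, actions


sample = (
    "To: me@example.org\n"
    "List-Id: <notmuch.notmuchmail.org>\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)
tags, actions = apply_rules(sample)
```

In a real system the predicates would more likely be stored queries evaluated by the index component, so that the same rules can be re-applied to old mail.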
About the overall architecture, I'm thinking of adopting the following:
* in general the whole system is decomposed into independent
components (long-lived OS daemons), each of which does a particular job
(see below);
* all the components communicate with each other through a
message queue system (for example ZeroMQ or RabbitMQ);
* all the communication is JSON based;
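As a rough illustration of JSON messages travelling between components, here is a minimal Python sketch. The envelope fields (`id`, `kind`, `payload`), the `email.received` kind and the `store_key` field are assumptions made up for this example, and the standard-library `queue.Queue` merely stands in for a ZeroMQ socket or a RabbitMQ queue.

```python
import json
import queue
import uuid


def make_message(kind, payload):
    """Wrap a payload in a small JSON envelope (hypothetical schema)."""
    return json.dumps({"id": uuid.uuid4().hex, "kind": kind, "payload": payload})


def parse_message(raw):
    """Decode an envelope and check that the expected fields are present."""
    msg = json.loads(raw)
    for field in ("id", "kind", "payload"):
        if field not in msg:
            raise ValueError(f"missing field: {field}")
    return msg


# queue.Queue is only a stand-in for the real message queue system.
to_index = queue.Queue()
to_index.put(make_message("email.received", {"store_key": "abc123"}))
received = parse_message(to_index.get())
```

Passing only a store key in the message, rather than the raw email, keeps the messages small and leaves the email store as the single source of truth.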
The components would be:
* SMTP inbound gateway -- for example I could take qmail or
Postfix and replace the delivery agent with a custom process that
pushes the email into the system; (any other solution suggestions?);
* email store -- as the name suggests, it is a simple
key-value-like store that should persist raw email messages; it should
be as robust as possible, and its contents should be the only thing
needed to reconstruct all the other derived data; (here I could use a
simple process that maintains a maildir, or go with a
BerkeleyDB wrapper, or even something more sophisticated;)
* spam filter -- which either classifies an email or is trained
on it; (for example I would use bogofilter;)
* email index -- this is where notmuch would come into play; it
would be fed with emails, to which it would automatically apply tags,
and it would issue trigger notifications based on those tags; it also
maintains the set of filters and tags to apply automatically;
* (maybe) a coordinator that would delegate and monitor requests
to the above components; but if I use RabbitMQ and carefully
design the above components, they could drive each other;
* RESTful web service that would mediate access to all the
above components;
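The email store and the custom delivery agent described above could look roughly like the following Python sketch. It assumes the maildir option and uses the standard-library `mailbox` module; the `MaildirStore` class and the `deliver` function are hypothetical names, and the demo reads from an in-memory stream instead of the MTA's standard input.

```python
import io
import mailbox
import os
import tempfile


class MaildirStore:
    """Key-value view over a maildir: put raw bytes, get them back by key."""

    def __init__(self, path):
        # create=True builds tmp/, new/ and cur/ only if `path` does not exist yet.
        self._box = mailbox.Maildir(path, create=True)

    def put(self, raw):
        return self._box.add(raw)  # returns the key chosen by the mailbox module

    def get(self, key):
        return self._box.get_bytes(key)


def deliver(stream, store):
    """Hypothetical delivery agent body: read one message and persist it."""
    return store.put(stream.read())


# Demo with an in-memory stream; a real agent would pass sys.stdin.buffer.
store = MaildirStore(os.path.join(tempfile.mkdtemp(), "store"))
raw = b"From: a@example.org\nSubject: test\n\nhello\n"
key = deliver(io.BytesIO(raw), store)
```

After a successful `put`, the agent would publish the returned key on the message queue so that the spam filter and the index can fetch the message from the store.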
For now I have the following uncertainties:
* how should I handle multiple users? I think each user should
have its own store / notmuch / bogofilter instance (at least in terms
of storage, if not also as a separate daemon);
* should I keep the emails in a file-system, or in a key-value store?
(the file-system is less bug-prone, but I'm confident that a BerkeleyDB
instance would be more efficient);
* should I use libnotmuch or, for starters, just write a wrapper around the notmuch tool?
* and the most pressing one, transactions: I would like a message
never to be half processed or lost at any point; as such I need
notmuch to behave transactionally -- indexing the message and tagging
it should be atomic and durable; (is there a way with libnotmuch to
control the underlying Xapian database?)
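On the transactions point, one possible fallback if the index cannot be made truly atomic is to make every step idempotent and to acknowledge a message only after all steps have succeeded, so that a crash leads to a retry instead of a loss. The Python sketch below illustrates that idea only; it is not a libnotmuch transaction. The `process` function, the content-derived key, and the dict and set standing in for the durable store and the commit log are all assumptions.

```python
import hashlib


def process(raw, store, index, done):
    """Store, then index a message; safe to call again after a failure."""
    key = hashlib.sha256(raw).hexdigest()  # content-derived key makes retries idempotent
    if key in done:
        return key          # already fully processed, nothing to redo
    store[key] = raw        # step 1: persist (rewriting the same key is harmless)
    index(key, raw)         # step 2: index and tag; may raise
    done.add(key)           # step 3: commit marker, written last
    return key


calls = []


def flaky_index(key, raw):
    """Simulated index that fails on its first call."""
    calls.append(key)
    if len(calls) == 1:
        raise RuntimeError("simulated crash during indexing")


store, done = {}, set()
raw = b"Subject: test\n\nhello\n"
try:
    process(raw, store, flaky_index, done)
except RuntimeError:
    pass  # not acknowledged, so the queue would redeliver the message
key = process(raw, store, flaky_index, done)
```

This gives at-least-once processing: the index step may run twice for the same message, so it has to tolerate seeing a message it already knows.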
Suggestions? Considerations?
Ciprian.