Notmuch Email Corpus
A corpus of about 209k messages is available for performance testing of notmuch (or other uses).
The contents are as follows:
Mail/notmuch-archive
: archive of the notmuch mailing list, last updated 2012-11-17.
Converted from mbox with mb2md 3.20.
Mail/enron
: selected data from the EDRM v2 Enron data set.
CC Attribution: "ZL Technologies, Inc. (http://www.zlti.com)".
Downloaded via BitTorrent
(http://www.searchdaimon.com/community/dataset/)
and massaged with scripts/unpack-enron.sh (in the corpus tarball).
Mail/lkml
: lkml messages 1000000 to 1100000 from the gmane archive
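
Because the corpus is meant for performance testing, it can help to check how many messages each directory holds after unpacking. The following is a minimal Python sketch, not part of the corpus itself. It assumes the directories unpack as Maildir trees (messages stored as files under `cur` and `new`, possibly in nested folders), which is what mb2md produces for the notmuch archive but is only an assumption for the other two. The function name `count_maildir_messages` is hypothetical.

```python
import os


def count_maildir_messages(root):
    """Count files found in every 'cur' or 'new' directory below root.

    Assumption: messages are stored Maildir-style, one file per message.
    Files in 'tmp' (incomplete deliveries) are deliberately ignored.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) in ("cur", "new"):
            total += len(filenames)
    return total


if __name__ == "__main__":
    # Directory names are taken from the list above; adjust the "Mail"
    # prefix to wherever the tarball was unpacked.
    for name in ("notmuch-archive", "enron", "lkml"):
        path = os.path.join("Mail", name)
        print(name, count_maildir_messages(path))
```

A directory that does not exist simply yields a count of 0, so the script can be run before all parts are unpacked.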
The corpus is gpg signed by David Bremner with key fingerprint:
7A18 807F 100A 4570 C596 8420 7E4E 65C8 720B 706B
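
When checking the signature, the fingerprint reported by your OpenPGP tool may be formatted differently (no spaces, lower case). The helper below is a small illustrative sketch for comparing a reported fingerprint against the one listed above. The function names are hypothetical, and obtaining the fingerprint from the verification tool is left out.

```python
# Fingerprint as published above.
EXPECTED_FINGERPRINT = "7A18 807F 100A 4570 C596 8420 7E4E 65C8 720B 706B"


def normalize_fingerprint(fpr):
    """Remove all whitespace and upper-case the hex digits."""
    return "".join(fpr.split()).upper()


def fingerprint_matches(reported, expected=EXPECTED_FINGERPRINT):
    """Return True if both fingerprints are identical after normalization."""
    return normalize_fingerprint(reported) == normalize_fingerprint(expected)
```

This only compares strings; it does not verify the signature itself, which still has to be done with an OpenPGP implementation.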
You can download the corpus from