notmuch python bindings corrupt db index (was: gmail importer script)

Austin Clements amdragon at MIT.EDU
Fri Dec 14 22:18:06 PST 2012


Quoth Jason A. Donenfeld on Dec 13 at  3:32 pm:
> On Wed, Dec 12, 2012 at 9:49 PM, Austin Clements <amdragon at mit.edu> wrote:
> > There should be no way to corrupt the database at this level through
> > the Xapian API, which means nothing libnotmuch can do (much less users
> > of libnotmuch) should be able to corrupt the database.  If you can
> > reproduce the problem, it's probably a serious bug in Xapian, but it
> > could also have been a file system bug or even random file system
> > corruption.
> 
> Well that's... troubling.
> 
> Patrick: could you please backup and try to reproduce? Otherwise I'll
> assume this was a one-off situation.
> 
> 
> Austin-- think you could do a quick review of the script to double
> check and confirm I'm not doing anything nefarious?
> http://git.zx2c4.com/gmail-notmuch/tree/gmail-notmuch.py

In theory, the only way you could cause corruption, besides tickling a
bug, would be to access the same database object concurrently from
different threads (database objects are not thread-safe), but you
don't appear to be doing that.

I did spot something that could corrupt delivered email, though.  The
way you deliver to the Maildir is resilient to process termination,
but not to system failures such as power outages.  In particular, you
need to at least os.fsync the message file before the os.link;
otherwise the link can reach the disk before the file's contents do,
leaving a truncated or empty message in the Maildir after a crash.
I'd recommend looking at Python's mailbox module, which has a robust
Maildir delivery implementation (though it appears it doesn't let you
control the file name, so you probably can't use it directly).
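
Roughly, the ordering I have in mind looks like the sketch below.  This
is only an illustration, not a patch against your script: the function
name deliver_durably and its arguments are made up here, and it assumes
a POSIX file system where hard links work and a directory can be opened
and fsync'd.

```python
import os


def deliver_durably(maildir, name, data):
    """Write `data` (bytes) to <maildir>/new/<name> via <maildir>/tmp.

    Hypothetical helper for illustration; assumes tmp/ and new/ exist
    and live on the same file system.
    """
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)

    # O_EXCL makes the create fail rather than clobber an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # push the kernel's buffer to the disk
        # Only now is it safe to make the message visible in new/.
        os.link(tmp_path, new_path)
    finally:
        # The tmp name is no longer needed, whether or not the link worked.
        os.unlink(tmp_path)

    # Sync the directory so the new entry itself survives a power loss.
    dir_fd = os.open(os.path.join(maildir, "new"), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return new_path
```

The directory fsync at the end goes a step beyond the minimum I
mentioned above; the fsync before the link is the part that matters
most.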

