Questions about importing mail (mbox)

Mueen Nawaz mueen at nawaz.org
Mon Mar 21 19:02:45 PDT 2011


Pieter Praet <pieter at praet.org> writes:
> It would've been a no-brainer if you'd been using Maildir all along
> (mbox is evil incarnate), but...

Sure, but mbox is too convenient.

> I'd suggest keeping your original mbox file safe in git [1], and
> consistently committing every step of the way, so even if messages were
> to get lost in translation, you still have a way to get them back, with
> negligible storage overhead (just remember to "git gc --aggressive
> --prune=now" when you're finished).

I think you misunderstood me. A part of me suspects this has something
to do with my not explaining myself, but who's to say?<G>

I'm experimenting with notmuch, and if I can translate everything I
currently do in mutt to notmuch, then I'll just dump mutt. The set of
mboxes I have will remain archived, but for all future incoming email,
I'll switch to MH or Maildir. So I don't actually need to put my old
mboxes under revision control - I just need to save them somewhere.

> For the actual conversion to Maildir (and any type of mail fetching in
> general), I'd suggest using FDM [2], you'll never look back.

Thanks - will take a look.
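For a plain one-shot conversion, Python's standard `mailbox` module can also copy an mbox into a Maildir. This is a minimal sketch, not a replacement for FDM's filtering; the function name and paths are hypothetical. The mbox is only read, so the archived original stays untouched.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message in an mbox file into a Maildir; return the count.

    The mbox is only read, never rewritten, so the original archive stays intact.
    """
    src = mailbox.mbox(mbox_path, create=False)       # raises if the file is missing
    dst = mailbox.Maildir(maildir_path, create=True)  # creates cur/, new/, tmp/
    copied = 0
    try:
        for msg in src:
            # Wrapping in MaildirMessage maps the mbox Status/X-Status flags
            # (read, answered, flagged, ...) onto Maildir flags.
            dst.add(mailbox.MaildirMessage(msg))
            copied += 1
    finally:
        src.close()
        dst.close()
    return copied
```

Something like `mbox_to_maildir("old/inbox.mbox", "Mail/inbox")` would then leave one file per message under `Mail/inbox`.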

> Regarding the significant discrepancy between processed and added files
> in Notmuch: Could be dupes (e.g. mail to/cc/bcc yourself or mailing
> lists, ending up in both Inbox and Sent), which are automatically
> suppressed by Notmuch.

It definitely was dupes. I didn't realize that notmuch collapses
duplicate messages (same Message-ID) into one.

So I wrote a Python script to go through the mboxes and count only the
unique messages. The problem: I have over 1000 emails that don't have a
Message-ID header (using a case-insensitive search). I could go over why
that is, but suffice it to say that I hate Microsoft.<G>
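A count of this kind might look like the minimal sketch below; it is an illustration, not the actual script, and the function name is hypothetical. Header lookup in Python's `email` package is already case-insensitive, so `Message-ID`, `Message-Id` and `message-id` all match.

```python
import mailbox
from typing import Iterable


def count_unique(mbox_paths: Iterable[str]) -> tuple[int, int, int]:
    """Return (total messages, unique Message-IDs, messages lacking a Message-ID)."""
    seen: set[str] = set()
    total = 0
    missing = 0
    for path in mbox_paths:
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                total += 1
                # Header names are matched case-insensitively by the email package.
                raw = msg.get("Message-ID")
                # Normalize surrounding whitespace and angle brackets.
                mid = str(raw).strip().strip("<>") if raw is not None else ""
                if not mid:
                    missing += 1
                else:
                    seen.add(mid)
        finally:
            box.close()
    return total, len(seen), missing
```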

Once I remove all dupes, I get to within 300-400 of the count that
notmuch provides. The 1000+ emails without a Message-ID do contain some
dupes, and I can't find a convenient way to get an accurate count of
unique emails among them, but at least now I'm in the ballpark, and a
lot more confident.
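One possible heuristic for the ID-less messages is to key each one on a hash of a few stable headers plus the body, ignoring trace headers such as Received that differ between copies. The choice of From, Date and Subject is an assumption made for illustration, not notmuch's method, and both function names are hypothetical.

```python
import hashlib
import mailbox
import re
from typing import Iterable


def fallback_key(msg: mailbox.Message) -> str:
    """Heuristic identity for a message with no Message-ID.

    Hashes From, Date and Subject plus the body; trace headers such as
    Received are deliberately left out because they differ between copies.
    """
    h = hashlib.sha1()
    for name in ("From", "Date", "Subject"):
        h.update(str(msg.get(name, "")).strip().encode("utf-8", "replace"))
        h.update(b"\0")  # field separator so adjacent values can't run together
    raw = msg.as_bytes()
    # Everything after the first blank line is the body (LF or CRLF endings).
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    h.update(parts[1] if len(parts) == 2 else b"")
    return h.hexdigest()


def count_unique_without_id(mbox_paths: Iterable[str]) -> tuple[int, int]:
    """Return (messages lacking Message-ID, distinct fallback keys among them)."""
    keys: set[str] = set()
    count = 0
    for path in mbox_paths:
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                if msg.get("Message-ID") is None:
                    count += 1
                    keys.add(fallback_key(msg))
        finally:
            box.close()
    return count, len(keys)
```

Two copies that differ only in their Received lines collapse to one key; the trade-off is that distinct messages with identical sender, date, subject and body would also be merged.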

Incidentally, one reason I didn't suspect dupes sooner is that I
searched for a word from one email and notmuch did not find it, so I
assumed that email had not been indexed. Later I realized I had typed
only part of the word, and that notmuch does find the email if I type
the full word.

What am I doing wrong? Can't notmuch handle partial word matches? Do I
need to specify an option to get that to work?
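notmuch is built on the Xapian search library, which indexes whole terms. The toy index below (plain Python, not notmuch's or Xapian's code; the class is hypothetical) shows why an exact-term lookup misses a partial word while a prefix scan over the sorted term list finds it. Xapian's query parser can accept trailing-wildcard queries of the form `word*` when the application enables that; whether a given notmuch version does is worth checking in notmuch-search-terms(7).

```python
import bisect
import re
from collections import defaultdict


class ToyIndex:
    """Toy whole-word inverted index (illustration only, not notmuch's code)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # Index each lowercased whole word as a term.
        for term in re.findall(r"\w+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, word: str) -> set[int]:
        # Exact-term lookup: only whole indexed words match.
        return set(self.postings.get(word.lower(), set()))

    def search_prefix(self, prefix: str) -> set[int]:
        # Prefix lookup: walk the sorted terms starting at the first one >= prefix.
        prefix = prefix.lower()
        terms = sorted(self.postings)
        hits: set[int] = set()
        for term in terms[bisect.bisect_left(terms, prefix):]:
            if not term.startswith(prefix):
                break
            hits |= self.postings[term]
        return hits
```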

Anyway, thanks for the help - I'll investigate further.
