Added messages / total files count difference.

Tomi Ollila tomi.ollila at nixu.com
Wed Aug 10 01:41:54 PDT 2011


On Tue 09 Aug 2011 14:02, Tomi Ollila <tomi.ollila at nixu.com> writes:

> Hi
>
> I get this output:
>
> $ notmuch new --verbose
> Found 15559 total files (that's not much mail).
> Processed 15559 total files in 5m 53s (43 files/sec.).
> Added 15546 new messages to the database.
>
> $ find * -type f | wc
>   15559   15559  529027
>
> How can I determine which 13 files were dropped? All of those
> 15559 files should be mails. I tried to check through mail files that
> have no 'Subject:' header but those were (at least one) indexed. Could
> it be about a duplicate Message-ID: or something?
>
> $ notmuch --version
> notmuch 0.7-7-g68e8560

It is about duplicate Message-IDs: each of the 13 missing files has the
same Message-ID as another file that was indexed.

It would be nice if 'notmuch new' printed information about this
when it happens (as I recall it does when a newly found file
is not (considered to be) a mail file).

The steps I took to figure this out are at the end of this email
(not all iterations with and without 'wc' are shown).

>
> Tomi

Tomi

--8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--

$ find ~/mail/mails/* -type f | sort >! filenames-fs
$ wc filenames-fs 
 15559  15559 855766 filenames-fs

$ cd /path/to/notmuch-git/bindings/python
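$ cat > foo.py
import notmuch

# Open the default database and query with an empty search string.
db = notmuch.Database()
msgs = notmuch.Query(db, '').search_messages()

# Print the filename recorded for each indexed message.
for f in msgs:
    print(f.get_filename())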

$ PYTHONPATH=/path/to/python-json:`pwd` python foo.py | sort > filenames-db
$ wc filenames-db
 15546  15546 855037 filenames-db

$ diff filenames-db filenames-fs | grep mails | wc
     13      26     755
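
The same comparison can be done in Python if you want just the bare
paths rather than diff output. This is only a sketch: the helper name
missing_from_db is made up for illustration, and it assumes the two
list files contain one path per line, as above.

```python
import os


def missing_from_db(fs_lines, db_lines):
    """Return the paths found in fs_lines but not in db_lines, sorted.

    Hypothetical helper: both arguments are iterables of lines,
    one path per line (trailing newlines are stripped).
    """
    fs = {line.strip() for line in fs_lines if line.strip()}
    db = {line.strip() for line in db_lines if line.strip()}
    return sorted(fs - db)


if __name__ == "__main__":
    # Assumes the two list files produced by the commands above
    # are in the current directory; does nothing if they are not.
    if os.path.exists("filenames-fs") and os.path.exists("filenames-db"):
        with open("filenames-fs") as fs_file, open("filenames-db") as db_file:
            for path in missing_from_db(fs_file, db_file):
                print(path)
```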

$ cd ~/mail
$ cat >midcheck.pl
use strict;
use warnings;

# Maps each Message-ID seen so far to the first file that carried it.
my %msgids;

foreach my $fn (<mails/*/*>) {
    my $mid;
    open my $fh, '<', $fn or die "$fn: $!";
    # Scan the header block only; the first empty line ends it.
    while (<$fh>) {
        $mid = $1, next if /^Message-ID:\s*(.*)/i;
        last if /^$/;
    }
    close $fh;
    unless ($mid) {
        print "$fn: no Message-ID (in same line with header tag?)\n";
        next;
    }
    my $fn0 = $msgids{$mid};
    if (defined $fn0) {
        print "Files '$fn0' and '$fn' have same msg id: $mid\n";
    }
    else {
        $msgids{$mid} = $fn;
    }
}

$ perl midcheck.pl | wc
     13     117    2098
$ perl midcheck.pl | grep \^Files | wc
     13     117    2098
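
The Perl check above only sees a Message-ID that sits on the same line
as the header tag. A rough Python equivalent that also copes with a
folded header (value on the continuation line) could look like the
following. It is a sketch using the standard library email parser; the
function names message_id and find_duplicates are made up for
illustration, and this is not how notmuch itself detects duplicates.

```python
import sys
from collections import defaultdict
from email.parser import BytesHeaderParser


def message_id(raw):
    """Return the Message-ID of a raw message (bytes), or None if absent."""
    headers = BytesHeaderParser().parsebytes(raw)
    mid = headers.get("Message-ID")  # header lookup is case-insensitive
    if mid is None:
        return None
    # strip() also removes the line break left over from a folded header
    mid = str(mid).strip()
    return mid or None


def find_duplicates(messages):
    """Group names by Message-ID and keep only IDs seen more than once.

    `messages` is an iterable of (name, raw_bytes) pairs.
    """
    by_id = defaultdict(list)
    for name, raw in messages:
        mid = message_id(raw)
        if mid is not None:
            by_id[mid].append(name)
    return {mid: names for mid, names in by_id.items() if len(names) > 1}


def read_all(paths):
    """Yield (path, file contents as bytes) for each path."""
    for path in paths:
        with open(path, "rb") as f:
            yield path, f.read()


if __name__ == "__main__":
    # Mail file paths are taken from the command line.
    for mid, names in find_duplicates(read_all(sys.argv[1:])).items():
        print("%s: %s" % (mid, ", ".join(names)))
```

Saved under some name of your choosing, it would be run with the mail
files as arguments, for example via find and xargs.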

