Added messages / total files count difference.
Tomi Ollila
tomi.ollila at nixu.com
Wed Aug 10 01:41:54 PDT 2011
On Tue 09 Aug 2011 14:02, Tomi Ollila <tomi.ollila at nixu.com> writes:
> Hi
>
> I get this output:
>
> $ notmuch new --verbose
> Found 15559 total files (that's not much mail).
> Processed 15559 total files in 5m 53s (43 files/sec.).
> Added 15546 new messages to the database.
>
> $ find * -type f | wc
> 15559 15559 529027
>
> How can I determine which 13 files were dropped? All of those
> 15559 files should be mails. I tried to check through mail files that
> have no 'Subject:' header but those were (at least one) indexed. Could
> it be about duplicate Message-IDs or something?
>
> $ notmuch --version
> notmuch 0.7-7-g68e8560
It is about duplicate Message-IDs.
It would be nice if 'notmuch new' printed information about this
when it happens (as I recall it does when a new file it finds
is not (considered) a mail file).
The steps I took to figure this out (not all iterations with and without
'wc' shown) are at the end of this email.
>
> Tomi
Tomi
--8<----8<----8<----8<----8<----8<----8<----8<----8<----8<--
$ find ~/mail/mails/* -type f | sort >! filenames-fs
$ wc filenames-fs
15559 15559 855766 filenames-fs
$ cd /path/to/notmuch-git/bindings/python
$ cat > foo.py
import notmuch
# Open the default database and print one filename per indexed message.
db = notmuch.Database()
msgs = notmuch.Query(db, '').search_messages()
for f in msgs:
    print(f.get_filename())
$ PYTHONPATH=/path/to/python-json:`pwd` python foo.py | sort > filenames-db
$ wc filenames-db
15546 15546 855037 filenames-db
$ diff filenames-db filenames-fs | grep mails | wc
13 26 755
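The diff step above can also be sketched in Python, which lists exactly the paths that are on disk but were not printed by foo.py. This is only a minimal sketch: the helper name missing_from_db is made up for illustration, and it assumes both listings hold one path per line, as filenames-fs and filenames-db do here.

```python
import sys


def missing_from_db(fs_lines, db_lines):
    """Return, sorted, the paths listed on disk but absent from the db listing."""
    fs_paths = {line.rstrip("\n") for line in fs_lines if line.strip()}
    db_paths = {line.rstrip("\n") for line in db_lines if line.strip()}
    return sorted(fs_paths - db_paths)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (file names taken from the transcript):
    #   python missing.py filenames-fs filenames-db
    with open(sys.argv[1]) as fs, open(sys.argv[2]) as db:
        for path in missing_from_db(fs, db):
            print(path)
```

Unlike the diff | grep pipeline, the set difference does not depend on both files being sorted the same way.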
$ cd ~/mail
$ cat >midcheck.pl
use strict;
use warnings;

# Report mail files whose Message-ID was already seen in an earlier file.
my %msgids;
foreach my $fn (<mails/*/*>) {
    my $mid;
    open my $fh, '<', $fn or die "$fn: $!";
    while (<$fh>) {
        $mid = $1, next if /^Message-ID:\s*(.*)/i;
        last if /^$/;    # a blank line ends the headers
    }
    close $fh;
    unless ($mid) {
        print "$fn: no Message-ID (in same line with header tag?)\n";
        next;
    }
    my $fn0 = $msgids{$mid};
    if (defined $fn0) {
        print "Files '$fn0' and '$fn' have same msg id: $mid\n";
    }
    else {
        $msgids{$mid} = $fn;
    }
}
$ perl midcheck.pl | wc
13 117 2098
$ perl midcheck.pl | grep \^Files | wc
13 117 2098
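For comparison, here is a rough Python sketch of the same duplicate check. It uses the standard library's email header parser, so a Message-ID whose value is folded onto the following line is still picked up (the case midcheck.pl reports as "no Message-ID"). The function names are made up for illustration, the mails/*/* layout is copied from midcheck.pl, and the IDs are compared as plain strings, which may not match exactly how notmuch compares them.

```python
import glob
import os
import sys
from collections import defaultdict
from email.parser import BytesHeaderParser


def message_id(path):
    """Return the Message-ID header of the mail file at path, or None."""
    with open(path, "rb") as fp:
        headers = BytesHeaderParser().parse(fp)  # stops at the end of headers
    mid = headers.get("Message-ID")  # header lookup is case-insensitive
    return str(mid).strip() if mid is not None else None


def find_duplicates(paths):
    """Map each Message-ID seen more than once to the files that carry it."""
    by_id = defaultdict(list)
    for path in paths:
        mid = message_id(path)
        if not mid:
            print("%s: no Message-ID header" % path, file=sys.stderr)
            continue
        by_id[mid].append(path)
    return {mid: files for mid, files in by_id.items() if len(files) > 1}


if __name__ == "__main__":
    # Assumed layout: same glob as midcheck.pl, run from ~/mail.
    paths = sorted(p for p in glob.glob("mails/*/*") if os.path.isfile(p))
    for mid, files in sorted(find_duplicates(paths).items()):
        print("%s: %s" % (mid, ", ".join(files)))
```

Grouping all files per Message-ID, rather than remembering only the first file seen, also shows when three or more files share one ID.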