subjects and duplicated message id's
David Bremner
david at tethera.net
Thu Dec 14 06:03:12 PST 2017
There are currently several somewhat related issues with notmuch's
handling of subject headers for messages with duplicate message-ids
(i.e. several files on disk with the same message id). These are all
reflections of the fact that we use a value slot for subjects in the
database message document (i.e. the database object keyed by the
message-id). Among other things, using a value slot is what makes
regular expression searching (and potentially sorting) by subject work.
These issues show up when we have multiple files with the same
message-id but different subjects (probably indicating a "real" mid
collision):
1. The output of notmuch-show can be inconsistent with notmuch-search
- this is because show is reading from the lexicographically first
file, while search is reading the database value slot.
- in principle this could be fixed by making show read the value
slot; but then the subject might not be consistent with the rest of
the message content. Also, it looks like a bit of a pain to refactor
so all that sprinter code has database access.
- we could also force the value slot to hold the subject of the
lexicographically first file during indexing. This would be a bit
fiddly, but localized.
It would have the surprising effect of having the subject updated
when new messages arrived.
2. Regular expression search doesn't work for subjects not in the value
slot.
- this could be fixed by putting all subjects in the value slot,
perhaps as newline separated strings. This would also be a
potential solution for the "subject hiding" issue mentioned above,
although it would take some front-end effort as well to deal with
"multi-subjects". This could be reported in e.g. json output as an
array of subjects.
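
To make the "all subjects in the value slot" idea a bit more concrete,
here is a rough sketch in Python. It is only a toy model, not notmuch
code: the helper names (pack_subjects, unpack_subjects, subject_matches,
to_json) and the JSON keys are made up for illustration, and it assumes
subjects are unfolded first so that a newline can safely act as the
separator.

```python
import json
import re


def pack_subjects(subjects):
    """Join the distinct subjects into one newline separated slot value.

    Whitespace (including any embedded newline from header folding) is
    collapsed first, so a newline can only appear as the separator.
    """
    seen = []
    for subject in subjects:
        normalized = " ".join(subject.split())
        if normalized not in seen:
            seen.append(normalized)
    return "\n".join(seen)


def unpack_subjects(slot):
    """Split a slot value back into the list of subjects."""
    return slot.split("\n") if slot else []


def subject_matches(slot, pattern):
    """True if the regex matches any single subject stored in the slot."""
    rx = re.compile(pattern)
    return any(rx.search(subject) for subject in unpack_subjects(slot))


def to_json(message_id, slot):
    """Report the subjects as an array (hypothetical output shape)."""
    return json.dumps({"id": message_id, "subject": unpack_subjects(slot)})


# Two files share a message-id but carry different subjects; a third
# file repeats the first subject.
slot = pack_subjects(["Hello world", "Re: patch v2", "Hello world"])
print(subject_matches(slot, r"^Re: patch"))
print(to_json("example@localhost", slot))
```

Matching each subject separately, rather than running the regex over
the whole slot, keeps anchors like ^ and $ meaningful per subject and
stops a pattern from matching across two unrelated subjects.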
I'm open to other, better ideas of how to do this. I'm also curious how
important people think these bugs are.
d