subjects and duplicated message id's

Daniel Kahn Gillmor dkg at fifthhorseman.net
Thu Dec 14 08:57:31 PST 2017


On Thu 2017-12-14 10:03:12 -0400, David Bremner wrote:
> There are currently several somewhat related issues with notmuch's
> handling of subject headers for messages with duplicate message-ids
> (i.e. several files on disk with the same message id).  These are all
> reflections of the fact that we use a value slot for subjects in the
> database message document (i.e. the database object keyed by the
> message-id).  Among other things, using a value slot is what makes
> regular expression searching (and potentially sorting) by subject work.
>
> When we have multiple files with the same message-id, but different
> subjects (probably indicating a "real" mid collision).
>
> 1. The output of notmuch-show can be inconsistent with notmuch-search
>
>    - this is because show is reading from the lexicographically first
>      file, while show is reading the database value slot.

you've got two "show"s here.  i think the second "show" is meant to be
"search".

>    - in principle this could be fixed by making show read the value
>      slot; but then the subject might not be consistent with the rest of
>      the message content. Also, it looks like a bit of a pain to refactor
>      so all that sprinter code has database access.
>
>    - we could also force the value slot to have the lexico first files'
>      subject during indexing. This would be a bit fiddly, but localized.
>      It would have the surprising effect of having the subject updated
>      when new messages arrived.

This is a bit weird, unless we also force "notmuch show" to always show
the lexicographically-first file as well, no?
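
To make that concrete, here is a minimal sketch of what "index and show
both follow the lexicographically-first file" could look like.  It is
plain Python, not notmuch code: the function name and the
filename-to-subject mapping are hypothetical, and Python's default
string ordering is only an assumption about what "lexicographically
first" means.

```python
def canonical_subject(subjects_by_file):
    """Pick the subject both the index and "show" would agree on.

    subjects_by_file is a hypothetical mapping from filename to the
    Subject header found in that file (all files share one message-id).
    """
    if not subjects_by_file:
        raise ValueError("no files for this message-id")
    # Assumption: "lexicographically first" is plain string ordering.
    first = min(subjects_by_file)
    return first, subjects_by_file[first]
```

With this rule a newly delivered file that sorts earlier changes the
result, which is the "subject updated when new messages arrive" surprise
mentioned in the quoted text.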

> 2. Regular expression search doesn't work for subjects not in the value
>    slot.
>
>    - this could be fixed by putting all subjects in the value slot,
>      perhaps as newline separated strings. This would also be a
>      potential solution for the "subject hiding" issue mentioned above,
>      although it would take some front-end effort as well to deal with
>      "multi-subjects".  This could be reported in e.g. json output as an
>      array of subjects.
>
> I'm open to other, better ideas of how to do this. I'm also curious how
> important people think these bugs are.
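
The newline-separated "multi-subject" idea in item 2 can be sketched
roughly as below.  This is plain Python and does not use the notmuch or
Xapian APIs; the helper names and the "subject" JSON key are
hypothetical, and the choice to collapse whitespace inside each subject
(so that newline is safe as a separator) is an assumption.

```python
import json
import re


def pack_subjects(subjects):
    """Join distinct subjects into one newline-separated slot value."""
    packed = []
    for subject in subjects:
        # Collapse internal whitespace so "\n" only ever separates subjects.
        subject = " ".join(subject.split())
        if subject and subject not in packed:
            packed.append(subject)
    return "\n".join(packed)


def unpack_subjects(slot):
    """Split a slot value back into the list of subjects."""
    return slot.split("\n") if slot else []


def subject_matches(slot, pattern):
    """True if the regex matches any one of the stored subjects."""
    return any(re.search(pattern, s) for s in unpack_subjects(slot))


def to_json(slot):
    """Report the subjects as a JSON array, as a front-end might see them."""
    return json.dumps({"subject": unpack_subjects(slot)})
```

Matching each subject separately keeps anchors like "^" and "$" tied to
a single subject rather than to the whole slot value.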

I think this is important to get right, thanks for raising it. I'll add
my own wrinkle below:

I'm looking at implementing "protected headers" (for receiving messages)
right now, where Subject: is the most important header that is typically
sent under encrypted cover (e.g. enigmail's "memory hole" implementation
does this).

With indexing of cleartext, we have a decision about which thing to put
in the value slot -- the "original" subject (that is, the subject that
the message sender wrote, which arrives inside the message encryption
for messages with protected headers) or the as-delivered external "stub"
header (typically the literal string "encrypted message").

Obviously, i'd prefer the original subject, so that it's searchable.
I'd also like it if "notmuch show" displayed the original subject when
showing the encrypted message.

However, if someone does "notmuch show --decrypt=false" i'd want it to
display the as-delivered header.
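
As a rough sketch of the behaviour i have in mind (plain Python, not
notmuch code; the function and parameter names are hypothetical, and
"decrypt" here only stands in for whatever the --decrypt setting ends up
meaning):

```python
def displayed_subject(external, protected=None, decrypt=True):
    """Choose which Subject to display for a message.

    external  -- the as-delivered header, e.g. "encrypted message"
    protected -- the sender's original subject from inside the
                 encryption, or None if there is none (or it could
                 not be decrypted)
    decrypt   -- False models "notmuch show --decrypt=false"
    """
    if decrypt and protected is not None:
        return protected
    # Without decryption, or without a protected header, fall back
    # to what was delivered on the outside of the message.
    return external
```

Which of the two values belongs in the value slot is a separate question
from which one gets displayed.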

Does this help push you toward any specific decision?  I'm also not sure
what the correct solution is right now.

     --dkg