[RFC] [PATCH] lib/database.cc: change how the parent of a message is calculated
Aaron Ecay
aaronecay at gmail.com
Sun Mar 3 15:46:18 PST 2013
Hi Jani,
Thanks to you and Austin for the comments.
On 1 March 2013, Jani Nikula wrote:
>> I think the background is that RFC 822 defines In-Reply-To (and
>> References too for that matter) as *(phrase / msg-id), while RFC 2822
>> defines them as 1*msg-id. I'd like something about RFC 822 being
>> mentioned in the commit message.
>>
>> The problem in the gmane message you link to in
>> id:87liaa3luc.fsf at gmail.com is likely related to the FAQ item 05.26
>> "How do I fix a bogus In-Reply-To or missing References field?" in
>> the MH FAQ http://www.newt.com/faq/mh.html.
Likely yes. But I think notmuch should handle these messages, since
they are seen in the wild (and I don’t think you disagree with me on
this point?).
>>
>> As the comment for the function says, we explicitly avoid including
>> self-references. I think I'd err on the safe side and return NULL if
>> the last ref equals message-id.
Done.
>>
>> I don't know how you got this non-change hunk here, but please remove
>> it. :)
That’s what I get for setting my editor to delete trailing whitespace on
save (then not reading outgoing patches carefully). Fixed.
>> I wonder if you should reuse your parse_references() change here, so
>> you'd set in_reply_to_message_id to the last message-id in
>> In-Reply-To. This might tackle some of the problematic cases
>> directly, but should still be all right per RFC 2822. I didn't verify
>> how the parser handles an RFC 2822 violating free form header though.
>
> Strike that based on http://www.jwz.org/doc/threading.html:
>
> "If there are multiple things in In-Reply-To that look like
> Message-IDs, only use the first one of them: odds are that the later
> ones are actually email addresses, not IDs."
Hmm. I think it’s a toss-up which of multiple quasi-message-ids is the
real one. In the email message example I linked upthread, it was the
last one that was real. I decided to use the last one, because it
allows the self-reference checking to be pushed entirely into
parse_references. If you feel strongly that we should use the first
one, I can change it back.
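To illustrate the "use the last one, and refuse self-references" rule, here is a rough sketch. It is not the code from the patch: the helper names extract_message_ids and last_message_id are made up for this example, an empty string stands in for NULL, and anything between "<" and ">" is naively treated as a candidate message-id.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not notmuch's parse_references): collect every
// "<...>" token in a header value, skipping any free-form text around it.
static std::vector<std::string>
extract_message_ids (const std::string &header)
{
    std::vector<std::string> ids;
    std::string::size_type pos = 0;

    while ((pos = header.find ('<', pos)) != std::string::npos) {
        std::string::size_type end = header.find ('>', pos);
        if (end == std::string::npos)
            break;
        if (end > pos + 1)
            ids.push_back (header.substr (pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return ids;
}

// Return the last candidate message-id in the header. Returns "" when
// there is no candidate, or when the last candidate is the message's own
// id (a self-reference).
static std::string
last_message_id (const std::string &header, const std::string &self_id)
{
    std::vector<std::string> ids = extract_message_ids (header);

    if (ids.empty () || ids.back () == self_id)
        return "";
    return ids.back ();
}
```

With a free-form RFC 822 style header such as "Message from Joe <joe@example.com> of Friday <real.id@example.org>", this picks real.id@example.org, which matches the kind of message I linked upthread. Picking the first candidate instead would be a one-line change (ids.front ()).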
> I talked to Austin (CC) about the patch on IRC, and his comment was,
> perceptive as always:
>
> 23:38 amdragon Is the logic in that patch equivalent to always using
> the last message ID in references unless there is no references
> header? Seems like it is, but in a convoluted way.
>
> And that's actually the case, isn't it? To make the code reflect that,
> you should use last_ref_message_id, and if that's NULL, fallback to
> in_reply_to_message_id.
Yes. Fixed.
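In other words, the parent selection boils down to something like the following. This is only a sketch of the logic described above, not the actual hunk: choose_parent is a hypothetical name, and empty strings play the role of NULL pointers.

```cpp
#include <string>

// Hypothetical stand-in for the parent choice: prefer the last
// message-id from References, and fall back to the one from
// In-Reply-To only when References yielded nothing.
static std::string
choose_parent (const std::string &last_ref_message_id,
               const std::string &in_reply_to_message_id)
{
    if (! last_ref_message_id.empty ())
        return last_ref_message_id;
    return in_reply_to_message_id;
}
```

If both inputs are empty the result is empty too, i.e. the message has no parent.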
>
>> I suggest adding an else if branch (or revamp the above if condition)
>> to tackle the missing In-Reply-To header:
>>
>>   else if (!in_reply_to_message_id && last_ref_message_id) {
>>       in_reply_to_message_id = last_ref_message_id;
>>   }
>
> Strike that, it should be the other way round.
Now that the self-reference check is in parse_references, the
conditional is much simpler.
One additional change I made in this version was to factor out three
calls to “notmuch_message_get_message_id (message)” into a local variable
inside the _notmuch_database_link_message_to_parents function, for a
small boost to readability (and perhaps speed, depending on how clever
the compiler is).
I also added tests. They are the first of the two patches that will
follow this email; the second is the code that makes them pass.
--
Aaron Ecay