thread ordering based on references and/or in-reply-to

Austin Clements amdragon at mit.edu
Wed Nov 2 07:37:05 PDT 2011


On Mon, Oct 31, 2011 at 7:07 PM, Florian Friesdorf <flo at chaoflow.net> wrote:
>
> Hi,
>
> I'm looking into taking the References header into account for thread
> ordering. So far only In-Reply-To is used. My C/C++ is rusty at best, so
> I'd need some help to get this done.
>
> Carl already tried to clear things up for me on IRC, but after reading
> into it I have more questions:
>
> lib/thread.cc/_resolve_thread_relationships adds messages as replies to
> a parent.
>
> Currently, we seem to treat In-Reply-To as empty or a single msgid. If I
> understand RFC 822 correctly, it can be a list of msgids and/or phrases.
> Do we, or should we, support that?
>
> References is a list of msgids, with the last one being the direct
> parent. I don't know how multiple direct parents are handled here.
>
> DJB recommends "... readers look for identifiers in In-Reply-To and
> append them to References if they are not already included in
> References." [1]
>
> In that case, if there are two msgids in In-Reply-To and they are
> appended to the References list, then only the last one will be a parent
> and the one that used to be last is no longer a parent.
>
> And Carl recommends treating References and In-Reply-To as two separate
> sources of information, first using In-Reply-To and then References in
> order "to attach to the deepest referenced parent".
>
> I fail to understand that. Am I complicating things?
> How do we want to treat the combination of References/In-Reply-To?
>
> Do we have code that returns the last msgid listed in References?
> database.cc/parse_references seems not to care about order, just
> existence - or is GHashTable ordered?
>
> [1] http://cr.yp.to/immhf/thread.html
>
>
> florian

I know this came up on IRC, but have you looked at jwz's threading
algorithm (http://www.jwz.org/doc/threading.html)?  Carl mentioned
that notmuch already implements it (except for subject matching), but
notmuch only implements the subset of it necessary to group messages
into threads without structure.  Much of the algorithm is devoted to
exactly this problem of piecing together the thread structure based on
all of the information in both In-Reply-To and References.  The
algorithm as described combines the issues of grouping and structuring
since it's expecting a giant pile of mail as input, but there's no
reason these can't be teased apart.
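
To make the "structuring only" half concrete, here is a rough sketch in
Python (for brevity, not notmuch's C++). It is not notmuch code, and the
names reference_chain and structure_thread are made up for illustration.
It assumes the messages have already been grouped into one thread. It
follows jwz's rule of taking the first thing in In-Reply-To that looks
like a Message-ID and appending it to References, with the "only if not
already included" check from the DJB note quoted above, and then attaches
each message to the deepest referenced ancestor that is actually present.
Unlike jwz's full algorithm, it creates no placeholder containers for
missing messages and does not guard against reference loops.

```python
import re

# Anything between angle brackets is treated as a Message-ID; phrases and
# other free text in In-Reply-To are ignored.
MSGID_RE = re.compile(r"<([^<>\s]+)>")


def reference_chain(references, in_reply_to):
    """Return ancestor Message-IDs, oldest first, direct parent last."""
    chain = MSGID_RE.findall(references or "")
    # First msgid-looking token in In-Reply-To, appended only if it is
    # not already part of the References list.
    match = MSGID_RE.search(in_reply_to or "")
    if match and match.group(1) not in chain:
        chain.append(match.group(1))
    return chain


def structure_thread(messages):
    """Map each msgid to its parent msgid (or None) within one thread.

    `messages` maps msgid -> (References header, In-Reply-To header) for
    messages that are already known to belong to the same thread.
    """
    parents = {}
    for msgid, (references, in_reply_to) in messages.items():
        # Drop self-references so a message never becomes its own parent.
        chain = [r for r in reference_chain(references, in_reply_to)
                 if r != msgid]
        # Walk from the direct parent backwards and take the first
        # ancestor we actually have: the deepest referenced parent.
        parents[msgid] = next(
            (r for r in reversed(chain) if r in messages), None)
    return parents


thread = {
    "a": ("", ""),
    "b": ("<a>", "<a>"),
    "c": ("<a> <b>", ""),
    "d": ("<a> <missing>", "<missing>"),  # direct parent not in the thread
    "e": ("<a>", "<b>"),                  # In-Reply-To extends References
}
parents = structure_thread(thread)
```

Here "d" falls back to "a" because its direct parent is missing, and "e"
ends up under "b" because In-Reply-To supplies the last link of the chain.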

