[notmuch] Missing messages breaking threads
Carl Worth
cworth at cworth.org
Fri Dec 18 11:41:18 PST 2009
On Fri, 18 Dec 2009 19:02:21 +0000, James Westby <jw+debian at jameswestby.net> wrote:
> I like the architecture of notmuch, and have just switched
> to using it as my primary client, so thanks.
You're quite welcome, James. Welcome to notmuch!
> Therefore I'd like to fix this. The obvious way is to
> introduce documents in to the db for each id we see, and
> threading should then naturally work better.
That sounds like a fine idea.
> The only issue I see with doing this is with mail delays.
> Once we do this we will sometimes receive a message that
> already has a dummy document. What happens currently with
> message-id collisions?
The current message-ID collision logic is pretty brain-dead. It just
says "Oh, I've seen a file with this message ID before, so I'll skip
this additional file".
But I'm just putting the finishing touches on a patch that instead does:
    Oh, and here's an additional filename for that message ID. Add
    that too, please.
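A minimal sketch of that behaviour, in Python rather than notmuch's C. The dictionary and the function name are hypothetical stand-ins for illustration, not notmuch's actual API:

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the database:
# message ID -> list of filenames known to carry that message.
filenames_by_id = defaultdict(list)


def add_file(message_id, filename):
    """Record filename under message_id.

    Returns True if this message ID had not been seen before,
    False if the file is an additional copy of a known message.
    """
    is_new = message_id not in filenames_by_id
    if filename not in filenames_by_id[message_id]:
        # Instead of skipping the colliding file, remember it too.
        filenames_by_id[message_id].append(filename)
    return is_new
```

With this, a second file carrying a known message ID is kept alongside the first rather than being dropped.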
Beyond that, all we would need to do is also index the new content. I
don't want to do useless re-indexing when files just get renamed. So
maybe all we need to do is save the file size of the last-indexed file
for a document, and then when we encounter a file with the same message
ID and a larger file size, index it as well?
That would even take care of providing the opportunity to index
additional mailing-list-added content for messages also sent directly
via CC.
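The file-size rule could be sketched like this, assuming sizes are tracked per message ID. The names `last_indexed_size` and `should_index` are made up for the example:

```python
# Hypothetical store: message ID -> size in bytes of the last-indexed file.
last_indexed_size = {}


def should_index(message_id, file_size):
    """Decide whether a file for message_id is worth indexing.

    A file is indexed if the message ID is new, or if the file is
    larger than the one indexed last time. A renamed file has the
    same size as before, so it is skipped.
    """
    previous = last_indexed_size.get(message_id)
    if previous is not None and file_size <= previous:
        return False
    last_indexed_size[message_id] = file_size
    return True
```

A direct CC copy followed by a larger mailing-list copy (with an added footer, say) would be indexed twice under this rule, while a plain rename would not trigger any re-indexing.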
The file-size heuristic wouldn't be perfect for these other cases. I
guess we could save a list of SHA-1 sums for indexed files or so
(assuming that's cheaper than just re-indexing). Before the Xapian
Defect 250 fix I'm sure it is, but after the fix I'm not sure. Maybe we
should just always re-index, though I think I have seen the
TermGenerator appear in profiles of indexing runs.
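For comparison, the checksum alternative might look roughly like the following. This is only a sketch using Python's standard `hashlib`; the `indexed_digests` store and `needs_index` function are hypothetical:

```python
import hashlib

# Hypothetical store: message ID -> set of SHA-1 hex digests of the
# file contents that have already been indexed for that message.
indexed_digests = {}


def needs_index(message_id, content):
    """Return True if these exact bytes have not been indexed yet."""
    digest = hashlib.sha1(content).hexdigest()
    seen = indexed_digests.setdefault(message_id, set())
    if digest in seen:
        return False  # byte-identical copy, e.g. a renamed file
    seen.add(digest)
    return True
```

Unlike the size comparison, this also catches a copy whose content differs but whose size happens to be the same or smaller, at the cost of hashing every file.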
> * When we get a message-id conflict check for dummy:True
> and replace the document if it is there.
>
> How does this sound?
That sounds fine. It's the same as what I propose above with
"filesize:0" instead of "dummy:true".
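To illustrate why the two are equivalent: a placeholder stored with a file size of zero is replaced by any real file under the larger-file rule, with no separate dummy flag. A sketch with hypothetical names (`db`, `note_reference`, `deliver`):

```python
# Hypothetical database: message ID -> {"filesize": int, "body": str}.
db = {}


def note_reference(message_id):
    """Create a placeholder for an ID seen only in another message's
    headers. It carries filesize 0 rather than an explicit dummy flag."""
    db.setdefault(message_id, {"filesize": 0, "body": ""})


def deliver(message_id, body):
    """Store a real message; return True if it was indexed.

    Any non-empty file is larger than a size-0 placeholder, so the
    placeholder is replaced by the ordinary larger-file rule.
    """
    size = len(body.encode("utf-8"))
    doc = db.get(message_id)
    if doc is None or size > doc["filesize"]:
        db[message_id] = {"filesize": size, "body": body}
        return True
    return False
```

A delayed message that arrives after its replies simply overwrites the placeholder that those replies created.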
> There could be an issue with synthesising too many threads
> and then ending up having to try and put a message in two
> threads? I see there is code for merging threads, would that
> handle this?
It should, yes.
The current logic is that a message can only appear in a single
thread. So if a message has children or parents with distinct thread IDs
then those threads are merged.
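The merge rule can be sketched as follows. This is an illustration only, not notmuch's actual code; `thread_of` and `add_message` are hypothetical, and thread IDs are plain integers:

```python
import itertools

# Hypothetical store: message ID -> thread ID.
thread_of = {}
_thread_ids = itertools.count()


def add_message(message_id, related_ids):
    """Place message_id in a thread, merging threads where needed.

    related_ids are the parents and children of the message. If they
    already sit in distinct threads, those threads are merged into
    one, so every message ends up in exactly one thread.
    """
    threads = {thread_of[r] for r in related_ids if r in thread_of}
    if message_id in thread_of:
        threads.add(thread_of[message_id])
    # Reuse the lowest existing thread ID, or start a new thread.
    winner = min(threads) if threads else next(_thread_ids)
    for mid, tid in list(thread_of.items()):
        if tid in threads:
            thread_of[mid] = winner
    thread_of[message_id] = winner
    return winner
```

If messages "a" and "c" were first seen in separate threads, a later message "b" that relates to both pulls all three into a single thread.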
I can imagine some strange cross-posting scenario where one could argue
that the merging shouldn't happen, but I'm not sure we want to try to
respect that.
-Carl