[RFC patch 2/2] lib: index message files with duplicate message-ids

Jani Nikula jani at nikula.org
Wed Mar 22 10:29:30 PDT 2017

On Thu, 16 Mar 2017, David Bremner <david at tethera.net> wrote:
> Daniel Kahn Gillmor <dkg at fifthhorseman.net> writes:
>> On Wed 2017-03-15 21:57:28 -0400, David Bremner wrote:
>>> The corresponding xapian document just gets more terms added to it,
>>> but this doesn't seem to break anything.
>> this is an interesting suggestion.  thanks for proposing it!
>> A couple questions:
>>  0) what happens when one of the files gets deleted from the message
>>     store? do the terms it contributes get removed from the index?
> That's a good question, and an issue I hadn't thought about.
> Currently there's no way to do this short of deleting all the terms for
> all the files (excepting tags and properties, presumably) and
> reindexing. This will require some more thought, I think.

We already see a version of this issue today: the first file gets
indexed, a second file with the same message-id gets added, and then the
first file gets removed.
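A toy model of why deletion is hard, assuming (as the patch describes) that every file with a given message-id adds its terms to one shared document. The class and method names are hypothetical, not notmuch or Xapian API. The shared document stores only the merged terms, so it cannot tell which terms came from the deleted file; rebuilding from the files that remain is the only way to drop them.

```python
# Toy model: one index document per message-id; each file adds its terms.
# MessageDoc and its methods are hypothetical, not notmuch or Xapian API.

class MessageDoc:
    def __init__(self):
        # filename -> terms extracted from it (stands in for re-reading the file)
        self.files = {}
        # merged terms, as stored in the single index document
        self.terms = set()

    def add_file(self, name, terms):
        self.files[name] = set(terms)
        self.terms |= self.files[name]  # terms are only ever added

    def remove_file(self, name):
        # Forgetting the filename leaves its terms behind in self.terms.
        del self.files[name]

    def reindex(self):
        # Rebuild the merged terms from the files that still exist.
        self.terms = set().union(*self.files.values())


doc = MessageDoc()
doc.add_file("a", {"hello", "only-in-a"})  # first copy
doc.add_file("b", {"hello"})               # second copy, same message-id
doc.remove_file("a")
print("only-in-a" in doc.terms)  # True: stale term from the deleted file
doc.reindex()
print("only-in-a" in doc.terms)  # False
```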

There's also the related problem of reindexing potentially changing the
file being indexed and returned. The first time around the indexing
order is likely the order the message files were received in; on
reindexing it's the order the message files are encountered in the file
system. I presume the patch at hand keeps the search terms that find the
messages the same regardless of the indexing order.
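A minimal sketch of that presumption, assuming the patch simply accumulates every file's terms: set union is commutative, so the merged terms come out the same for any indexing order, while a "first file wins" choice of which file to return does not. The file names and term sets below are made up for illustration.

```python
from itertools import permutations

# Hypothetical term sets for three copies of one message-id.
files = {
    "new/1": {"hello", "footer-a"},
    "cur/2": {"hello"},
    "tmp/3": {"hello", "footer-b"},
}

def index(order):
    """Index files in the given order; return (merged terms, first file seen)."""
    terms, first = set(), None
    for name in order:
        if first is None:
            first = name       # "first file wins" for the returned filename
        terms |= files[name]   # every file's terms accumulate
    return frozenset(terms), first

results = [index(order) for order in permutations(files)]
print(len({terms for terms, _ in results}))  # 1: same terms for every order
print(len({first for _, first in results}))  # 3: returned file depends on order
```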


