[PATCH v3 3/5] Add indexing for the mimetype term
Todd
todd at electricoding.com
Sat Jan 17 08:41:10 PST 2015
>>>>> "DB" == David Bremner <david at tethera.net> writes:
DB> Todd <todd at electricoding.com> writes:
>> Adds the indexing and removes the broken test flag
>> ---
>> lib/database.cc | 1 +
>> lib/index.cc | 10 ++++++++++
>> test/T190-multipart.sh | 4 ----
>> 3 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/database.cc b/lib/database.cc
>> index 0d2c417..3974e2e 100644
>> --- a/lib/database.cc
>> +++ b/lib/database.cc
>> @@ -254,6 +254,7 @@ static prefix_t PROBABILISTIC_PREFIX[]= {
>> { "from", "XFROM" },
>> { "to", "XTO" },
>> { "attachment", "XATTACHMENT" },
>> + { "mimetype", "XMIMETYPE"},
>> { "subject", "XSUBJECT"},
>> };
DB> I think the commit message should articulate why we are indexing this as
DB> a probabilistic prefix, rather than as a boolean prefix. In particular,
DB> this gives people a last chance to complain.
DB> The reference I know is http://xapian.org/docs/queryparser.html
DB> If I understand correctly (it would be great if you could test this
DB> Todd), with a probabilistic prefix,
DB> mimetype:pdf
DB> will match
DB> application/pdf
DB> image/pdf
DB> application/x-pdf
DB> application/x-ext-pdf
DB> but not
DB> application/x-bzpdf
DB> application/x-gzpdf
DB> application/x-xzpdf
I just tested, and it does work this way with your examples. I
*believe*, from reading the docs, that Xapian treats full MIME-type
queries as phrase searches anyway, because of the embedded
slashes.
From http://xapian.org/docs/queryparser.html:
    A phrase surrounded with double quotes ("") matches documents
    containing that exact phrase. Hyphenated words are also treated
    as phrases, as are cases such as filenames and email addresses
    (e.g. /etc/passwd or president@whitehouse.gov).
I think this automatic phrasing will give good behaviour for the
kinds of queries people will typically run.
DB> On the whole, this is probably more beneficial than harmful. The downside
DB> of probabilistic prefixes/fields is that they are not "anchored", so
DB> there is no easy way to distinguish
DB> application/pdf
DB> from
DB> pdf
DB> application/x-pdf
DB> I guess in a perfect world this would also be explained in
DB> notmuch-search-terms(7), but that's pretty much orthogonal to this
DB> series.
If separate messages with application/pdf and application/x-pdf are
indexed, then:
mimetype:application/x-pdf finds only the application/x-pdf
mimetype:application/pdf finds only the application/pdf
mimetype:pdf finds both of the messages
I am fairly sure that this behaviour is a result of the automatic
phrasing mentioned above.
- Todd
DB> d
More information about the notmuch mailing list