How does notmuch detect the presence of attachments?
Daniel Kahn Gillmor
dkg at fifthhorseman.net
Thu Aug 25 07:21:21 PDT 2011
On 08/03/2011 06:01 AM, moabi2000 wrote:
> 1) How does notmuch detect the presence of attachments? I have some
> messages that have attachments (which I can see and open when reading
> the message), but for which the 'attachment' flag is not set (and
> therefore don't show up in a search like "from:myfriend AND
> attachment:pdf"). How can I try to work out what is going on?
According to lib/index.cc (around line 366 in the current version), the
tag "attachment" is added to an e-mail only if one of the MIME parts of
the message has an explicit "Content-Disposition: attachment" MIME
subheader.
So some mail clients may be attaching files with "Content-Disposition:
inline" (i do this sometimes when attaching text/* files) or without a
Content-Disposition: header on the MIME part at all.
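To make the rule concrete, here is a minimal sketch of the same check in Python. It is an illustration only, not the actual code in lib/index.cc (which is C++); the function name has_explicit_attachment is made up for this example.

```python
from email import message_from_string


def has_explicit_attachment(raw_message: str) -> bool:
    """Return True if any MIME part carries "Content-Disposition: attachment".

    Hypothetical helper mirroring the rule described above: parts marked
    "inline", or with no Content-Disposition header at all, do not count.
    """
    msg = message_from_string(raw_message)
    for part in msg.walk():
        # get_content_disposition() yields the lowercased disposition
        # ("attachment", "inline", ...) or None if the header is absent.
        if part.get_content_disposition() == "attachment":
            return True
    return False
```

Running a problem message through a check like this should show quickly whether the sending client marked the file as inline or left the header out.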
Perhaps notmuch could keep a (configurable?) list of Content-Types that
should be tagged with "attachment" no matter what Content-Disposition is
used? I could imagine an initial list like:
application/pdf
application/vnd.oasis.opendocument.text
application/vnd.oasis.opendocument.spreadsheet
Or maybe just any MIME part with "application" as the major
Content-Type? That would be a relatively easy (though non-general)
heuristic to implement. Want to take a crack at it?
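As a rough sketch of what either heuristic could look like (again in Python for illustration, not a patch against notmuch), assuming a hypothetical default list built from the three types above and a hypothetical any_application switch for the broader rule:

```python
from email import message_from_string

# Hypothetical default list, taken from the types suggested above.
ALWAYS_ATTACHMENT_TYPES = {
    "application/pdf",
    "application/vnd.oasis.opendocument.text",
    "application/vnd.oasis.opendocument.spreadsheet",
}


def looks_like_attachment(part, any_application=False):
    """Decide whether a single MIME part should trigger the "attachment" tag."""
    if part.is_multipart():
        return False  # containers are never attachments themselves
    if part.get_content_disposition() == "attachment":
        return True  # the existing rule: explicit disposition
    if any_application and part.get_content_maintype() == "application":
        return True  # broad heuristic: any application/* part
    # narrow heuristic: Content-Type is on the configured list
    return part.get_content_type() in ALWAYS_ATTACHMENT_TYPES


def should_tag_attachment(raw_message, any_application=False):
    """Return True if any part of the message looks like an attachment."""
    msg = message_from_string(raw_message)
    return any(looks_like_attachment(p, any_application) for p in msg.walk())
```

One caveat with the broad variant: signed mail carries an application/pgp-signature part, so every signed message would end up tagged unless such types are excluded.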
> 2) Is there an option for notmuch to also index the text of
> attachments (like recoll does, which also uses xapian)? People tend to
> save attachments with really useless filenames (report2.pdf...), what
> I'd like to be able to do is a search like "from:mycolleague AND
> attachment:pdf AND attachmentcontains:ourproject"
This is another great suggestion for improvement, i think. There is
even a comment in the code (around the same part referenced above) that says:
/* XXX: Would be nice to call out to something here to parse
 * the attachment into text and then index that. */
A generic shim here, with a configurable index that associates
Content-Types with safe convert-to-text functions, would be quite nice.
This would probably be a new section in ~/.notmuch-config,
[textconverters], where each key would be a specific Content-Type and
each value would be a command that takes the file on stdin and produces
plain text to index on stdout, like so:
[textconverters]
application/pdf=pdf2txt /dev/stdin
Starting with an initially empty set of textconverters seems reasonable
and safe to me, and people could set up their own if they're interested.
You'd need to re-index your message store after modifying the config,
though, if you wanted to have pre-existing messages get indexed this
way. Is there a way to tell notmuch to re-index a particular message?
The above proposal isn't implemented at all, i'm just throwing it out
for consideration.
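For the sake of discussion, here is a small Python sketch of how such a shim might behave. Everything in it is an assumption: the [textconverters] section does not exist in notmuch, the function names are invented, and a real implementation would live in the C/C++ indexing code.

```python
import configparser
import shlex
import subprocess


def load_textconverters(config_text):
    """Parse a hypothetical [textconverters] section into a dict.

    Keys are Content-Types, values are command lines. An absent section
    yields an empty dict, i.e. no converters by default.
    """
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("textconverters"):
        return {}
    return dict(parser.items("textconverters"))


def convert_to_text(content_type, data, converters, timeout=30):
    """Feed the attachment bytes to the configured command on stdin.

    Returns the command's stdout as text, or None if no converter is
    configured for this Content-Type or the command fails.
    """
    command = converters.get(content_type.lower())
    if command is None:
        return None
    try:
        result = subprocess.run(
            shlex.split(command),  # no shell involved
            input=data,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # missing binary or runaway converter
    if result.returncode != 0:
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

The timeout and the avoidance of a shell are there because the converter runs on untrusted mail content; which converters count as "safe" is still up to whoever configures them.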
--dkg