header continuation issue in notmuch frontend/alot/pythons email module
Austin Clements
amdragon at MIT.EDU
Sun Jun 23 09:59:39 PDT 2013
Quoth Justus Winter on Jun 23 at 3:11 pm:
> Hi,
>
> I recently had a problem replying to a mail written by Thomas Schwinge
> using an oldish notmuch. Not sure if it has been fixed in more recent
> versions, but I think notmuch could improve upon its header
> generation (see below). Problematic part of the mail:
>
> ~~~ snip ~~~
> [...]
> To: someone at example.org, "line
> break" <linebreak at example.org>, someoneelse at example.org
> User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> [...]
> ~~~ snap ~~~
>
> http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
>
> Note: Though structured field bodies are defined in such a way that
> folding can take place between many of the lexical tokens (and even
> within some of the lexical tokens), folding SHOULD be limited to
> placing the CRLF at higher-level syntactic breaks. For instance, if
> a field body is defined as comma-separated values, it is recommended
> that folding occur after the comma separating the structured items in
> preference to other places where the field could be folded, even if
> it is allowed elsewhere.
>
> So notmuch "rfc-SHOULD" place the newlines after the comma.
>
> The rfc goes on:
>
> The process of moving from this folded multiple-line representation
> of a header field to its single line representation is called
> "unfolding". Unfolding is accomplished by simply removing any CRLF
> that is immediately followed by WSP. Each header field should be
> treated in its unfolded form for further syntactic and semantic
> evaluation.
>
> My interpretation is that unfolding simply removes any linebreaks
> first, so the value does not contain any newlines. But Python's email
> module discriminates quoted and unquoted parts of the value:
>
> ~~~ snip ~~~
> from __future__ import print_function
> import email
> from email.utils import getaddresses
>
> m = email.message_from_string('''To: "line
> break" <linebreak at example.org>, line
> break <linebreak at example.org>''')
> print("m['To'] = ", m['To'])
> print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> ~~~ snap ~~~
>
> % python3 test.py
> m['To'] = "line
> break" <linebreak at example.org>, line
> break <linebreak at example.org>
> getaddresses(m.get_all('To')) = [('line\n break', 'linebreak at example.org'), ('line break', 'linebreak at example.org')]
>
> I believe that is what's preventing me from replying to the message
> using alot without sanitizing the To header first. Not really sure who
> is wrong or right here... any thoughts?

There are at least two bugs here. Regardless of what we RFC-SHOULD
do, that folding *is* permitted by RFC 2822, since quoted
strings can contain folding whitespace:

  http://tools.ietf.org/html/rfc2822#section-3.2.5

For completeness, the full derivation for this "To" header is:

  to              =  "To:" address-list CRLF
  address-list    =  (address *("," address)) / obs-addr-list
  address         =  mailbox / group
  mailbox         =  name-addr / addr-spec
  name-addr       =  [display-name] angle-addr
  display-name    =  phrase
  phrase          =  1*word / obs-phrase
  word            =  atom / quoted-string
  quoted-string   =  [CFWS]
                     DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                     [CFWS]

Do you happen to know how the strangely folded "To" header was
produced for this message? In notmuch-emacs, a user can put whatever
they want in a message-mode buffer's headers and message-mode will
dutifully pass it on to their MTA. We could validate it, but that's a
slippery slope and I would hope that the MTA itself is validating it
(and probably more thoroughly than we could).

That said, the first bug here is in Python. As I mentioned above,
folding whitespace is allowed in quoted strings. In fact, though the
standard is rather long-winded about whitespace, if you dig into the
grammar, you'll find that *all whitespace can be folded* (except in
the obsolete grammar, which allowed whitespace between the header name
and the colon, which obviously can't be folded). I'm not sure what
Python is doing, but I bet it's going to a lot of effort to
mis-implement something very simple.

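To illustrate how simple the unfolding step is, here is a minimal
sketch that applies it before handing the value to getaddresses. The
helper name `unfold` and the un-obfuscated example address are
assumptions made for this sketch. It shows a workaround a caller could
apply and does not describe what Python's email module does
internally.

```python
import re
from email.utils import getaddresses

# Hypothetical helper (not part of the email module): remove every line
# break that is immediately followed by a space or tab, as described in
# RFC 2822 section 2.2.3. The whitespace after the break is kept.
# The RFC speaks of CRLF; a bare LF is accepted here too, since Python
# strings read from disk usually carry "\n" only.
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(value):
    """Return the single-line (unfolded) form of a header field body."""
    return _FOLD.sub("", value)


# Assumed example value, modelled on the folded "To" header above.
folded = ('"line\n break" <linebreak@example.org>, line\n'
          ' break <linebreak@example.org>')

unfolded = unfold(folded)
addresses = getaddresses([unfolded])

print(unfolded)
print(addresses)
```

With the value unfolded first, neither display name contains a
newline, whether or not it was quoted.
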
There also appears to be a bug in the notmuch CLI's reply command
where it omits addresses that were folded in the original message. I
don't know if alot uses the CLI's reply command, so this may or may
not be related to your specific issue. I haven't dug into this yet,
other than to confirm that it's the CLI's fault and not
notmuch-emacs's.

> Justus