header continuation issue in notmuch frontend/alot/Python's email module

Justus Winter 4winter at informatik.uni-hamburg.de
Mon Jun 24 01:57:10 PDT 2013


Quoting Austin Clements (2013-06-23 18:59:39)
> Quoth Justus Winter on Jun 23 at  3:11 pm:
> > Hi,
> > 
> > I recently had a problem replying to a mail written by Thomas Schwinge
> > using an oldish notmuch. Not sure if it has been fixed in more recent
> > versions, but I think notmuch could improve upon its header
> > generation (see below). Problematic part of the mail:
> > 
> > ~~~ snip ~~~
> > [...]
> > To: someone at example.org, "line
> >  break" <linebreak at example.org>, someoneelse at example.org
> > User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 (i486-pc-linux-gnu)
> > [...]
> > ~~~ snap ~~~
> > 
> > http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
> > 
> >    Note: Though structured field bodies are defined in such a way that
> >    folding can take place between many of the lexical tokens (and even
> >    within some of the lexical tokens), folding SHOULD be limited to
> >    placing the CRLF at higher-level syntactic breaks.  For instance, if
> >    a field body is defined as comma-separated values, it is recommended
> >    that folding occur after the comma separating the structured items in
> >    preference to other places where the field could be folded, even if
> >    it is allowed elsewhere.
> > 
> > So notmuch "rfc-SHOULD" place the newlines after the comma.
> > 
> > The RFC goes on:
> > 
> >    The process of moving from this folded multiple-line representation
> >    of a header field to its single line representation is called
> >    "unfolding". Unfolding is accomplished by simply removing any CRLF
> >    that is immediately followed by WSP.  Each header field should be
> >    treated in its unfolded form for further syntactic and semantic
> >    evaluation.
> > 
> > My interpretation is that unfolding simply removes any linebreaks
> > first, so the value does not contain any newlines. But Python's email
> > module discriminates quoted and unquoted parts of the value:
> > 
> > ~~~ snip ~~~
> > from __future__ import print_function
> > import email
> > from email.utils import getaddresses
> > 
> > m = email.message_from_string('''To: "line
> >  break" <linebreak at example.org>, line
> >  break <linebreak at example.org>''')
> > print("m['To'] = ", m['To'])
> > print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> > ~~~ snap ~~~
> > 
> > % python3 test.py
> > m['To'] =  "line
> >  break" <linebreak at example.org>, line
> >  break <linebreak at example.org>
> > getaddresses(m.get_all('To')) =  [('line\n break', 'linebreak at example.org'), ('line break', 'linebreak at example.org')]
> > 
> > I believe that is what's preventing me from replying to the message
> > using alot without sanitizing the To header first. Not really sure who
> > is wrong or right here... any thoughts?
> 
> There are at least two bugs here.  Regardless of what we RFC-should
> do, that folding *is* permitted by RFC2822, since quoted
> strings can contain folding whitespace:
> 
>   http://tools.ietf.org/html/rfc2822#section-3.2.5
> 
> For completeness, the full derivation for this "To" header is:
> 
> to              =       "To:" address-list CRLF
> address-list    =       (address *("," address)) / obs-addr-list
> address         =       mailbox / group
> mailbox         =       name-addr / addr-spec
> name-addr       =       [display-name] angle-addr
> display-name    =       phrase
> phrase          =       1*word / obs-phrase
> word            =       atom / quoted-string
> quoted-string   =       [CFWS]
>                         DQUOTE *([FWS] qcontent) [FWS] DQUOTE
>                         [CFWS]
> 
> Do you happen to know how the strangely folded "to" header was
> produced for this message?

No, but Thomas might. Thomas, the problematic message is
id:877ghpqckb.fsf at kepler.schwinge.homeip.net

>  In notmuch-emacs, a user can put whatever
> they want in a message-mode buffer's headers and mm will dutifully
> pass it on to their MTA.  We could validate it, but that's a slippery
> slope and I would hope that the MTA itself is validating it (and
> probably more thoroughly than we could).
> 
> That said, the first bug here is in Python.  As I mentioned above,
> foldable whitespace is allowed in quoted strings.  In fact, though the
> standard is rather long-winded about whitespace, if you dig into the
> grammar, you'll find that *all whitespace can be folded* (except in
> the obsolete grammar, which allowed whitespace between the header name
> and the colon, which obviously can't be folded).  I'm not sure what
> Python is doing, but I bet it's going to a lot of effort to
> mis-implement something very simple.

Yes, I'm glad you came to the same conclusion.
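
For anyone hitting the same thing, here is a minimal sketch of a
possible workaround: unfold the header value as described in RFC 2822
section 2.2.3 (drop every line break that is immediately followed by
WSP) before handing it to getaddresses. The helper name `unfold` is
made up for illustration, and the sample addresses use a literal "@"
instead of the " at " shown in the archived copy. This is not
necessarily how alot or notmuch handle it.

```python
import re
from email.utils import getaddresses


def unfold(value):
    """Unfold a header value per RFC 2822 section 2.2.3.

    Hypothetical helper: removes any CRLF (or bare LF) that is
    immediately followed by a space or tab, and leaves the
    whitespace itself in place.
    """
    return re.sub(r'\r?\n(?=[ \t])', '', value)


# Same shape as the problematic To header: one fold inside a quoted
# display name, one fold inside an unquoted display name.
raw = ('"line\n break" <linebreak@example.org>, '
       'line\n break <linebreak@example.org>')

unfolded = unfold(raw)
addresses = getaddresses([unfolded])

print(repr(unfolded))
print(addresses)
```

With the value unfolded first, the quoted display name no longer
carries the embedded newline, so both entries should come out with
the same name.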

> There also appears to be a bug in the notmuch CLI's reply command
> where it omits addresses that were folded in the original message.  I
> don't know if alot uses the CLI's reply command, so this may or may
> not be related to your specific issue.  I haven't dug into this yet,
> other than to confirm that it's the CLI's fault and not
> notmuch-emacs's.

No, alot does not use notmuch's reply command.

Thanks,
Justus

