Tabulation in multiline headers
Jani Nikula
jani at nikula.org
Sat Oct 18 02:11:56 PDT 2014
On Sat, 18 Oct 2014, Sergei Shilovsky <sshilovsky at gmail.com> wrote:
>> Hi, Sergei. I'm not clear on where exactly you are seeing a problem
>> with this tab in the subject line. Is it showing up somewhere you think
>> it shouldn't?
>
> It is shown in e.g. `notmuch show` as well as
> `notmuch_message_get_header(m, "subject")`
>
>> I'm not sure libnotmuch should be doing any scrubbing of the message
>> contents. The emacs UI does seem to replace the tab with a space,
>> though. Maybe other MUAs should be doing the same?
>
> My point is that this tabulation character does not relate to the
> contents of the header (this might be arguable though) and libnotmuch
> should return the contents, not its representation on the file system.

This is folding and unfolding of long header fields in action, as described
in [1]. In short, folding happens by inserting CRLF before any WSP, and
unfolding happens by removing any CRLF immediately followed by WSP. The
WSP itself is preserved unchanged through folding and unfolding. The TAB is
not part of the multiple-line representation; it is part of the unfolded
content.
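To make the round trip concrete, here is a minimal Python sketch of the two
operations as [1] describes them. The names `fold_at` and `unfold` are
hypothetical helpers, not part of libnotmuch or gmime, and the sketch
ignores line-length limits and the other constraints on where folding may
happen.

```python
import re

def fold_at(value: str, index: int) -> str:
    """Fold value by inserting CRLF before the WSP character at index."""
    if value[index] not in " \t":
        raise ValueError("folding is only allowed before WSP (space or TAB)")
    return value[:index] + "\r\n" + value[index:]

def unfold(value: str) -> str:
    """Remove every CRLF immediately followed by WSP; the WSP itself stays."""
    return re.sub(r"\r\n(?=[ \t])", "", value)

subject = "Tabulation in\tmultiline headers"
folded = fold_at(subject, 13)     # index 13 is the TAB
print(repr(folded))               # 'Tabulation in\r\n\tmultiline headers'
assert unfold(folded) == subject  # the TAB survives the round trip
```

Note that unfold never touches the TAB: whatever WSP the folder chose ends
up in the value handed back to the caller.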

If memory serves, many problems trace back to an interpretation of [2]
that extra WSP may be inserted while folding. Because of this
interpretation, many agents replace the WSP following a CRLF with a single
space while unfolding. Presumably for the same reason, buggy folding in a
Python email package, which replaces the WSP with a TAB while folding, went
unnoticed. This problem, in turn, has been spread far and wide by Mailman 2
through its use of said email package. In practice, a perfectly good
message will have its folding WSP replaced by a TAB when it is transmitted
through Mailman 2. Again, this is all from memory, [citation needed] etc.
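The interaction described above can be modelled in a few lines. This is a
sketch under assumptions: `refold_with_tab` stands in for the buggy
folding as remembered here and is not the actual Python email package
code, and both unfolding functions are hypothetical helpers.

```python
import re

def unfold_rfc5322(value: str) -> str:
    # RFC 5322: drop the CRLF, keep the WSP that follows it.
    return re.sub(r"\r\n(?=[ \t])", "", value)

def unfold_lossy(value: str) -> str:
    # Lenient interpretation: CRLF plus any following run of WSP becomes one space.
    return re.sub(r"\r\n[ \t]+", " ", value)

def refold_with_tab(value: str) -> str:
    # Models the buggy folder (assumed behaviour): WSP after each CRLF becomes a TAB.
    return re.sub(r"\r\n[ \t]", "\r\n\t", value)

original = "a long\r\n subject"     # folded with a space, as sent
relayed = refold_with_tab(original)  # after passing through the list
print(repr(unfold_lossy(relayed)))   # 'a long subject'  (bug hidden)
print(repr(unfold_rfc5322(relayed))) # 'a long\tsubject' (TAB exposed)
```

A lenient unfolder gives the same result before and after relaying, which
is how such a bug could go unnoticed; a strict one exposes it.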

Notmuch has a history of its own when it comes to header unfolding. For
historical reasons, until recently we used two header parsers: one from
gmime, and one of our own. After all of the above, it shouldn't surprise
the reader that the parsers treated folding WSP differently! Our own
parser replaced folding WSP with a single space, while gmime respects the
RFC. Starting from 0.18, we only use gmime to parse headers, which means
we're at least consistent, but by the GIGO (garbage in, garbage out)
principle, we may see more folding TABs.

I do not think we should work around header folding problems in the lib,
and I'm not sure about the CLI either. We should consider replacing TABs
with spaces in notmuch-emacs, though (I personally use a
notmuch-show-markup-headers-hook that does that).
HTH,
Jani.
[1] https://tools.ietf.org/html/rfc5322#section-2.2.3
[2] https://tools.ietf.org/html/rfc822#section-3.1