emacs complains about encoding?

Sun May 20 08:34:51 PDT 2012

On Wed, May 16, 2012 at 3:24 AM, Tomi Ollila <tomi.ollila at iki.fi> wrote:
> Haa, It doesn't matter which is the original encoding of the message;
>
> notmuch reply id:20120515194455.B7AD5100646 at guru.guru-group.fi
>
> where  notmuch show --format=raw ^^^  outputs (among other lines):
>
>  Content-Type: text/plain; charset="iso-8859-1"
>  Content-Transfer-Encoding: quoted-printable
>
> and
>
> notmuch reply id:"878vgsbprq.fsf at nikula.org"
>
> where  notmuch show --format=raw ^^^  outputs (among other lines):
>
>  Content-Type: text/plain; charset="utf-8"
>  Content-Transfer-Encoding: base64
>
> produce correct reply content, both in utf-8.
>
> So it is the emacs side which breaks replies.
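For reference, here is a small Python sketch of why the source encoding doesn't matter on the notmuch side. It is not notmuch's actual C implementation. Parts shaped like the two above, one iso-8859-1 quoted-printable and one utf-8 base64, decode to identical text and therefore to identical UTF-8 output. The decode_part helper and the sample text are made up for illustration.

```python
import base64
import quopri

def decode_part(payload: bytes, transfer_encoding: str, charset: str) -> str:
    """Undo the Content-Transfer-Encoding first, then the charset."""
    if transfer_encoding == "quoted-printable":
        raw = quopri.decodestring(payload)
    elif transfer_encoding == "base64":
        raw = base64.b64decode(payload)
    else:
        raw = payload
    return raw.decode(charset)

# Hypothetical sample text with non-ASCII characters.
text = "Hyvää päivää"

# Part 1: charset="iso-8859-1", Content-Transfer-Encoding: quoted-printable
qp_payload = quopri.encodestring(text.encode("iso-8859-1"))
# Part 2: charset="utf-8", Content-Transfer-Encoding: base64
b64_payload = base64.b64encode(text.encode("utf-8"))

a = decode_part(qp_payload, "quoted-printable", "iso-8859-1")
b = decode_part(b64_payload, "base64", "utf-8")

# Both parts yield the same text, so re-encoding as UTF-8 gives the same bytes.
print(a.encode("utf-8") == b.encode("utf-8"))
```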

It turns out it's actually not the emacs side, but an interaction
between our JSON reply format and emacs.

The JSON reply (and show) code includes part content for all text/*
parts except text/html. Because all JSON is required to be UTF-8, it
handles the encoding itself, puts UTF-8 text in, and omits the
content-charset field from the output. Emacs passes the
content-charset field on to mm-display-part-inline when it's available,
but for text/plain parts it isn't, which leaves mm-display-part-inline
to work out the charset on its own. It seems mm-display-part-inline
correctly figures out that the text is UTF-8, yet still inserts the
series of ugly \nnn sequences, because that's what emacs sometimes
does with UTF-8.
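To make the symptom concrete: the \nnn sequences are the UTF-8 bytes of each non-ASCII character shown as octal escapes, which is how emacs displays raw bytes that were never decoded into characters. The Python sketch below models only that display, not mm-display-part-inline's actual logic; show_undecoded is a hypothetical helper.

```python
def show_undecoded(data: bytes) -> str:
    """Mimic how emacs displays undecoded raw bytes: ASCII passes through,
    every byte >= 0x80 becomes a backslash plus three octal digits."""
    return "".join(chr(b) if b < 0x80 else f"\\{b:03o}" for b in data)

utf8_bytes = "päivää".encode("utf-8")

# Treated as raw bytes, each "ä" (UTF-8 0xC3 0xA4) shows up as \303\244.
print(show_undecoded(utf8_bytes))
# Decoded as UTF-8, the same bytes read correctly.
print(utf8_bytes.decode("utf-8"))
```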

In the original reply code (before the JSON reply format), emacs used
the output of notmuch reply verbatim, so all the charset handling was
done in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs
was using the JSON reply format, but was inserting the text itself
instead of using mm-display-part-inline, so emacs still wasn't doing
any charset manipulation. Using mm-display-part-inline is desirable
because it lets us handle non-text/plain (e.g. text/html) parts
correctly in reply, and makes the display consistent with show, which
already uses it. However, it leads to this problem.

So, there are a few solutions I can see:

1) Have the JSON formats include the original content-charset even
though they're actually outputting UTF-8. Of the solutions I tried,
this one works best, even though it doesn't sound like the right thing
to do.

2) Have the JSON formats include content only if it's actually UTF-8.
This means that for non-UTF-8 parts (including ASCII parts), the emacs
interface has to do more work to display the part content, since it
must first fetch it separately. When I tried this, it worked, but it
caused the \nnn sequences to show up when viewing messages in emacs. I
suspect this is because a single charset gets set for the whole
buffer, which can't properly accommodate messages with different
charsets in the same buffer. Reply works correctly, though.

3) Have the JSON formats include the charset for all parts, but make
it UTF-8 for all parts they include content for (since we're actually
outputting UTF-8). This doesn't seem to fix the problem, even though
it seems like it should.
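As a rough comparison of the three options, here is a Python sketch of the part object each would emit for a text/plain part. The content-charset field comes from the discussion above; the content-type and content field names and the json_part helper are assumptions for illustration, not notmuch's actual code. In every case the content itself is UTF-8 text, since JSON requires it.

```python
import json

def json_part(text: str, original_charset: str, option: int) -> dict:
    """Hypothetical: build the JSON part object for a text/plain part
    under each proposed option. The content is always the decoded text."""
    part = {"content-type": "text/plain"}
    if option == 1:
        # (1) Include the text, but report the part's original charset.
        part["content"] = text
        part["content-charset"] = original_charset
    elif option == 2:
        # (2) Include content only if the part is actually UTF-8;
        # otherwise (ASCII included) the client must fetch it separately.
        if original_charset.lower() == "utf-8":
            part["content"] = text
    elif option == 3:
        # (3) Include the text and report UTF-8, which is what is output.
        part["content"] = text
        part["content-charset"] = "utf-8"
    else:
        raise ValueError("option must be 1, 2 or 3")
    return part

for option in (1, 2, 3):
    part = json_part("Hyvää päivää", "iso-8859-1", option)
    print(option, json.dumps(part, ensure_ascii=False))
```

Option 2 is the only one where the client sees no content for this part, which is why the emacs interface has to fetch it separately.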

If no one has a better idea or a strong reason not to, I'll send a
patch for solution (1).

-- Adam