emacs complains about encoding?

Michal Sojka sojkam1 at fel.cvut.cz
Tue May 22 05:53:26 PDT 2012


Hello Adam,

Adam Wolfe Gordon <awg+notmuch at xvx.ca> writes:
> It turns out it's actually not the emacs side, but an interaction
> between our JSON reply format and emacs.
>
> The JSON reply (and show) code includes part content for all text/*
> parts except text/html. Because all JSON is required to be UTF-8, it
> handles the encoding itself, puts UTF-8 text in, and omits a
> content-charset field from the output. Emacs passes on the
> content-charset field to mm-display-part-inline if it's available, but
> for text/plain parts it's not, leaving mm-display-part-inline to its
> own devices for figuring out what the charset is. It seems
> mm-display-part-inline correctly figures out that it's UTF-8, and puts
> in the series of ugly \nnn characters because that's what emacs does
> with UTF-8 sometimes.
>
> In the original reply stuff (pre-JSON reply format) emacs used the
> output of notmuch reply verbatim, so all the charset stuff was handled
> in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
> using the JSON reply format, but was inserting the text itself instead
> of using mm-display-part-inline, so emacs still wasn't trying to do
> any charset manipulation. Using mm-display-part-inline is desirable
> because it lets us handle non-text/plain (e.g. text/html) parts
> correctly in reply, and makes the display more consistent (since we
> use it for show). But, it leads to this problem.
>
> So, there are a couple of solutions I can see:
>
> 1) Have the JSON formats include the original content-charset even
> though they're actually outputting UTF-8. Of the solutions I tried,
> this is the best, even though it doesn't sound like a good thing to
> do.
>
> 2) Have the JSON formats include content only if it's actually UTF-8.
> This means that for non-UTF-8 parts (including ASCII parts), the emacs
> interface has to do more work to display the part content, since it
> must fetch it from outside first. When I tried this, it worked but
> caused the \nnn to show up when viewing messages in emacs. I suspect
> this is because it sets a charset for the whole buffer, and can't
> accommodate messages with different charsets in the same buffer
> properly. Reply works correctly, though.
>
> 3) Have the JSON formats include the charset for all parts, but make
> it UTF-8 for all parts they include content for (since we're actually
> outputting UTF-8). This doesn't seem to fix the problem, even though
> it seems like it should.
>
> If no one has a better idea or a strong reason not to, I'll send a
> patch for solution (1).

Thank you very much for your analysis. It encouraged me to dig into the
problem and I've found another solution, which might be better than
those you suggested.

I traced what Emacs does with the text inside
notmuch-mm-display-part-inline. The wrong charset conversion happens
deep in the elisp code, in mm-with-part, which is called by
mm-get-part, which is in turn called by mm-inline-text. There is a way
to make mm-inline-text skip the call to mm-get-part: set the charset to
'gnus-decoded. This sounds like it applies to our situation, where the
part is already decoded.
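
To make the idea concrete, here is a minimal sketch of the fallback,
not the actual notmuch code. The names sketch-part-charset and
sketch-handle-charset are hypothetical helpers made up for this
illustration; mm-make-handle, mm-handle-type and mail-content-type-get
are the existing Gnus functions. It assumes a part is a plist as
produced from the JSON output, with an optional :content-charset key.

```elisp
(require 'mm-decode)   ; mm-make-handle, mm-handle-type
(require 'mail-parse)  ; mail-content-type-get

;; Hypothetical helper: use the charset reported in the JSON output if
;; there is one, otherwise mark the part as already decoded.
(defun sketch-part-charset (part)
  (or (plist-get part :content-charset) 'gnus-decoded))

;; Hypothetical helper: build a handle the same way the patched code
;; does and read back the charset that mm-inline-text would look at.
(defun sketch-handle-charset (part)
  (with-temp-buffer
    (let* ((content-type (plist-get part :content-type))
           (charset (sketch-part-charset part))
           (handle (mm-make-handle (current-buffer)
                                   `(,content-type (charset . ,charset)))))
      (mail-content-type-get (mm-handle-type handle) 'charset))))

;; A text/plain part without :content-charset gets the marker symbol.
(sketch-part-charset '(:content-type "text/plain"))
;; => gnus-decoded
```

A part that does carry :content-charset keeps its own value, which is
exactly the case I am unsure about.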

The following patch (apply it with git am -c) solves the problem for me.
However, I'm not sure it is a universal solution. It sets the charset
to 'gnus-decoded only if no charset is defined in the notmuch JSON
output, and I'm not sure that this is correct. text/html parts seem to
have a charset defined, but since, as you wrote, JSON is always UTF-8,
we might need 'gnus-decoded always, independently of the JSON output.
What do you think?

-Michal

----8<-------
diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 7fa441a..8070f05 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -244,7 +244,7 @@ the given type."
 current buffer, if possible."
   (let ((display-buffer (current-buffer)))
     (with-temp-buffer
-      (let* ((charset (plist-get part :content-charset))
+      (let* ((charset (or (plist-get part :content-charset) 'gnus-decoded))
             (handle (mm-make-handle (current-buffer) `(,content-type (charset . ,charset)))))
        ;; If the user wants the part inlined, insert the content and
        ;; test whether we are able to inline it (which includes both

