web interface to notmuch
Brian Sniffen
bts at evenmere.org
Tue Oct 31 12:21:40 PDT 2017
> just remove it), but along the way of searching and viewing mail, I've
> encountered quite a few occurrences of failing to UnicodeEncode. An example
> backtrace looks like this:
>
> Traceback (most recent call last):
> File "/usr/lib/python2.7/dist-packages/web/application.py", line 239, in
> process
> return self.handle()
> File "/usr/lib/python2.7/dist-packages/web/application.py", line 230, in
> handle
> return self._delegate(fn, self.fvars, args)
> File "/usr/lib/python2.7/dist-packages/web/application.py", line 420, in
> _delegate
> return handle_class(cls)
> File "/usr/lib/python2.7/dist-packages/web/application.py", line 396, in
> handle_class
> return tocall(*args)
> File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 153,
> in GET
> sprefix=webprefix)
> File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 989,
> in render
> return self.environment.handle_exception(exc_info, True)
> File "/usr/lib/python2.7/dist-packages/jinja2/environment.py", line 754,
> in handle_exception
> reraise(exc_type, exc_value, tb)
> File "templates/show.html", line 1, in top-level template code
> {% extends "base.html" %}
> File "templates/base.html", line 32, in top-level template code
> {% block content %}
> File "templates/show.html", line 12, in block "content"
> {% for part in format_message(m.get_filename(),mid): %}{{ part|safe
> }}{% endfor %}
> File "/b/git/notmuch-brians.git/contrib/notmuch-web/nmweb.py", line 245,
> in format_message_walk
> tags=safe_tags).encode(part.get_content_charset('ascii')))
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in
> position 1141: ordinal not in range(256)
>
> 127.0.0.1:60968 - - [31/Oct/2017 17:00:02] "HTTP/1.1 GET /show/
> 665d8c5c2b024898ae21951c4b8b4f93 at CO2PR05MB747.namprd05.prod.outlook.com" -
> 500 Internal Server Error
>
> I'm no Python expert, but from a quick google it would seem like the cause
> of such an exception is related to not using utf-8.
Neat. So to get there, this has to be a text/html part. It has to have
been decoded, either with the declared content type or with ascii. If a
\u201c (left double quote) showed up, it didn't get decoded as
ascii---and indeed, it looks like the content-type specifies latin-1.
But now when we try to encode back, using the same latin-1, it fails?
That's really neat.
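
One thing that is certain here: decoding bytes as latin-1 can only ever produce code points U+0000 through U+00FF, so a u'\u201c' in the rendered text cannot have come from a plain latin-1 decode. A guess at how it gets in, as a minimal sketch: the bytes were really Windows-1252 (where 0x93 and 0x94 are curly quotes) and some fallback decoded them that way. The sample bytes and the `can_encode` helper below are made up for illustration; this is not the reporter's message, and the syntax is Python 3 (under the 2.7 in the traceback the literals would be u"...").

```python
# Hypothetical bytes, as a Windows mail client might write curly quotes.
raw = b"\x93quoted\x94"

as_latin1 = raw.decode("latin-1")  # each byte maps to U+0000..U+00FF
as_cp1252 = raw.decode("cp1252")   # 0x93/0x94 become U+201C/U+201D


def can_encode(text, charset):
    """Return True if text survives text.encode(charset) in strict mode."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(as_latin1, "latin-1"))  # round trip through latin-1
print(can_encode(as_cp1252, "latin-1"))  # curly quotes do not fit in latin-1
```

If the text was decoded with anything other than the charset it is later encoded with, a strict encode can fail exactly like the traceback above.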
> Brian - do you think something needs modifying in nmweb.py to cater for
> this type of thing, or is this somehow related to my own mailstore (not sure
> why that would be as my messages haven't been modified).
Lots of mail has busted encoding. I've done some defensive work against
that---look at decodeAnyway and shed a tear for purity---but clearly not
enough. Can you send me a message that causes the problem?
In the meantime, I think line 245 ought to be, appropriately indented:
    tags=safe_tags).encode(part.get_content_charset('ascii'),
                           'xmlcharrefreplace'))
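
To show what the 'xmlcharrefreplace' error handler does, here is a small standalone sketch. `encode_for_part` is a hypothetical stand-in for the `.encode(...)` call in nmweb.py, not a function that exists there, and the sample strings are invented. It is written in Python 3 syntax; `str.encode` takes the same two arguments on 2.7 unicode strings.

```python
def encode_for_part(text, charset):
    """Hypothetical helper: encode text, replacing unencodable characters
    with numeric character references instead of raising."""
    return text.encode(charset or "ascii", "xmlcharrefreplace")


rendered = "\u201cNeat\u201d"                      # curly quotes, not in latin-1
out = encode_for_part(rendered, "latin-1")
plain = encode_for_part("caf\u00e9", "latin-1")   # fits in latin-1, unchanged

print(out)
print(plain)
```

Since the part being rendered is text/html, a browser should show the numeric references as the original characters, so the page looks the same and the 500 goes away.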
Thanks for the report---investigating it showed me that the search box
doesn't tolerate that character either.
-Brian