UnicodeDecodeError with python API
W. Trevor King
wking at tremily.us
Sun Mar 29 22:26:51 PDT 2015
On Sun, Mar 29, 2015 at 07:10:53PM -0400, Sebastian Fischmeister wrote:
> > My first guess is that the file's encoding doesn't match your
> > locale. Do you have a non-ASCII locale set? You can check with:
>
> It seems to be more tricky than I thought. I didn't have a locale set.
>
> When I set one, I can parse some emails with this:
>
> export LANG=en_US.latin-1
>
> Others with this:
>
> export LANG=en_US.UTF-8
>
> Others fail with either of the two.
Hmm, that's surprising. In hindsight, the locale should only be
affecting the *output* (e.g., a non-Unicode locale might cause a
UnicodeEncodeError). However, you're getting your errors on input.
I'd expect the files to be loaded and parsed as byte-streams, but
maybe there's a bug in Python's email parser. It wouldn't be the
first time it's had trouble with bytes-vs-Unicode (see these old bugs
with similar tracebacks from the initial transition to 3.0 [1,2], or
search “unicode email” on http://bugs.python.org/). I'd try to
reproduce this failure by calling email.message_from_file(…) directly
(getting notmuch out of the loop), and then file a bug against Python
once you have a pure-Python reproduction.
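
A minimal sketch of what such a pure-Python reproduction might look like. The sample message bytes and the `reproduce` helper are made up for illustration, and the sketch assumes the failing code path opens the file in text mode, which has not been checked against the notmuch bindings. email.message_from_file(…) takes a text-mode file, and open() in text mode decodes with the locale's preferred encoding unless one is passed explicitly, so that may be one way the locale ends up mattering on input. email.message_from_binary_file(…) parses the raw bytes and skips that decoding step.

```python
import email
import os
import tempfile

# Hypothetical sample message: the body contains the Latin-1 byte 0xE9,
# which is not valid UTF-8 on its own.
RAW = b"From: a@example.com\nSubject: test\n\ncaf\xe9\n"


def reproduce(raw=RAW):
    """Parse the same bytes via a text-mode and a binary-mode file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw)
        path = tmp.name
    try:
        outcome = {}
        # Text mode: the bytes are decoded by the file object, before the
        # email parser sees them. The encoding is passed explicitly here so
        # the result does not depend on the locale of whoever runs this.
        try:
            with open(path, encoding="utf-8") as f:
                email.message_from_file(f)
            outcome["text"] = "ok"
        except UnicodeDecodeError:
            outcome["text"] = "UnicodeDecodeError"
        # Binary mode: the parser receives the bytes as-is.
        with open(path, "rb") as f:
            msg = email.message_from_binary_file(f)
        outcome["binary"] = msg["Subject"]
        return outcome
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(reproduce())
```

If the binary variant parses the emails that fail for you, that would point at text-mode decoding rather than at the email parser itself.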
Cheers,
Trevor
[1]: http://bugs.python.org/issue1086
[2]: http://bugs.python.org/issue1258#msg56470