UnicodeDecodeError with python API

W. Trevor King wking at tremily.us
Sun Mar 29 22:26:51 PDT 2015


On Sun, Mar 29, 2015 at 07:10:53PM -0400, Sebastian Fischmeister wrote:
> > My first guess is that the file's encoding doesn't match your
> > locale.  Do you have a non-ASCII locale set?  You can check with:
> 
> It seems to be more tricky than I thought. I didn't have a locale set.
> 
> When I set one, I can parse some emails with this:
> 
> export LANG=en_US.latin-1
> 
> Others with this:
> 
> export LANG=en_US.UTF-8
> 
> Others fail with either of the two.

Hmm, that's surprising.  In hindsight, I'd expect the locale to
affect only the *output* (e.g., a non-Unicode locale might cause a
UnicodeEncodeError).  However, you're getting your errors on input.
I'd expect the files to be loaded and parsed as byte-streams, but
maybe there's a bug in Python's email parser.  It wouldn't be the
first time it's had trouble with bytes-vs-Unicode (see these old bugs
with similar tracebacks from the initial transition to 3.0 [1,2], or
search “unicode email” on http://bugs.python.org/).  I'd try to
reproduce this failure by calling email.message_from_file(…) directly
(getting notmuch out of the loop), and then file a bug against Python
once you have a pure-Python reproduction.
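
A minimal sketch of what such a pure-Python reproduction might look
like, assuming Python 3.2 or later.  The helper name try_parse and the
sample message are hypothetical, made up for illustration; substitute
the path of a message that actually fails.  One thing worth checking
along the way: a text-mode open() with no explicit encoding decodes
using the locale's preferred encoding before the email parser sees
anything, so the sketch compares that path with
email.message_from_binary_file(), which hands the parser raw bytes.

```python
import email
import os
import tempfile


def try_parse(path, encoding=None):
    """Parse the message at `path` in text mode, then in binary mode.

    Returns (text_error, binary_message).  text_error is the
    UnicodeDecodeError raised by the text-mode attempt, or None if it
    succeeded.  binary_message is the Message parsed from raw bytes.
    """
    text_error = None
    try:
        # Text mode: open() decodes the bytes before the parser runs.
        # With encoding=None, Python falls back to the locale's
        # preferred encoding.
        with open(path, encoding=encoding) as f:
            email.message_from_file(f)
    except UnicodeDecodeError as e:
        text_error = e

    # Binary mode: no decoding by open(); the parser receives bytes.
    with open(path, 'rb') as f:
        binary_message = email.message_from_binary_file(f)
    return text_error, binary_message


# Hypothetical sample: a Latin-1 byte (0xE9) in an otherwise ASCII message.
raw = b"From: a@example.com\nSubject: caf\xe9\n\nbody\n"
with tempfile.NamedTemporaryFile(suffix='.eml', delete=False) as tmp:
    tmp.write(raw)
    sample_path = tmp.name

utf8_error, message = try_parse(sample_path, encoding='utf-8')
latin1_error, _ = try_parse(sample_path, encoding='latin-1')
os.unlink(sample_path)

print(type(utf8_error).__name__, latin1_error)
```

If the text-mode attempt fails while the binary-mode one succeeds, the
error is probably coming from the decoding step in open() rather than
from the parser itself, which would be useful to know before filing
anything.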

Cheers,
Trevor

[1]: http://bugs.python.org/issue1086
[2]: http://bugs.python.org/issue1258#msg56470
