[PATCH 00/17] nmbug-status: Python-3-compabitility and general refactoring

Tue Feb 4 11:14:53 PST 2014

On Tue, Feb 04, 2014 at 08:40:18PM +0200, Tomi Ollila wrote:
> On Tue, Feb 04 2014, W. Trevor King wrote:
> >
> >   >>> from __future__ import unicode_literals
> >   >>> import codecs
> >   >>> import locale
> >   >>> import sys
> >   >>> print(locale.getpreferredencoding())  # same as yours
> >   UTF-8
> >   >>> print(sys.getdefaultencoding())  # same as yours
> >   ascii
> >   >>> _ENCODING = locale.getpreferredencoding() or sys.getdefaultencoding()
> >   >>> print(_ENCODING)  # double-check default encodings
> >   UTF-8
> >   >>> byte_stream = sys.stdout  # copied from Page.write
> >   >>> stream = codecs.getwriter(encoding=_ENCODING)(stream=byte_stream)
> >   >>> data = {'from': '\u017b'}  # fake the troublesome data
> >   >>> print(type(data['from']))  # double-check unicode_literals
> >   <type 'unicode'>
> >   >>> string = '  <td>{from}</td>\n'.format(**data)
> >   >>> stream.write(string)
> >     <td>Ż</td>
> >
> > It looks like you'll have the same _ENCODING as I do (UTF-8).  That
> > means your stream should be wrapped in a UTF-8 StreamWriter, so I
> > don't understand why it's converting to ASCII.  Can you run through
> > the above on your troublesome machine and confirm that stream.write()
> > is still raising the exception?  If it doesn't work, can you just
> > paste that whole run in your next email?
> 
> I don't know what to paste, so i paste this:
> 
> $ python
> Python 2.6.6 (r266:84292, Nov 21 2013, 12:39:37) 
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.

It looks like you left out:

  from __future__ import unicode_literals

Can you try again with that line as the first command?

> >>> data = {'from': '\u017b'}
> >>> print(type(data['from'])) 
> <type 'str'>

which is why your data is a 'str' and not a 'unicode' instance.

> >>> string = '  <td>{from}</td>\n'.format(**data)
> >>> print string
>   <td>\u017b</td>
> 
> and then:
> 
> >>> data = {'from': u'\u017b'}

This works around the lack of unicode_literals with an explicit u''.

> >>> print(type(data['from'])) 
> <type 'unicode'>
> >>> string = '  <td>{from}</td>\n'.format(**data)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u017b' in

However, without unicode_literals or an explicit u'', you're format
string '…{from}' is a str (it should be a 'unicode' instance with
unicode_literals).

> >>> import os
> >>> print os.environ['LANG']
> en_US.UTF-8

That's good anyway ;).  Thanks for digging into this :).

Cheers,
Trevor

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20140204/89b717bd/attachment.pgp>