[PATCH] nmbug: Allow Unicode tags and IDs in Python 2

W. Trevor King wking at tremily.us
Sun Feb 14 21:30:11 PST 2016


Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
when a tag or message ID contains non-ASCII characters [1].

There are a number of Python bugs associated with this behavior
[2,3,4,5,6].  There's also some useful background in [8].  [3] led to
the currently working Python 3 implementation, which encodes to UTF-8
by default and has 'encoding' and 'errors' arguments [7].  This commit
follows that approach in a way that's compatible with both Python 2
and Python 3.  Coercing to UTF-8 (regardless of locale) gives us
consistent tag IDs for sharing between users.
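
A minimal sketch of that approach, runnable on either Python: text is encoded to UTF-8 before percent-quoting, so a Unicode tag and its UTF-8 byte string produce the same quoted ID. The name `hex_quote`, the escape-matching regex, and the default `safe` set are illustrative assumptions modeled on `_hex_quote` in this patch. They are not nmbug's actual code.

```python
import re

try:  # Python 3
    from urllib.parse import quote as _quote
except ImportError:  # Python 2
    from urllib import quote as _quote

# Hypothetical helper regex: matches uppercase percent escapes such as '%3A'.
_UPPER_ESCAPE = re.compile('%[0-9A-F]{2}')


def hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
    """Percent-encode text or bytes, using lowercase hex escapes.

    Text is encoded with *encoding* first, so the result does not
    depend on the user's locale.
    """
    if hasattr(string, 'isnumeric'):  # only text types define isnumeric()
        string = string.encode(encoding, errors)
    quoted = _quote(string, safe)
    return _UPPER_ESCAPE.sub(lambda match: match.group(0).lower(), quoted)


# Text and its UTF-8 bytes quote to the same ID.
print(hex_quote(u'caf\u00e9'))    # caf%c3%a9
print(hex_quote(b'caf\xc3\xa9'))  # caf%c3%a9
print(hex_quote(u'abc def'))      # abc%20def
```

This sketch leaves `safe` as an ASCII str, which both `quote` implementations accept. The patch additionally encodes `safe` and rejects characters that would expand to multiple bytes.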

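The 'isnumeric' check identifies Unicode text in both Python 2 [9]
and Python 3 [10]: only the text types (unicode in Python 2, str in
Python 3) define that method, so byte strings are passed to quote()
unchanged.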
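
A quick illustration of why that check works on both versions. The helper name `is_text` is hypothetical:

```python
def is_text(value):
    """Return True for Unicode text, False for byte strings.

    Python 2 'unicode' and Python 3 'str' define isnumeric(); Python 2
    'str' and Python 3 'bytes' do not.
    """
    return hasattr(value, 'isnumeric')


print(is_text(u'tag'))  # True
print(is_text(b'tag'))  # False
```

Unlike an isinstance() check against 'unicode', this needs no name that exists only in Python 2.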

[1]: id:87twlbv5vj.fsf at zancas.localnet
     http://thread.gmane.org/gmane.mail.notmuch.general/21855/focus=21862
     Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
     Date: Sun, 14 Feb 2016 08:22:24 -0400
[2]: http://bugs.python.org/issue2637
[3]: http://bugs.python.org/issue3300
[4]: http://bugs.python.org/issue22231
[5]: http://bugs.python.org/issue23885
[6]: http://bugs.python.org/issue1712522
[7]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[8]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html
[9]: https://docs.python.org/2/library/stdtypes.html#unicode.isnumeric
[10]: https://docs.python.org/3/library/stdtypes.html#str.isnumeric
---
I haven't checked the other commands for issues with Unicode IDs or
tags.  It's possible that in addition to this explicit encoding to
UTF-8, we'll also want explicit decoding from UTF-8 when reading from
Git trees (for 'nmbug checkout' and 'nmbug status').
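
If that decoding turns out to be needed, one possible shape reverses the quoting step: unquote to bytes, then decode as UTF-8. This is a hypothetical sketch (the name `hex_unquote` is made up), and this patch does not implement it:

```python
try:  # Python 3
    from urllib.parse import unquote_to_bytes as _unquote_to_bytes
except ImportError:  # Python 2: unquote() on a byte str returns a byte str
    from urllib import unquote as _unquote_to_bytes


def hex_unquote(string, encoding='utf-8', errors='strict'):
    """Reverse percent-encoding, then decode the raw bytes to text."""
    if hasattr(string, 'isnumeric'):
        # Quoted IDs are pure ASCII, so this encoding step is lossless.
        string = string.encode('ascii')
    return _unquote_to_bytes(string).decode(encoding, errors)


print(hex_unquote('caf%c3%a9') == u'caf\u00e9')  # True
```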

Cheers,
Trevor

 devel/nmbug/nmbug | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug
index 81f582c..284d374 100755
--- a/devel/nmbug/nmbug
+++ b/devel/nmbug/nmbug
@@ -1,6 +1,6 @@
 #!/usr/bin/env python
 #
-# Copyright (c) 2011-2014 David Bremner <david at tethera.net>
+# Copyright (c) 2011-2016 David Bremner <david at tethera.net>
 #                         W. Trevor King <wking at tremily.us>
 #
 # This program is free software: you can redistribute it and/or modify
@@ -95,7 +95,7 @@ except AttributeError:  # Python < 3.2
     _tempfile.TemporaryDirectory = _TemporaryDirectory
 
 
-def _hex_quote(string, safe='+@=:,'):
+def _hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
     """
     quote('abc def') -> 'abc%20def'.
 
@@ -103,6 +103,15 @@ def _hex_quote(string, safe='+@=:,'):
     addition to letters, digits, and '_.-') and lowercase hex digits
     (e.g. '%3a' instead of '%3A').
     """
+    if hasattr(string, 'isnumeric'):
+        string = string.encode(encoding, errors)
+    if hasattr(safe, 'isnumeric'):
+        safe_bytes = safe.encode(encoding, errors)
+        if len(safe_bytes) != len(safe):
+            raise ValueError(
+                'some safe characters are encoded as multiple bytes '
+                '({!r} -> {!r})'.format(safe, safe_bytes))
+        safe = safe_bytes
     uppercase_escapes = _quote(string, safe)
     return _HEX_ESCAPE_REGEX.sub(
         lambda match: match.group(0).lower(),
-- 
2.1.0.60.g85f0837


