[PATCH] nmbug: Allow Unicode tags and IDs in Python 2
W. Trevor King
wking at tremily.us
Sun Feb 14 21:30:11 PST 2016
Avoid a UnicodeWarning and broken pipe on 'nmbug commit' in Python 2
when a tag or message ID contains non-ASCII characters [1].
There are a number of Python bugs associated with this behavior
[2,3,4,5,6]. There's also some useful background in [8]. [3] led to
the currently working Python 3 implementation, which encodes to UTF-8
by default and has 'encoding' and 'errors' arguments [7]. This commit
follows that approach in a way that's compatible with both Python 2
and Python 3. Coercing to UTF-8 (regardless of locale) gives us
consistent tag IDs for sharing between users.
The 'isnumeric' check identifies Unicode instances in both Python 2
[9] and Python 3 [10].
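
For illustration, here is a standalone sketch of the patched helper that
can be run outside of nmbug. The name 'hex_quote' and the regular
expression assigned to _HEX_ESCAPE_REGEX are assumptions made so the
example is self-contained; only the function body mirrors the diff below.

```python
import re

try:  # Python 3
    from urllib.parse import quote as _quote
except ImportError:  # Python 2
    from urllib import quote as _quote

# Assumed pattern: matches the uppercase escapes that quote() emits.
_HEX_ESCAPE_REGEX = re.compile('%[0-9A-F]{2}')


def hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
    """Percent-quote 'string', using lowercase hex digits."""
    # Only Unicode text has 'isnumeric' (unicode in Python 2, str in
    # Python 3), so byte strings skip the encoding step.
    if hasattr(string, 'isnumeric'):
        string = string.encode(encoding, errors)
    if hasattr(safe, 'isnumeric'):
        safe_bytes = safe.encode(encoding, errors)
        # A safe character that needs several bytes cannot be passed
        # through unquoted one byte at a time, so refuse it.
        if len(safe_bytes) != len(safe):
            raise ValueError(
                'some safe characters are encoded as multiple bytes '
                '({!r} -> {!r})'.format(safe, safe_bytes))
        safe = safe_bytes
    uppercase_escapes = _quote(string, safe)
    return _HEX_ESCAPE_REGEX.sub(
        lambda match: match.group(0).lower(),
        uppercase_escapes)


print(hex_quote(u'caf\u00e9'))  # non-ASCII tag, quoted as UTF-8 bytes
print(hex_quote(u'a:b c'))      # ':' is in 'safe', the space is not
```

Because the text is always encoded to UTF-8 before quoting, the quoted
form should not depend on the locale of the user running the command.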
[1]: id:87twlbv5vj.fsf at zancas.localnet
http://thread.gmane.org/gmane.mail.notmuch.general/21855/focus=21862
Subject: Re: problems with nmbug and empty prefix (UnicodeWarning and broken pipe)
Date: Sun, 14 Feb 2016 08:22:24 -0400
[2]: http://bugs.python.org/issue2637
[3]: http://bugs.python.org/issue3300
[4]: http://bugs.python.org/issue22231
[5]: http://bugs.python.org/issue23885
[6]: http://bugs.python.org/issue1712522
[7]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.quote
[8]: https://mail.python.org/pipermail/python-dev/2006-July/067335.html
[9]: https://docs.python.org/2/library/stdtypes.html#unicode.isnumeric
[10]: https://docs.python.org/3/library/stdtypes.html#str.isnumeric
---
I haven't checked the other commands for issues with Unicode IDs or
tags. It's possible that in addition to this explicit encoding to
UTF-8, we'll also want explicit decoding from UTF-8 when reading from
Git trees (for 'nmbug checkout' and 'nmbug status').
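
As a rough sketch of what that decoding could look like (this is not
part of the patch, and 'hex_unquote' is a hypothetical name that does not
exist in nmbug), the quoted path component would be unquoted to bytes
and then decoded explicitly:

```python
try:  # Python 3
    from urllib.parse import unquote_to_bytes as _unquote_to_bytes
except ImportError:  # Python 2, where unquote() returns a byte string
    from urllib import unquote as _unquote_to_bytes


def hex_unquote(string, encoding='utf-8', errors='strict'):
    """Hypothetical inverse of _hex_quote: percent-decode, then decode."""
    # Both upper- and lowercase hex digits are accepted when unquoting.
    return _unquote_to_bytes(string).decode(encoding, errors)


print(hex_unquote('caf%c3%a9'))
```

In Python 2 this assumes the input is a byte string, as it would be when
read from Git output.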
Cheers,
Trevor
devel/nmbug/nmbug | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/devel/nmbug/nmbug b/devel/nmbug/nmbug
index 81f582c..284d374 100755
--- a/devel/nmbug/nmbug
+++ b/devel/nmbug/nmbug
@@ -1,6 +1,6 @@
#!/usr/bin/env python
#
-# Copyright (c) 2011-2014 David Bremner <david at tethera.net>
+# Copyright (c) 2011-2016 David Bremner <david at tethera.net>
# W. Trevor King <wking at tremily.us>
#
# This program is free software: you can redistribute it and/or modify
@@ -95,7 +95,7 @@ except AttributeError: # Python < 3.2
_tempfile.TemporaryDirectory = _TemporaryDirectory
-def _hex_quote(string, safe='+@=:,'):
+def _hex_quote(string, safe='+@=:,', encoding='utf-8', errors='strict'):
     """
     quote('abc def') -> 'abc%20def'.
@@ -103,6 +103,15 @@ def _hex_quote(string, safe='+@=:,'):
     addition to letters, digits, and '_.-') and lowercase hex digits
     (e.g. '%3a' instead of '%3A').
     """
+    if hasattr(string, 'isnumeric'):
+        string = string.encode(encoding, errors)
+    if hasattr(safe, 'isnumeric'):
+        safe_bytes = safe.encode(encoding, errors)
+        if len(safe_bytes) != len(safe):
+            raise ValueError(
+                'some safe characters are encoded as multiple bytes '
+                '({!r} -> {!r})'.format(safe, safe_bytes))
+        safe = safe_bytes
     uppercase_escapes = _quote(string, safe)
     return _HEX_ESCAPE_REGEX.sub(
         lambda match: match.group(0).lower(),
--
2.1.0.60.g85f0837