Encodings

Sebastian Spaeth Sebastian at SSpaeth.de
Mon Jul 11 07:04:17 PDT 2011


Hi all,
after I was notified that notmuch's python bindings perform
differently depending on whether we hand them (byte-based) ASCII strings
or unicode, I tried to work out which encodings we should expect from the
library and which we should pass to it. The answer is that things are very
implicit: notmuch.h speaks of strings but never mentions encodings, and
the xapian docs don't mention encodings either, although ojwb confirmed
that xapian expects utf-8.

So, can we document which encoding we are expected to pass to the various
APIs, and where we can guarantee to actually return UTF-8 encoded
strings? For some of the data we read directly from the files, e.g.
arbitrary headers, we can probably be least certain, but are, e.g., the
returned tags always utf-8?

I would love to make the python bindings use unicode() instances in
cases where we can be sure that we actually receive utf-8 encoded strings.

Encodings make my brain hurt. Unfortunately one cannot simply ignore
them.

Sebastian
More information about the notmuch mailing list