locales and notmuch

David Bremner david at tethera.net
Sat Feb 23 03:43:58 PST 2019


Matt Armstrong <marmstrong at google.com> writes:

>
> Notmuch should probably adopt a coherent strategy with respect to
> character set encodings, rather than do something ad-hoc for the
> feature.  Most systems I have worked with normalize to UTF-8 at the
> edges and do all work using that encoding.
>

You're probably correct. On the other hand, lack of locale handling is not
something that people actually complain about very much. So if we do
decide to "Do the right thing", then I'd probably just continue ignoring
the problem, rather than block working on things that do annoy people.

> It is an interesting question: what encoding does .notmuch-config use?
> UTF-8?  User's choice?

It's loaded by g_key_file_load_from_data; I suspect that does no conversion.

> Similarly, what is the encoding of notmuch's
> command line args?

There is no conversion done.

In both these cases it probably works mostly OK for people (at least
nobody has complained) because user values are treated as opaque
NUL-terminated byte sequences.
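
To illustrate what "opaque" means here, a minimal sketch in Python (not
notmuch code; the name and the helper are made up for the example). The
same text yields different bytes under different encodings, so a
byte-for-byte comparison only matches when both sides happen to use the
same encoding:

```python
# Hypothetical value, chosen only because it contains a non-ASCII character.
name = "José"

as_utf8 = name.encode("utf-8")      # 5 bytes: the "é" takes two bytes
as_latin1 = name.encode("latin-1")  # 4 bytes: the "é" takes one byte


def same_bytes(a: bytes, b: bytes) -> bool:
    """Opaque comparison: no decoding, just byte-for-byte equality."""
    return a == b
```

As long as the config file, the command line and the stored data all use
one encoding, nothing goes wrong, which may be why nobody has noticed.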

> I was just reading https://xapian.org/features and Xapian seems to store
> text in UTF-8.  If this is the case, where is the code that does the
> charset conversions between the email messages and UTF-8?

I'd have to double check the code to be sure, but I suspect this is done
by GMime when parsing the files.

> How about
> between the command line args to UTF-8?

AFAIR, there is no conversion, and search terms are passed straight to
Xapian.

This probably doesn't work well for people with non-UTF-8 locales.
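
For what it's worth, "normalize to UTF-8 at the edges" for a search term
could look roughly like the sketch below. This is Python for brevity and
is not what notmuch does today; to_utf8 and locale_encoding are
hypothetical helper names, and a real implementation would also have to
decide what to do with bytes that are invalid in the locale's encoding.

```python
import locale


def locale_encoding() -> str:
    """Return the character encoding the user's locale asks for."""
    return locale.getpreferredencoding(False)


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8.

    Raises UnicodeDecodeError if raw is not valid in source_encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


# A search term as it would arrive from an ISO-8859-1 terminal.
term_latin1 = "café".encode("iso-8859-1")
term_utf8 = to_utf8(term_latin1, "iso-8859-1")
```

In practice the second argument would come from locale_encoding() rather
than being hard-coded.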

