Handling mislabeled emails encoded with Windows-1252

Sebastian Poeplau sebastian.poeplau at eurecom.fr
Tue Jul 24 06:55:54 PDT 2018


Hi again,

>> Everyone's mail situation is unique, but I haven't noticed this
>> problem. Do you have a mechanical (e.g. scripted) way of detecting such
>> mails? I suppose it could just look for characters in the range 0x80 to
>> 0x95 in allegedly ISO_8859-1 messages. A census of the situation in my
>> own mail would help me think about this problem, I think.
>
> Yes, I guess that should be a good enough heuristic for detecting
> affected mail. I'll try to come up with a simple script and post it
> here.

Attached is a Python script that checks individual message files and
prints the name of each file that appears to contain mislabeled
Windows-1252 text. The heuristic seems to work well on my mail; let me
know if you encounter any issues!

Cheers,
Sebastian


-------------- next part --------------
A non-text attachment was scrubbed...
Name: find_mislabeled_cp1252.py
Type: application/octet-stream
Size: 840 bytes
Desc: not available
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20180724/b9d3799e/attachment.obj>
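
The attachment itself is not reproduced in this archive. What follows is
a minimal sketch of the kind of heuristic discussed in the thread, not
the original script. It assumes that a text part labeled ISO-8859-1 is
suspect when its decoded body contains any byte in the range 0x80 to
0x9F, which is the block where Windows-1252 assigns printable characters
and ISO-8859-1 has only control codes. The names LATIN1_LABELS,
has_c1_bytes and is_mislabeled are hypothetical, and the list of charset
labels is an assumption that may need extending.

```python
#!/usr/bin/env python3
"""Print the names of message files that look like mislabeled Windows-1252."""
import email
import sys

# Charset labels treated as "claims to be ISO-8859-1".
# This list is an assumption and is not exhaustive.
LATIN1_LABELS = {"iso-8859-1", "iso_8859-1", "latin1", "latin-1", "l1"}


def has_c1_bytes(data):
    """Return True if data contains any byte in the range 0x80-0x9F."""
    return any(0x80 <= byte <= 0x9F for byte in data)


def is_mislabeled(message):
    """Return True if a text part labeled ISO-8859-1 contains 0x80-0x9F bytes."""
    for part in message.walk():
        if part.get_content_maintype() != "text":
            continue
        charset = (part.get_content_charset() or "").lower()
        if charset not in LATIN1_LABELS:
            continue
        # decode=True undoes base64 / quoted-printable and returns bytes.
        payload = part.get_payload(decode=True)
        if payload and has_c1_bytes(payload):
            return True
    return False


def main(paths):
    """Check each message file and print the path of every suspect one."""
    for path in paths:
        with open(path, "rb") as handle:
            message = email.message_from_binary_file(handle)
        if is_mislabeled(message):
            print(path)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under any file name, the sketch takes message files as arguments
and prints one path per suspect message. It only inspects message
bodies, so mislabeled text in headers would go unnoticed.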

