utf-8 in author field

Michal Sojka sojkam1 at fel.cvut.cz
Mon May 17 00:56:27 PDT 2010


On Fri, 14 May 2010, Igor Shenderovich wrote:
> Hello all,
> 
> I'm using the latest version of notmuch (cloned from git on May 13), but I
> can't get it to handle utf-8 symbols in the authors field. For example, I have a
> letter with the field
> 
> "authors":
> "=?UTF-8?B?Z3JpZmZvbiAtINCa0L7QvNC80LXQvdGC0LDRgNC40Lkg0LIg0JbQlg==?=",
> 
> (got it from usual emacs interface).
> 
> However, the body of this letter is pretty readable (it also contains some
> utf-8 characters).
> 
> What should one do to see the true list of authors?

Hi,

I see the same thing when headers are not encoded properly according to
RFC 2047. The violation I meet most often is of section 5, paragraph (3):
"An 'encoded-word' MUST NOT appear within a 'quoted-string'", that is,
the encoded word is enclosed in double quotes. I guess the "problem" is
not specific to notmuch; every user of the gmime library is probably
affected.
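
A minimal sketch of the workaround idea, assuming the broken header
value starts with the quoted string: drop the opening double quote and
the next one after it, so that the encoded word is no longer inside a
quoted-string when it reaches the decoder. The helper name
strip_leading_quoted_string is hypothetical; it is not a notmuch or
gmime function.

```c
#include <string.h>

/* Hypothetical helper (not part of notmuch or gmime): if `value` starts
 * with a double quote, remove that quote and the next double quote that
 * follows it, editing the buffer in place.
 * Returns 1 if the buffer was changed, 0 otherwise. */
int
strip_leading_quoted_string (char *value)
{
    char *closing;

    if (value[0] != '"')
        return 0;

    /* Shift the rest of the string left by one byte, including the
     * terminating NUL. memmove is safe for overlapping buffers. */
    memmove (value, value + 1, strlen (value + 1) + 1);

    /* Remove the closing quote too, if there is one. */
    closing = strchr (value, '"');
    if (closing)
        memmove (closing, closing + 1, strlen (closing + 1) + 1);

    return 1;
}
```

The cleaned buffer can then be handed to the decoder as usual, for
example to g_mime_utils_header_decode_text, which is what notmuch calls
on the header value.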

I use the following patch for notmuch to sanitize headers from a popular
mailing list server in the Czech Republic:

Cheers,
Michal



From: Michal Sojka <sojkam1 at fel.cvut.cz>
Subject: Fix broken headers from pandora.cz


---
 lib/message-file.c |   34 ++++++++++++++++++++++++++++++++++
 1 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/lib/message-file.c b/lib/message-file.c
index 7722832..abfedc1 100644
--- a/lib/message-file.c
+++ b/lib/message-file.c
@@ -42,6 +42,7 @@ struct _notmuch_message_file {
     int broken_headers;
     int good_headers;
     size_t header_size; /* Length of full message header in bytes. */
+    notmuch_bool_t pandora_cz_quirk;
 
     /* Parsing state */
     char *line;
@@ -324,7 +325,40 @@ notmuch_message_file_get_header (notmuch_message_file_t *message,
 	else
 	    match = (strcasecmp (header, header_desired) == 0);
 
+	if (strstr(message->value.str, "=40pandora=2Ecz=29") ||
+	    strstr(message->value.str, "@pandora.cz") ||
+	    message->pandora_cz_quirk)
+	{
+	    char *quote = message->value.str;
+	    message->pandora_cz_quirk = TRUE;
+	    if (*quote == '"') {
+		int len = strlen(quote);
+		bcopy(quote+1, quote, len);
+		quote = strchr(quote, '"');
+		if (quote) {
+		    len = strlen(quote);
+		    bcopy(quote+1, quote, len);
+		}
+	    }
+	}
+
 	decoded_value = g_mime_utils_header_decode_text (message->value.str);
+
+	if (message->pandora_cz_quirk &&
+	    strcasecmp (header, "From") == 0)
+	{
+	    /* remove "(<conf>@pandora.cz)" */
+	    char *langle = strchr(decoded_value, '<');
+	    if (langle) {
+		char *comment = langle - 2;
+		if (comment > decoded_value && *comment == ')')
+		    while (comment > decoded_value && *comment != '(')
+			comment--;
+		if (comment > decoded_value)
+		    bcopy(langle, comment, strlen(langle)+1);
+	    }
+	}
+
 	header_sofar = (char *)g_hash_table_lookup (message->headers, header);
 	/* we treat the Received: header special - we want to concat ALL of 
 	 * the Received: headers we encounter.
-- 
tg: (417274d..) t/Fix-broken-headers-from-pandora.cz (depends on: master)
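
The From clean-up in the second hunk can also be written as a standalone
function. The following is only a sketch under my own assumptions: the
name strip_comment_before_address is hypothetical, it uses memmove
rather than bcopy, and unlike the patch it leaves the value untouched
unless a ')' and a space come directly before the '<'.

```c
#include <string.h>

/* Hypothetical helper, not the code from the patch: remove a "(...)"
 * comment that sits directly before " <addr>" in a decoded From value.
 * Edits the buffer in place; returns 1 if a comment was removed. */
int
strip_comment_before_address (char *from)
{
    char *langle = strchr (from, '<');
    char *open;

    /* Require ") <": a ')' two bytes before '<' with a space between. */
    if (langle == NULL || langle - from < 2 ||
        langle[-1] != ' ' || langle[-2] != ')')
        return 0;

    /* Walk back from the ')' to the nearest preceding '('. */
    open = langle - 2;
    while (open > from && *open != '(')
        open--;
    if (*open != '(')
        return 0;

    /* Overwrite the comment with "<addr>" and its terminating NUL. */
    memmove (open, langle, strlen (langle) + 1);
    return 1;
}
```

Given "Jan Novak (conf@pandora.cz) <jan@example.org>" this leaves
"Jan Novak <jan@example.org>" in the buffer; a value without such a
comment is returned unchanged.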



