[PATCH] Remove/replace vertical whitespace in subject header field body.

James Vasile james at hackervisions.org
Wed Mar 16 18:44:28 PDT 2011


RFC 822 specifies that headers are one-liners of ASCII:

> The field-body may be composed of any ASCII characters, except CR or
> LF.  (While CR and/or LF may be present in the actual text, they are
> removed by the action of unfolding the field.)

RFC 5335 allows UTF-8 in header field bodies, but as I read the docs,
the RFC 822 specification that they end up as one-liners still applies.

RFC 5322 describes folding and unfolding as follows:

> Each header field is logically a single line of characters comprising
> the field name, the colon, and the field body. For convenience
> however, and to deal with the 998/78 character limitations per line,
> the field body portion of a header field can be split into a
> multiple-line representation; this is called "folding". The general
> rule is that wherever this specification allows for folding white
> space (not simply WSP characters), a CRLF may be inserted before any
> WSP.
...
> The process of moving from this folded multiple-line representation of
> a header field to its single line representation is called
> "unfolding". Unfolding is accomplished by simply removing any CRLF
> that is immediately followed by WSP.

Again, unfolded subjects should be one-liners.
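
As a rough illustration of the unfolding rule quoted above, here is a
minimal C sketch.  The helper name unfold_header is made up for this
example and is not part of notmuch; it assumes a NUL-terminated,
writable buffer holding one raw header field.

```c
/* Unfold a header field in place, following the RFC 5322 rule:
 * remove every CRLF that is immediately followed by WSP (space or
 * horizontal tab).  Hypothetical helper, not a notmuch function. */
char *
unfold_header (char *field)
{
    char *src = field;
    char *dst = field;

    while (*src) {
        if (src[0] == '\r' && src[1] == '\n' &&
            (src[2] == ' ' || src[2] == '\t')) {
            src += 2;   /* skip the CRLF, keep the WSP that follows */
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';

    return field;
}
```

A CRLF that is not followed by WSP is left alone, since it would mark
the end of the field rather than a fold.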

An email sent to me from pingg.com (I think it's a pretentious
version of evite) came with a subject of
"=?utf-8?Q?bring_small_items_for_a_pi=C3=B1ata=21=21=21=21=0A?=", which
"notmuch search" displays as "Subject: bring small items for a
piñata!!!!" with a \n at the end.  This befuddles the emacs UI ("Error:
Unexpected output from notmuch search:").  I've attached an email that
reproduces the error.
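
To see where the stray newline comes from, the sketch below decodes the
payload of an RFC 2047 "Q" encoded-word, where "_" stands for a space
and "=XX" for the byte with hex value XX.  The names q_decode and
hex_value are hypothetical, not notmuch or GMime functions, and the
sketch ignores charset handling and the "=?utf-8?Q?...?=" wrapper.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_value (char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode a "Q" payload from in to out and return the number of bytes
 * written (not counting the terminating NUL).  out must have room for
 * strlen(in) + 1 bytes.  Malformed "=" escapes are copied verbatim. */
size_t
q_decode (const char *in, char *out)
{
    size_t n = 0;

    while (*in) {
        int hi, lo;

        if (*in == '_') {
            out[n++] = ' ';
            in++;
        } else if (*in == '=' &&
                   (hi = hex_value (in[1])) >= 0 &&
                   (lo = hex_value (in[2])) >= 0) {
            out[n++] = (char) (hi * 16 + lo);
            in += 3;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';

    return n;
}
```

Fed the payload of the subject above, the last decoded byte is 0x0A,
which is the newline that ends up in the search output.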

I don't think ending the subject with a Q-encoded 0x0A (the "=0A" in
the encoded-word) followed by the usual CRLF is RFC-compliant.  Still,
notmuch should surely follow the deplorable "accept liberally/emit
conservatively" doctrine.

Here is a patch that trims leading and trailing whitespace from subjects
and replaces internal non-space, non-horizontal-tab whitespace with
spaces.  It fixes the problem described in this message.
---
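For experimenting outside notmuch, here is a standalone sketch of the
same normalization.  The name normalize_subject_whitespace is
hypothetical, and the approach differs from the patch in one respect:
it shifts the trimmed text to the front of the buffer instead of
returning an advanced pointer, so the caller's pointer stays usable
for freeing.  It assumes a writable, NUL-terminated string.

```c
#include <ctype.h>
#include <string.h>

/* Trim leading/trailing whitespace in place and replace internal
 * LF, VT, FF and CR (bytes 10..13) with a space.  Returns str. */
char *
normalize_subject_whitespace (char *str)
{
    size_t len = strlen (str);
    size_t start = 0;
    size_t i;

    /* Trim trailing whitespace by shortening the string. */
    while (len > 0 && isspace ((unsigned char) str[len - 1]))
        len--;
    str[len] = '\0';

    /* Skip leading whitespace, then move the rest (including the
     * terminating NUL) to the front of the buffer. */
    while (start < len && isspace ((unsigned char) str[start]))
        start++;
    memmove (str, str + start, len - start + 1);
    len -= start;

    /* Horizontal tabs and spaces are kept; vertical whitespace is not. */
    for (i = 0; i < len; i++)
        if (str[i] >= '\n' && str[i] <= '\r')
            str[i] = ' ';

    return str;
}
```

The casts to unsigned char keep isspace well-defined for the non-ASCII
bytes that show up in UTF-8 subjects.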
 lib/thread.cc |   36 ++++++++++++++++++++++++++++++++----
 1 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/lib/thread.cc b/lib/thread.cc
index 5190a66..7a816ea 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -266,6 +266,34 @@ _thread_add_message (notmuch_thread_t *thread,
     }
 }
 
+/* Remove leading/trailing whitespace and replace internal vertical
+ * whitespace with spaces.
+ */
+static char *
+rectify_whitespace (char *str)
+{
+  char *last;
+  char *curr;
+
+  while (isspace (*str))
+    str++;
+
+  if (*str == 0)
+    return str;
+
+  last = str + strlen(str) - 1;
+  while (last > str && isspace (*last))
+    last--;
+  last[1] = '\0'; /* terminate here to drop trailing whitespace */
+  curr = str;
+  do
+    if ((*curr >= 10) && (*curr <= 13))
+      *curr = 32; //space
+  while (curr++ < last);
+
+  return str;
+}
+
 static void
 _thread_set_subject_from_message (notmuch_thread_t *thread,
 				  notmuch_message_t *message)
@@ -282,11 +310,11 @@ _thread_set_subject_from_message (notmuch_thread_t *thread,
 	(strncasecmp (subject, "Vs: ", 4) == 0) ||
 	(strncasecmp (subject, "Sv: ", 4) == 0)) {
 
-	cleaned_subject = talloc_strndup (thread,
-					  subject + 4,
-					  strlen(subject) - 4);
+      cleaned_subject = rectify_whitespace(talloc_strndup (thread,
+							   subject + 4,
+							   strlen(subject) - 4));
     } else {
-	cleaned_subject = talloc_strdup (thread, subject);
+      cleaned_subject = rectify_whitespace(talloc_strdup (thread, subject));
     }
 
     if (thread->subject)
-- 
1.7.2.3



-------------- next part --------------
A non-text attachment was scrubbed...
Name: malformed_subject
Type: application/octet-stream
Size: 351 bytes
Desc: not available
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20110316/d51114bb/attachment.obj>

