[PATCH] Remove/replace vertical whitespace in subject header field body.
James Vasile
james at hackervisions.org
Wed Mar 16 18:44:28 PDT 2011
RFC 822 specifies that headers are one-liners of ASCII:
> The field-body may be composed of any ASCII characters, except CR or
> LF. (While CR and/or LF may be present in the actual text, they are
> removed by the action of unfolding the field.)
RFC 5335 allows UTF-8 in header field bodies, but as I read the docs,
the RFC 822 requirement that they unfold to one-liners still applies.
RFC 5322 describes folding and unfolding as follows:
> Each header field is logically a single line of characters comprising
> the field name, the colon, and the field body. For convenience
> however, and to deal with the 998/78 character limitations per line,
> the field body portion of a header field can be split into a
> multiple-line representation; this is called "folding". The general
> rule is that wherever this specification allows for folding white
> space (not simply WSP characters), a CRLF may be inserted before any
> WSP.
...
> The process of moving from this folded multiple-line representation of
> a header field to its single line representation is called
> "unfolding". Unfolding is accomplished by simply removing any CRLF
> that is immediately followed by WSP.
Again, unfolded subjects should be one-liners.
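
To make the unfolding rule quoted above concrete, here is a minimal sketch in C. The name unfold_header is hypothetical (it is not a notmuch or GMime function), and it assumes a NUL-terminated, writable buffer:

```c
/* Unfold a header field in place, following the RFC 5322 rule quoted
 * above: remove every CRLF that is immediately followed by WSP (space
 * or horizontal tab). The WSP itself is kept.
 * Hypothetical helper for illustration only. Returns its argument. */
static char *
unfold_header (char *str)
{
    char *src = str;
    char *dst = str;

    while (*src) {
        if (src[0] == '\r' && src[1] == '\n' &&
            (src[2] == ' ' || src[2] == '\t')) {
            src += 2;           /* drop the CRLF, keep the WSP after it */
            continue;
        }
        *dst++ = *src++;
    }
    *dst = '\0';

    return str;
}
```

A CRLF that is not followed by WSP (for example, the one that ends the field) is left alone, so whatever remains after unfolding is expected to be a single line.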
An email sent to me from pingg.com (I think it's a pretentious
version of evite) came with a subject of
"=?utf-8?Q?bring_small_items_for_a_pi=C3=B1ata=21=21=21=21=0A?=", which
"notmuch search" displays as "Subject: bring small items for a
piñata!!!!" with a \n at the end. This befuddles the emacs UI ("Error:
Unexpected output from notmuch search:"). I've attached an email that
reproduces the error.
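
To show where the stray newline comes from, here is a rough sketch of decoding the text part of an RFC 2047 "Q" encoded-word (the portion between "?Q?" and "?="). The names hex_value and q_decode are hypothetical, and this is not how notmuch or GMime actually do the decoding; it only illustrates that "=0A" decodes to a literal LF:

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int
hex_value (char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode Q-encoded text from in into out, which must hold at least
 * strlen(in) + 1 bytes. '_' becomes a space and "=XX" becomes the byte
 * with hex value XX; everything else is copied unchanged.
 * Returns the number of bytes written, not counting the NUL.
 * Hypothetical helper for illustration only. */
static size_t
q_decode (const char *in, char *out)
{
    size_t n = 0;

    while (*in) {
        if (*in == '_') {
            out[n++] = ' ';
            in++;
        } else if (in[0] == '=' &&
                   hex_value (in[1]) >= 0 && hex_value (in[2]) >= 0) {
            out[n++] = (char) (unsigned char)
                (hex_value (in[1]) * 16 + hex_value (in[2]));
            in += 3;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';

    return n;
}
```

Fed the text part of the subject above, the last byte written is 0x0A, which is the newline that ends up in the "notmuch search" output.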
I don't think ending the subject with an encoded 0x0A (the "=0A" inside
the utf-8 encoded-word) followed by the usual CRLF is RFC-compliant.
Still, notmuch should surely follow the deplorable "accept
liberally/emit conservatively" doctrine.
Here is a patch that trims leading and trailing whitespace from subjects
and replaces internal non-space, non-horizontal-tab whitespace with
spaces. It fixes the problem described in this message.
---
lib/thread.cc | 36 ++++++++++++++++++++++++++++++++----
1 files changed, 32 insertions(+), 4 deletions(-)
diff --git a/lib/thread.cc b/lib/thread.cc
index 5190a66..7a816ea 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -266,6 +266,34 @@ _thread_add_message (notmuch_thread_t *thread,
     }
 }
 
+/* Remove leading/trailing whitespace and replace internal vertical
+ * whitespace with spaces.
+ */
+static char *
+rectify_whitespace (char *str)
+{
+    char *last;
+    char *curr;
+
+    while (isspace (*str))
+        str++;
+
+    if (*str == 0)
+        return str;
+
+    last = str + strlen(str) - 1;
+    while (last > str && isspace (*last))
+        last--;
+
+    curr = str;
+    do
+        if ((*curr >= 10) && (*curr <= 13))
+            *curr = 32; //space
+    while (curr++ < last);
+
+    return str;
+}
+
 static void
 _thread_set_subject_from_message (notmuch_thread_t *thread,
                                   notmuch_message_t *message)
@@ -282,11 +310,11 @@ _thread_set_subject_from_message (notmuch_thread_t *thread,
         (strncasecmp (subject, "Vs: ", 4) == 0) ||
         (strncasecmp (subject, "Sv: ", 4) == 0)) {
 
-        cleaned_subject = talloc_strndup (thread,
-                                          subject + 4,
-                                          strlen(subject) - 4);
+        cleaned_subject = rectify_whitespace(talloc_strndup (thread,
+                                          subject + 4,
+                                          strlen(subject) - 4));
     } else {
-        cleaned_subject = talloc_strdup (thread, subject);
+        cleaned_subject = rectify_whitespace(talloc_strdup (thread, subject));
     }
 
     if (thread->subject)
--
1.7.2.3
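
For illustration, here is a standalone sketch of the behaviour the patch description asks for: trim both ends and turn internal vertical whitespace into spaces. It is not the patch itself, and the name sanitize_subject is hypothetical. Unlike the hunk above, it writes a terminating NUL after the last non-whitespace character, moves the kept text to the front of the buffer so the returned pointer is the one passed in, and casts to unsigned char before calling isspace(), since UTF-8 bytes can be negative where char is signed:

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace and replace internal vertical
 * whitespace (LF, VT, FF, CR, i.e. bytes 10 through 13) with a space.
 * Works in place and returns the pointer it was given.
 * Hypothetical helper for illustration only. */
static char *
sanitize_subject (char *str)
{
    char *src = str;
    char *dst = str;
    char *end;

    /* Skip leading whitespace. */
    while (isspace ((unsigned char) *src))
        src++;

    /* Find one past the last non-whitespace character. */
    end = src + strlen (src);
    while (end > src && isspace ((unsigned char) end[-1]))
        end--;

    /* Copy the kept text to the front, fixing vertical whitespace. */
    while (src < end) {
        char c = *src++;

        if (c == '\n' || c == '\v' || c == '\f' || c == '\r')
            c = ' ';
        *dst++ = c;
    }
    *dst = '\0';

    return str;
}
```

Applied to the decoded subject from this report, the trailing LF is dropped and the remaining text is unchanged. Returning the original pointer may matter if the caller later hands it to talloc, but that is an assumption about the surrounding code, not something the hunk shows.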
-------------- next part --------------
A non-text attachment was scrubbed...
Name: malformed_subject
Type: application/octet-stream
Size: 351 bytes
Desc: not available
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20110316/d51114bb/attachment.obj>