[PATCH 1/2] Convert non-UTF-8 parts to UTF-8 before indexing them

Austin Clements amdragon at MIT.EDU
Fri Feb 24 20:33:12 PST 2012


LGTM.  I'm assuming this interacts with the uuencoding filter in the
right order (I don't see how any other order could be correct), but
I don't actually know.

Quoth Michal Sojka on Feb 24 at  8:36 am:
> This fixes a bug that made it impossible to search for non-ASCII words in
> such parts. The code here was copied from show_text_part_content(), because
> the show command already does the needed conversion when showing the
> message.
> ---
>  lib/index.cc |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/lib/index.cc b/lib/index.cc
> index d8f8b2b..e377732 100644
> --- a/lib/index.cc
> +++ b/lib/index.cc
> @@ -315,6 +315,7 @@ _index_mime_part (notmuch_message_t *message,
>      GByteArray *byte_array;
>      GMimeContentDisposition *disposition;
>      char *body;
> +    const char *charset;
>  
>      if (! part) {
>  	fprintf (stderr, "Warning: Not indexing empty mime part.\n");
> @@ -390,6 +391,20 @@ _index_mime_part (notmuch_message_t *message,
>      g_mime_stream_filter_add (GMIME_STREAM_FILTER (filter),
>  			      discard_uuencode_filter);
>  
> +    charset = g_mime_object_get_content_type_parameter (part, "charset");
> +    if (charset) {
> +	GMimeFilter *charset_filter;
> +	charset_filter = g_mime_filter_charset_new (charset, "UTF-8");
> +	/* This result can be NULL for things like "unknown-8bit".
> +	 * Don't set a NULL filter as that makes GMime print
> +	 * annoying assertion-failure messages on stderr. */
> +	if (charset_filter) {
> +	    g_mime_stream_filter_add (GMIME_STREAM_FILTER (filter),
> +				      charset_filter);
> +	    g_object_unref (charset_filter);
> +	}
> +    }
> +
>      wrapper = g_mime_part_get_content_object (GMIME_PART (part));
>      if (wrapper)
>  	g_mime_data_wrapper_write_to_stream (wrapper, filter);

