[RFC PATCH] Re: excessive thread fusing

David Bremner david at tethera.net
Sun Apr 20 05:59:26 PDT 2014


Carl Worth <cworth at cworth.org> writes:
>
> Another idea would be to trigger specifically on common forms. Judging
> from the samples in this particular thread, it seems like a workable
> heuristic would be:
>
> 	If the In-Reply-To header begins with '<':
>
> 		Parse that initial portion as a message ID
>
> 	Else if it ends with '>':
>
> 		Parse that final portion as a message ID
>
> 	Else
>
> 		Ignore this garbage-valued header.
>
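
For concreteness, here is a minimal sketch of that heuristic in POSIX
shell. The function name parse_in_reply_to is made up for
illustration, and I am assuming that a header which starts with '<'
but never closes it counts as garbage. It prints the id without the
angle brackets and returns non-zero when the header should be ignored.

```shell
# Hypothetical helper, not part of notmuch: apply the heuristic quoted
# above to a single In-Reply-To value passed as $1.
parse_in_reply_to() {
    _value=$1
    case $_value in
        '<'*'>'*)
            # Begins with '<': keep the text up to the first '>'.
            _rest=${_value#'<'}
            _id=${_rest%%'>'*}
            ;;
        *'<'*'>')
            # Ends with '>': keep the text after the last '<'.
            _rest=${_value%'>'}
            _id=${_rest##*'<'}
            ;;
        *)
            # Neither form: treat the header as garbage.
            return 1
            ;;
    esac
    printf '%s\n' "$_id"
}

# Example calls, one per branch:
parse_in_reply_to '<first@example.org> <second@example.org>'
parse_in_reply_to 'Message from Someone <user@example.org> of Monday <id@example.org>'
parse_in_reply_to 'no ids here' || echo ignored
```

Note that the first branch takes whatever precedes the first '>', so
a header that opens with an unclosed '<' and has a proper id later on
would still yield junk.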

Using the hacky script below, I scanned my own mail collection of about
300k messages. I can make the following observations:

- I have some RFC-compliant In-Reply-To headers with multiple ids.
- I have a non-trivial number of headers of the form
  "Message from $NAME <address> of $date <id>".
- I didn't see any cases where using the last angle-bracketed thing
  would fail.
- I did see some cases where the header starts with '<' but the
  matching '>' was missing.
- I also noticed some RFC 2047 encoding of In-Reply-To headers.
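
Taking "the last angle-bracketed thing" could look roughly like the
sketch below. The function name last_msgid is invented for this
example, and it assumes one header value per line on stdin. It picks
the last complete <...> pair, so a trailing unclosed '<' is skipped,
and it makes no attempt to decode RFC 2047 encoded words.

```shell
# Hypothetical filter: for each input line, print the contents of the
# last complete <...> pair; lines without such a pair print nothing.
last_msgid() {
    # The leading .* is greedy, so the captured group is the rightmost
    # bracketed token that contains no nested '<' or '>'.
    sed -n 's/.*<\([^<>]*\)>.*/\1/p'
}

# Example: the address is skipped and the trailing id is kept.
printf '%s\n' 'Message from Someone <user@example.org> of Monday <id@example.org>' |
    last_msgid
```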


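######################################################################
# hacky script follows
# Usage: pass the mail directory as the first argument.
dir=$1
echo "Scanning $dir"

tempdir=$(mktemp -d)
echo "Writing to ${tempdir}"

# Extract the In-Reply-To header of every message, one formail run per
# file. The file name is passed as an argument instead of being spliced
# into the command string, so names with spaces or quotes are safe.
find "$dir" -type f -exec sh -c 'formail -c -xIn-reply-to < "$1"' sh {} \; \
  > "${tempdir}/ids"

# Collapse whitespace (\t is a GNU sed extension), replace each <...>
# token with <id> and each parenthesized comment with (comment), then
# list the distinct header shapes.
sed -e 's/\t/ /g' -e 's/   */ /g' -e 's/<[^ ]*>/<id>/g' -e 's/(.*)/(comment)/' \
  < "${tempdir}/ids" | sort -u | tee "${tempdir}/report"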

