[RFC PATCH] Re: excessive thread fusing
David Bremner
david at tethera.net
Sun Apr 20 05:59:26 PDT 2014
Carl Worth <cworth at cworth.org> writes:
>
> Another idea would be to trigger specifically on common forms. Judging
> from the samples in this particular thread, it seems like a workable
> heuristic would be:
>
> If the In-Reply-To header begins with '<':
>
>     Parse that initial portion as a message ID
>
> Else if it ends with '>':
>
>     Parse that final portion as a message ID
>
> Else
>
>     Ignore this garbage-valued header.
>
Using the hacky script below, I scanned my own mail collection of about
300k messages. I can make the following observations:
- I have some RFC-compliant In-Reply-To headers with multiple ids.
- I have a non-trivial number of the form
  "Message from $NAME <address> of $date <id>".
- I didn't see any cases where using the last angle-bracketed thing
  would fail.
- I did see some cases where the header starts with '<' but the
  matching '>' was missing.
- I also noticed some RFC 2047 encoding of In-Reply-To headers.
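######################################################################
# hacky script follows
dir=$1
echo "Scanning $dir"
tempdir=$(mktemp -d)
echo "Writing to ${tempdir}"
# Extract every In-Reply-To header. The file name is passed to sh as an
# argument instead of being spliced into the command string, so odd
# file names cannot break the command.
find "$dir" -type f -exec sh -c 'formail -c -xIn-reply-to < "$1"' sh {} \; \
    > "${tempdir}/ids"
# Squeeze whitespace, then replace ids and comments with placeholders so
# that only the distinct header shapes remain.
sed -e 's/\t/ /g' -e 's/  */ /g' -e 's/<[^ ]*>/<id>/g' -e 's/(.*)/(comment)/' \
    < "${tempdir}/ids" | sort -u | tee "${tempdir}/report"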
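
For comparison, the heuristic quoted at the top could be sketched in shell
roughly as follows. This is only an illustration, not anything taken from
notmuch itself: the function name parse_in_reply_to is made up, the header
value is assumed to be a single line with leading whitespace already
trimmed, and a header that starts with '<' but has no '>' anywhere is
treated as garbage.

```shell
#!/bin/sh
# parse_in_reply_to VALUE  (hypothetical helper, not part of notmuch)
# Prints one message id chosen by the quoted heuristic, or returns 1 if
# the value looks like garbage.
parse_in_reply_to() {
    value=$1
    case $value in
        '<'*'>'*)
            # Begins with '<' and a '>' follows: keep the initial <...>.
            printf '%s\n' "$value" | sed -e 's/^\(<[^>]*>\).*$/\1/'
            ;;
        *'<'*'>')
            # Ends with '>': keep the final <...>.
            printf '%s\n' "$value" | sed -e 's/^.*\(<[^<>]*>\)$/\1/'
            ;;
        *)
            # Neither form matched: ignore this header.
            return 1
            ;;
    esac
}
```

With the "Message from ..." form seen in the scan, the second branch picks
the trailing id:

    parse_in_reply_to 'Message from Foo <foo@x> of Monday <id@host>'
    # prints <id@host>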