[notmuch] Notmuch's search view sucks
Olly Betts
olly at survex.com
Fri Dec 4 02:36:45 PST 2009
Karl Wiberg writes:
> On Fri, Dec 4, 2009 at 1:29 AM, Carl Worth wrote:
> > And a step beyond that would support different languages for
> > different emails, but that sounds like something "hard" to identify.
>
> But probably not as hard as identifying spam. It could probably be
> done with a simple Bayesian filter counting word frequencies---but
> it'd be much better if somebody else had already solved the problem,
> since this smells suspiciously like something that ought to be a
> separate project and put in a library ... does anyone know if such a
> project already exists?
There's TextCat:
http://www.let.rug.nl/vannoord/TextCat/
It looks at character n-gram frequencies, and can guess the language
pretty reliably from even a fairly small amount of text.
TextCat is written in Perl. I don't know if there's a C or C++
implementation, but it isn't a huge piece of code - finding a good
technique was the clever part of it.
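To give a feel for how little code the technique needs, here is a rough
sketch in Python of rank-order n-gram profiling with an "out-of-place"
distance, the Cavnar and Trenkle approach that TextCat is based on. This
is not TextCat's code: the function names, the "_" word padding, the
max_n and top parameters, and the tiny SAMPLES texts are all made up for
illustration. A real setup would build the language profiles from much
larger training texts.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Return the most frequent character n-grams, most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        # Pad each word so that word-initial and word-final n-grams
        # are counted separately from word-internal ones.
        padded = "_" + word + "_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the
    # ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles (lower is closer)."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    # N-grams missing from the language profile get a fixed maximum penalty.
    penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Return the key of the profile closest to the text's own profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical toy training texts; far too small for real use.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the fox "
          "runs away from the house with the other animals",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "dann läuft der fuchs mit den anderen tieren aus dem haus",
}

if __name__ == "__main__":
    profiles = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
    print(guess_language("where is the house of the other fox", profiles))
```

For notmuch, something like this could run once per message at index
time to pick which stemmer to use, but that is only a guess at how it
would be wired in.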
Cheers,
Olly