[notmuch] Notmuch's search view sucks

Olly Betts olly at survex.com
Fri Dec 4 02:36:45 PST 2009


Karl Wiberg writes:
> On Fri, Dec 4, 2009 at 1:29 AM, Carl Worth wrote:
> > And a step beyond that would support different languages for
> > different emails, but that sounds like something "hard" to identify.
> 
> But probably not as hard as identifying spam. It could probably be
> done with a simple Bayesian filter counting word frequencies---but
> it'd be much better if somebody else had already solved the problem,
> since this smells suspiciously like something that ought to be a
> separate project and put in a library ... does anyone know if such a
> project already exists?

There's TextCat:

http://www.let.rug.nl/vannoord/TextCat/

It looks at n-gram frequencies, and can guess pretty reliably from
even a fairly small amount of text.

TextCat is in Perl.  I don't know if there's a C or C++ implementation
but it isn't a huge piece of code - finding a good technique was the
clever part of it.

Cheers,
    Olly



More information about the notmuch mailing list