RFC: adding larger test corpus, switching to xz

David Bremner david at tethera.net
Thu Apr 13 18:07:36 PDT 2017


David Bremner <david at tethera.net> writes:

> I currently have some WIP code that passes all tests with our default
> corpus, but fails with the smallest performance corpus. The simplest
> thing to do would be to add a small sample from our performance corpus
> as one for our standard (correctness) suite. I'm currently looking at
> 146 LKML messages. Unpacked these are about 1.3M; they bloat the source
> tarball by about 285K, which is large in relative terms (about 40%), but
> small in absolute terms for most modern systems. If we switch to xz
> compression, the resulting tarball is only 711K.
>
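For anyone who wants to repeat this kind of gzip-versus-xz comparison on a candidate corpus, here is a minimal sketch. It uses only Python's standard `tarfile` module. The function names and the example path are hypothetical and are not part of notmuch's build. It measures a tarball of the given directory only, not the full release tarball.

```python
import io
import os
import tarfile


def compressed_size(path: str, mode: str) -> int:
    """Return the size in bytes of `path` packed into an in-memory tar archive.

    `mode` is a tarfile write mode such as "w:gz" (gzip) or "w:xz" (xz/LZMA).
    """
    buf = io.BytesIO()
    # normpath strips a trailing slash so basename() yields a usable archive name
    arcname = os.path.basename(os.path.normpath(path))
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        tar.add(path, arcname=arcname)
    return len(buf.getvalue())


def compare(path: str) -> dict:
    """Map each compression mode to the resulting archive size in bytes."""
    return {mode: compressed_size(path, mode) for mode in ("w:gz", "w:xz")}
```

For example, `compare("test/corpora/lkml")` (hypothetical path) returns the gzip and xz sizes for that directory, which can then be set against the size of the existing tarball.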

In the end I found 210 messages (one thread of 100, one of 48, and
assorted smaller threads) that bloated the source tarball by only 161K,
so I decided to add the corpus. It is not used by the test suite yet,
but a series I will post soon needs it.

