[Patch v4 2/2] test: initial performance testing infrastructure

Austin Clements amdragon at MIT.EDU
Sun Nov 25 19:29:06 PST 2012


Quoth David Bremner on Nov 25 at  8:05 pm:
> Austin Clements <amdragon at MIT.EDU> writes:
> >> +add_email_corpus takes arguments "--small" and "--medium" for when you
> >> +want smaller corpuses to check.
> >
> > "corpora"?
> 
> reworded to say 
> 
> ,----
> | add_email_corpus takes arguments "--small" and "--medium" for when you
> | want smaller subsets of the corpus to check.
> `----

That's clearer.

> >
> > I'm a bit confused by this.  What happens if you don't specify --small
> > or --medium?  Is the "large"/default corpus just the combined small
> > and medium corpora?  Would be worth a comment, at least.
> 
> Hopefully the README makes this clear(er) now?

The README definitely helps.  It might still be worth a comment in the
code, since it took me some thought to realize that it does something
reasonable when given no argument.  Perhaps above the initial
assignment of arg,

# With no argument, use the entire (combined) corpus

to acknowledge that this is a legitimate and intentional code path?
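
For illustration, here is a minimal sketch of where such a comment
could sit.  This is not the code from the patch: the function name
corpus_subset and its echo output are hypothetical, and only the
option names (--small, --medium) and the variable name arg come from
the discussion above.

```shell
# Hypothetical stand-in for the argument handling in add_email_corpus.
# Only the option names and the variable name "arg" come from the patch.
corpus_subset () {
    # With no argument, use the entire (combined) corpus
    arg=${1:-}
    case "$arg" in
        --small)  echo small ;;
        --medium) echo medium ;;
        '')       echo full ;;
        *)        echo "unknown option: $arg" >&2; return 1 ;;
    esac
}
```

Calling corpus_subset with no argument takes the empty-string branch,
which is the case the comment is meant to flag as intentional.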

> > This probably doesn't matter now, but I wonder if we want to unpack on
> > first use to somewhere not test-specific and then cp -rl the corpus
> > into the test directory.  I haven't tried unpacking the corpus yet,
> > but if you're running tests repeatedly to compare results, or running
> > more than one performance test, it seems like a full decompress and
> > unpack could get onerous.
> 
> Hmm. On my machine it is 10s for the copy versus 45s for a full
> unpack. For some reason I tested with "cp -a" which is incredibly slow, 
> so I thought there was no loss. For comparison the basic test takes
> about 10 minutes on the same machine.
> 
> In any case this can wait until we have a second test file and a second
> call to add_email_corpus; adding caching now would not help.

It would help (a little) if you run the basic test multiple times.  I
think it's completely reasonable to leave it as is for now and see if
caching would help down the road.
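
If we do get there, the unpack-once-then-link idea could look roughly
like the sketch below.  The function name link_cached_corpus and its
three parameters are hypothetical, nothing like it exists in the patch,
and it assumes GNU cp for the -l (hard link) option.

```shell
# Hypothetical helper: unpack the corpus tarball once into a shared
# cache directory, then hard-link the cached tree into a test directory.
# The function name and all three parameters are assumptions.
link_cached_corpus () {
    tarball=$1 cache_dir=$2 dest=$3

    # Unpack only if the cache does not exist yet.  Unpacking into a
    # temporary directory first keeps a half-unpacked cache from being
    # mistaken for a complete one.
    if [ ! -d "$cache_dir" ]; then
        mkdir -p "$cache_dir.tmp.$$" &&
        tar -xf "$tarball" -C "$cache_dir.tmp.$$" &&
        mv "$cache_dir.tmp.$$" "$cache_dir" || return 1
    fi

    # cp -rl creates hard links instead of copying file data (GNU cp).
    # $dest must not exist yet, otherwise the tree lands inside it.
    cp -rl "$cache_dir" "$dest"
}
```

One caveat with hard links: a test that modifies a message file in
place would also modify the cached copy, so this only works if the
tests treat the corpus files as read-only.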

