[Patch v3 5/6] test: add generator for random "stub" messages
Tomi Ollila
tomi.ollila at iki.fi
Sat Sep 8 06:38:35 PDT 2012
On Sun, Aug 19 2012, david at tethera.net wrote:
> From: David Bremner <bremner at debian.org>
>
> Initial use case is testing dump and restore, so we only have
> message-ids and tags.
>
> The message ID's are nothing like RFC compliant, but it doesn't seem
> any harder to roundtrip random UTF-8 strings than RFC-compliant ones.
>
> Tags are UTF-8, even though notmuch is in principle more generous than
> that.
> ---
Mostly LGTM (the whole series). Few comments inline...
Finally, 6/6 adds known broken test -- when will we see this code
taken into use and the broken test fixed :)
> test/.gitignore | 1 +
> test/Makefile.local | 9 +++
> test/basic | 2 +-
> test/random-corpus.c | 202 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 213 insertions(+), 1 deletion(-)
> create mode 100644 test/random-corpus.c
[ ... ]
>
> diff --git a/test/random-corpus.c b/test/random-corpus.c
> new file mode 100644
> index 0000000..8c5b559
> --- /dev/null
> +++ b/test/random-corpus.c
[ ... ]
> +
> +/* Current largest UTF-32 value defined. Note that most of these will
> + * be printed as boxes in most fonts.
> + */
Should we be talking about UTF-8 valies. UTF-8 (currently has the same
limit).
> +
> +#define GLYPH_MAX 0x10FFFE
> +
> +static gunichar
> +random_unichar ()
> +{
> + int start = 1, stop = GLYPH_MAX;
> + int class = random() % 2;
> +
> + /*
> + * Choose about half ascii as test characters, as ascii
> + * punctation and whitespace is the main cause of problems for
> + * the (old) restore parser
> + */
> + switch (class) {
> + case 0:
> + /* ascii */
> + start = 0x01;
> + stop = 0x7f;
> + break;
> + case 1:
> + /* the rest of unicode */
> + start = 0x80;
> + stop = GLYPH_MAX;
> + }
> +
> + if (start == stop)
> + return start;
> + else
> + return start + (random() % (stop - start + 1));
> +}
> +
> +static char *
> +random_utf8_string (void *ctx, size_t char_count)
> +{
> +
> + gchar *buf = NULL;
> + size_t buf_size = 0;
> +
> + size_t offset = 0;
> +
> + size_t i;
> +
> + buf = talloc_realloc (ctx, NULL, gchar, char_count);
> + buf_size = char_count;
> +
> + for (i = 0; i < char_count; i++) {
> + gunichar randomchar;
> + size_t written;
> +
> + /* 6 for one glyph, one for null */
> + if (buf_size - offset < 8) {
> + buf_size += 16;
> + buf = talloc_realloc (ctx, buf, gchar, buf_size);
This reallocation will hit many times, as originally there was just
char_count bytes allocated -- this limit will probably get hit before
halfway the creation of random string (half uses 1 byte, other half
2, 3 or 4 bytes, mostly 4 (even only half of the 4-byte range is used...)
Maybe originally allocating char_count * 2 + 8 and if realloc required
(char_count - i) * 2 + 8... or maybe better, just doing the latter
realloc and replacing first with buf = NULL; buf_size = 0;
Alternatively you could play with random states; calculate size,
reset random state, alloc size + 1 and write chars.
> + }
> +
> + randomchar = random_unichar();
> +
> + written = g_unichar_to_utf8 (randomchar, buf + offset);
> +
> + if (written <= 0) {
> + fprintf (stderr, "error converting to utf8\n");
> + exit (1);
> + }
> +
> + offset += written;
> +
> + }
Above there is extra newline. There are a few others in other
files (at least after opening and before closing brace).
Maybe uncrustify your source :)
> + buf[offset] = 0;
> + return buf;
> +}
> +
[ ... ]
Tomi
More information about the notmuch
mailing list