v9 of batch tagging
Mark Walters
markwalters1009 at gmail.com
Sun Dec 23 18:34:33 PST 2012
On Mon, 24 Dec 2012, david at tethera.net wrote:
> This obsoletes
>
> id:1356095307-22895-1-git-send-email-david at tethera.net
>
> The main changes since v8 are the rebasing against the notmuch-restore
> fixes in master, and the rewrite of the query (pre)-processing
> unhex_and_quote. This incorporates the changes of
>
> id:1356231570-28232-1-git-send-email-david at tethera.net
>
> and now handles '()' (cf. id:87a9t5p4dz.fsf at qmul.ac.uk)
>
> With respect to
>
> ,----
> | Finally, I don't know if a query can contain a : without being a
> | prefix query. If it can that could end up being misquoted.
> `----
>
> This is pretty easy to work around by encoding that :. I think unless
> it is a problem in practice I prefer not to keep an explicit list of
> prefixes here; recognizing prefixes should really be a service from
> libnotmuch.
I am quite happy with this.
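As an illustration of the workaround David describes, here is a minimal sketch of encoding a literal ':' so that it can no longer be mistaken for a prefix separator. The helper name escape_byte is hypothetical and not part of notmuch, and the "%xx" escape form is an assumption about the hex encoding in use; the thread only says the query is hex encoded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of notmuch): copy `in` to `out`,
 * replacing every occurrence of the byte `c` with "%xx", where xx is
 * the byte value as two lowercase hex digits.  The "%xx" form is an
 * assumption about the encoding, made for illustration.
 * Returns 0 on success, -1 if `out` is too small. */
static int
escape_byte (const char *in, char c, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;

    for (; *in != '\0'; in++) {
        size_t need = (*in == c) ? 3 : 1;

        /* always keep one byte in reserve for the terminating NUL */
        if (used + need + 1 > out_size)
            return -1;

        if (*in == c)
            snprintf (out + used, 4, "%%%02x", (unsigned char) c);
        else
            out[used] = *in;

        used += need;
    }

    out[used] = '\0';
    return 0;
}
```

With this sketch, escape_byte ("10:30", ':', buf, sizeof (buf)) leaves "10%3a30" in buf, so a token containing it has no literal ':' left to be split on.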
> I dropped two patches (strnspn and hex_invariant), but picked up a new
> strtok variation. Probably the name strtok_len2 could be improved
> (and I see there is a typo in the patch subject).
>
> [Patch v9 05/17] util/string-util: add a new string tokenized
>
Patches 5 and 6 look good to me.
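For readers without patch 5 at hand, the following is a rough sketch of a tokenizer that reports both the token length and the length of the delimiter run after it, which is what the call site in the interdiff appears to rely on. The name tok_with_delims and its exact behaviour (in particular returning an empty token when the input starts with a delimiter) are assumptions inferred from that call site, not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a strtok_len2-style helper; the real
 * function in the patch may differ.
 *
 * Returns `s` itself as the start of the next token, or NULL at the
 * end of the string.  *tok_len receives the number of leading
 * non-delimiter bytes (possibly 0), and *delim_len the length of the
 * run of delimiter bytes that follows the token. */
static const char *
tok_with_delims (const char *s, const char *delims,
                 size_t *tok_len, size_t *delim_len)
{
    assert (tok_len != NULL && delim_len != NULL);

    if (*s == '\0')
        return NULL;

    *tok_len = strcspn (s, delims);
    *delim_len = strspn (s + *tok_len, delims);
    return s;
}
```

A caller can advance with tok + tok_len + delim_len on each iteration, as the loop in the interdiff does. Every call on a non-empty string consumes at least one byte, so such a loop terminates, and copying tok_len + delim_len bytes per step reproduces the input including its parentheses and spaces.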
> Finally I added a test for the new parenthesis handling.
My recollection is that dump prints the messages unsorted: does this
mean that we could get unstable results for these tests (e.g. with
different Xapian versions)?
Best wishes
Mark
>
> [Patch v9 17/17] test/tagging: add test for handling of parens
>
> Fixup wise, the tests needed to be adjusted a bit for () being delimiters,
> and the man page as well.
>
> I added the fclose in id:87wqw9hf9a.fsf at oiva.home.nikula.org
>
> And I modified the return value per id:87zk15hi7f.fsf at oiva.home.nikula.org
>
> Here is the interdiff for unhex_and_quote:
>
> commit 67c6aee87db5c7da25529e1c0feb64e422abb4b7
> Author: David Bremner <bremner at unb.ca>
> Date: Sat Dec 22 22:49:02 2012 -0400
>
> simplify unhex_and_quote, support parens
>
> the overgeneral definition of a prefix can be replaced by lower case
> alphabetic, and still work fine with current notmuch query syntax.
>
> use () as delimiters in unhex_and_quote, preserve delimiters
>
> diff --git a/tag-util.c b/tag-util.c
> index 6f62fe6..91f3603 100644
> --- a/tag-util.c
> +++ b/tag-util.c
> @@ -56,6 +56,21 @@ illegal_tag (const char *tag, notmuch_bool_t remove)
> return NULL;
> }
>
> +/* Factor out the boilerplate to append a token to the query string.
> + * For use in unhex_and_quote */
> +
> +static tag_parse_status_t
> +append_tok (const char *tok, size_t tok_len,
> + const char *line_for_error, char **query_string)
> +{
> +
> + *query_string = talloc_strndup_append_buffer (*query_string, tok, tok_len);
> + if (*query_string == NULL)
> + return line_error (TAG_PARSE_OUT_OF_MEMORY, line_for_error, "aborting");
> +
> + return TAG_PARSE_SUCCESS;
> +}
> +
> /* Input is a hex encoded string, presumed to be a query for Xapian.
> *
> * Space delimited tokens are decoded and quoted, with '*' and prefixes
> @@ -67,45 +82,41 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
> {
> char *tok = encoded;
> size_t tok_len = 0;
> + size_t delim_len = 0;
> char *buf = NULL;
> size_t buf_len = 0;
> tag_parse_status_t ret = TAG_PARSE_SUCCESS;
>
> *query_string = talloc_strdup (ctx, "");
>
> - while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
> + while ((tok = strtok_len2 (tok + tok_len + delim_len, " ()",
> + &tok_len, &delim_len)) != NULL) {
>
> size_t prefix_len;
> char delim = *(tok + tok_len);
>
> - *(tok + tok_len++) = '\0';
> + *(tok + tok_len) = '\0';
>
> - prefix_len = hex_invariant (tok, tok_len);
> + /* The following matches a superset of prefixes currently
> + * used by notmuch */
> + prefix_len = strspn (tok, "abcdefghijklmnopqrstuvwxyz");
>
> - if ((strcmp (tok, "*") == 0) || prefix_len >= tok_len - 1) {
> + if ((strcmp (tok, "*") == 0) || prefix_len == tok_len) {
>
> /* pass some things through without quoting or decoding.
> * Note for '*' this is mandatory.
> */
>
> - if (! (*query_string = talloc_asprintf_append_buffer (
> - *query_string, "%s%c", tok, delim))) {
> -
> - ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> - line_for_error, "aborting");
> - goto DONE;
> - }
> + ret = append_tok (tok, tok_len, line_for_error, query_string);
> + if (ret) goto DONE;
>
> } else {
> /* potential prefix: one for ':', then something after */
> - if ((tok_len - prefix_len > 2) && *(tok + prefix_len) == ':') {
> - if (! (*query_string = talloc_strndup_append (*query_string,
> - tok,
> - prefix_len + 1))) {
> - ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> - line_for_error, "aborting");
> - goto DONE;
> - }
> + if ((tok_len - prefix_len >= 2) && *(tok + prefix_len) == ':') {
> + ret = append_tok (tok, prefix_len + 1,
> + line_for_error, query_string);
> + if (ret) goto DONE;
> +
> tok += prefix_len + 1;
> tok_len -= prefix_len + 1;
> }
> @@ -122,13 +133,15 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
> goto DONE;
> }
>
> - if (! (*query_string = talloc_asprintf_append_buffer (
> - *query_string, "%s%c", buf, delim))) {
> - ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
> - line_for_error, "aborting");
> - goto DONE;
> - }
> + ret = append_tok (buf, buf_len, line_for_error, query_string);
> + if (ret) goto DONE;
> }
> + /* restore the string */
> + *(tok + tok_len) = delim;
> +
> + /* copy any delimiters */
> + ret = append_tok (tok + tok_len, delim_len, line_for_error, query_string);
> + if (ret) goto DONE;
> }
>
> DONE:
>
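The prefix test in the interdiff can be read in isolation as follows. This is only a sketch of that one decision, with a hypothetical function name (prefix_with_colon_len) and a NUL-terminated token assumed as input; it is not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the checks in the interdiff.
 *
 * Returns the number of leading bytes of `tok` that would be copied
 * through unquoted as "prefix:" (including the colon), or 0 if the
 * token is not split.  Tokens made only of lowercase letters (such as
 * "and" or "or") also return 0; the interdiff passes those through
 * whole on a separate branch. */
static size_t
prefix_with_colon_len (const char *tok)
{
    size_t tok_len = strlen (tok);
    /* superset of the prefixes currently used by notmuch */
    size_t prefix_len = strspn (tok, "abcdefghijklmnopqrstuvwxyz");

    if (prefix_len == tok_len)
        return 0;

    /* one byte for ':' and at least one byte of value after it */
    if (tok_len - prefix_len >= 2 && tok[prefix_len] == ':')
        return prefix_len + 1;

    return 0;
}
```

For "tag:inbox" this returns 4, so "tag:" is kept verbatim and only "inbox" is decoded and quoted. For "tag:" (nothing after the colon) and for "Foo:bar" (no lowercase run directly before the colon) it returns 0, and the whole token is decoded and quoted instead.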