v9 of batch tagging

david at tethera.net
Sun Dec 23 17:39:26 PST 2012


This obsoletes 

     id:1356095307-22895-1-git-send-email-david at tethera.net

The main changes since v8 are the rebase against the notmuch-restore
fixes in master and the rewrite of the query (pre-)processing function
unhex_and_quote. This incorporates the changes of

      id:1356231570-28232-1-git-send-email-david at tethera.net

and now handles '(' and ')' (cf. id:87a9t5p4dz.fsf at qmul.ac.uk).

With respect to this comment:

,----
| Finally, I don't know if a query can contain a : without being a
| prefix query. If it can that could end up being misquoted.
`----

This is easy to work around by hex encoding that ':'. Unless it turns
out to be a problem in practice, I prefer not to keep an explicit list
of prefixes here; recognizing prefixes should really be a service
provided by libnotmuch.
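The prefix rule used by the revised unhex_and_quote can be pulled out into a standalone helper to show why encoding the ':' is enough. The helper name prefix_bytes is hypothetical and not part of the patch; the test itself (a run of lower case letters, then ':', then at least one more byte) mirrors the interdiff in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the patch) mirroring the prefix test
 * in the v9 unhex_and_quote.  tok must be NUL terminated.  Returns the
 * number of bytes ("letters:") to copy through unquoted, or 0 when the
 * token has no prefix and must be decoded and quoted as a whole. */
static size_t
prefix_bytes (const char *tok)
{
    size_t tok_len = strlen (tok);
    /* lower case run; stops at ':', '%', or any other byte */
    size_t prefix_len = strspn (tok, "abcdefghijklmnopqrstuvwxyz");

    /* a bare lower case word such as "and" is passed through elsewhere */
    if (prefix_len == tok_len)
	return 0;

    /* one byte for ':', then at least one byte after it */
    if (tok_len - prefix_len >= 2 && tok[prefix_len] == ':')
	return prefix_len + 1;

    return 0;
}
```

Assuming the %XX form of hex encoding, a literal ':' inside a term becomes %3a, so `foo%3abar` has no prefix (`prefix_bytes` returns 0), while `subject:foo%3abar` still yields the 8-byte prefix `subject:`. Any lower case word is accepted as a prefix, which is a superset of the prefixes notmuch actually defines.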

I dropped two patches (strnspn and hex_invariant), but picked up a new
strtok variant. The name strtok_len2 could probably be improved (and I
see there is a typo in the patch subject):

 [Patch v9 05/17] util/string-util: add a new string tokenized
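The calling convention of the new tokenizer can be read off the loop in unhex_and_quote: it returns the next token and reports both the token length and the length of the delimiter run that follows it, so the caller can copy the delimiters instead of losing them. Below is a minimal sketch under that reading; the name tok_and_delims and the choice to return a zero-length token when the string starts with delimiters (so a leading '(' survives) are assumptions, not the code of patch 05/17.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of a strtok variant with the calling convention
 * used in unhex_and_quote: return the start of the next token in s,
 * store the token length in *len and the length of the run of
 * delimiters following it in *delim_len.  Unlike strtok, s is not
 * modified and delimiters are reported rather than discarded.  A
 * zero-length token is returned when s starts with delimiters. */
static char *
tok_and_delims (char *s, const char *delim, size_t *len, size_t *delim_len)
{
    *len = strcspn (s, delim);
    *delim_len = strspn (s + *len, delim);

    /* stop only when neither a token nor a delimiter is left */
    return (*len || *delim_len) ? s : NULL;
}
```

Callers advance with `tok + tok_len + delim_len`, exactly as the loop in the interdiff does, so for `(tag:a or tag:b)` the calls yield an empty token followed by `(`, then `tag:a`, `or` and `tag:b`, with `)` as the final delimiter run.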

Finally I added a test for the new parenthesis handling.

[Patch v9 17/17] test/tagging: add test for handling of parens

As for fixups, the tests and the man page needed minor adjustments now
that '(' and ')' are delimiters.

I added the fclose suggested in id:87wqw9hf9a.fsf at oiva.home.nikula.org

I also changed the return value as suggested in id:87zk15hi7f.fsf at oiva.home.nikula.org

Here is the interdiff for unhex_and_quote:

commit 67c6aee87db5c7da25529e1c0feb64e422abb4b7
Author: David Bremner <bremner at unb.ca>
Date:   Sat Dec 22 22:49:02 2012 -0400

    simplify unhex_and_quote, support parens
    
    the overgeneral definition of a prefix can be replaced by lower case
    alphabetic, and still work fine with current notmuch query syntax.
    
    use () as delimiters in unhex_and_quote, preserve delimiters

diff --git a/tag-util.c b/tag-util.c
index 6f62fe6..91f3603 100644
--- a/tag-util.c
+++ b/tag-util.c
@@ -56,6 +56,21 @@ illegal_tag (const char *tag, notmuch_bool_t remove)
     return NULL;
 }
 
+/* Factor out the boilerplate to append a token to the query string.
+ * For use in unhex_and_quote */
+
+static tag_parse_status_t
+append_tok (const char *tok, size_t tok_len,
+	    const char *line_for_error, char **query_string)
+{
+
+    *query_string = talloc_strndup_append_buffer (*query_string, tok, tok_len);
+    if (*query_string == NULL)
+	return line_error (TAG_PARSE_OUT_OF_MEMORY, line_for_error, "aborting");
+
+    return TAG_PARSE_SUCCESS;
+}
+
 /* Input is a hex encoded string, presumed to be a query for Xapian.
  *
  * Space delimited tokens are decoded and quoted, with '*' and prefixes
@@ -67,45 +82,41 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
 {
     char *tok = encoded;
     size_t tok_len = 0;
+    size_t delim_len = 0;
     char *buf = NULL;
     size_t buf_len = 0;
     tag_parse_status_t ret = TAG_PARSE_SUCCESS;
 
     *query_string = talloc_strdup (ctx, "");
 
-    while ((tok = strtok_len (tok + tok_len, " ", &tok_len)) != NULL) {
+    while ((tok = strtok_len2 (tok + tok_len + delim_len, " ()",
+			       &tok_len, &delim_len)) != NULL) {
 
 	size_t prefix_len;
 	char delim = *(tok + tok_len);
 
-	*(tok + tok_len++) = '\0';
+	*(tok + tok_len) = '\0';
 
-	prefix_len = hex_invariant (tok, tok_len);
+	/* The following matches a superset of prefixes currently
+	 * used by notmuch */
+	prefix_len = strspn (tok, "abcdefghijklmnopqrstuvwxyz");
 
-	if ((strcmp (tok, "*") == 0) || prefix_len >= tok_len - 1) {
+	if ((strcmp (tok, "*") == 0) || prefix_len == tok_len) {
 
 	    /* pass some things through without quoting or decoding.
 	     * Note for '*' this is mandatory.
 	     */
 
-	    if (! (*query_string = talloc_asprintf_append_buffer (
-		       *query_string, "%s%c", tok, delim))) {
-
-		ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
-				  line_for_error, "aborting");
-		goto DONE;
-	    }
+	    ret = append_tok (tok, tok_len, line_for_error, query_string);
+	    if (ret) goto DONE;
 
 	} else {
 	    /* potential prefix: one for ':', then something after */
-	    if ((tok_len - prefix_len > 2) && *(tok + prefix_len) == ':') {
-		if (! (*query_string = talloc_strndup_append (*query_string,
-							      tok,
-							      prefix_len + 1))) {
-		    ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
-				      line_for_error, "aborting");
-		    goto DONE;
-		}
+	    if ((tok_len - prefix_len >= 2) && *(tok + prefix_len) == ':') {
+		ret = append_tok (tok, prefix_len + 1,
+				  line_for_error, query_string);
+		if (ret) goto DONE;
+
 		tok += prefix_len + 1;
 		tok_len -= prefix_len + 1;
 	    }
@@ -122,13 +133,15 @@ unhex_and_quote (void *ctx, char *encoded, const char *line_for_error,
 		goto DONE;
 	    }
 
-	    if (! (*query_string = talloc_asprintf_append_buffer (
-		       *query_string, "%s%c", buf, delim))) {
-		ret = line_error (TAG_PARSE_OUT_OF_MEMORY,
-				  line_for_error, "aborting");
-		goto DONE;
-	    }
+	    ret = append_tok (buf, buf_len, line_for_error, query_string);
+	    if (ret) goto DONE;
 	}
+	/* restore the string */
+	*(tok + tok_len) = delim;
+
+	/* copy any delimiters */
+	ret = append_tok (tok + tok_len, delim_len, line_for_error, query_string);
+	if (ret) goto DONE;
     }
 
   DONE:
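To trace the revised loop without the rest of tag-util.c, here is a standalone sketch of the same control flow: split on " ()", pass '*', bare lower case words and empty tokens through, copy a "prefix:" unquoted, decode and quote everything else, then copy the delimiter run verbatim. This is not the patch: the function names are hypothetical, realloc stands in for talloc, and it assumes %XX hex encoding and quoting that wraps a term in double quotes and doubles any embedded '"'.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical standalone sketch of the v9 unhex_and_quote loop.
 * Assumptions: %XX hex encoding, quoting by wrapping in '"' and
 * doubling embedded '"', malloc'd output instead of talloc. */

/* Append n bytes of s to *out (current length *out_len); 0 on success. */
static int
append_bytes (char **out, size_t *out_len, const char *s, size_t n)
{
    char *p = realloc (*out, *out_len + n + 1);

    if (p == NULL)
	return -1;
    memcpy (p + *out_len, s, n);
    *out_len += n;
    p[*out_len] = '\0';
    *out = p;
    return 0;
}

static int
hex_val (char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes in tok[0..len) and append the result, quoted. */
static int
append_decoded_quoted (char **out, size_t *out_len, const char *tok, size_t len)
{
    if (append_bytes (out, out_len, "\"", 1))
	return -1;
    for (size_t i = 0; i < len; i++) {
	char c = tok[i];

	if (c == '%' && i + 2 < len &&
	    hex_val (tok[i + 1]) >= 0 && hex_val (tok[i + 2]) >= 0) {
	    c = (char) (hex_val (tok[i + 1]) * 16 + hex_val (tok[i + 2]));
	    i += 2;
	}
	/* an embedded double quote is escaped by doubling it */
	if (c == '"' && append_bytes (out, out_len, "\"", 1))
	    return -1;
	if (append_bytes (out, out_len, &c, 1))
	    return -1;
    }
    return append_bytes (out, out_len, "\"", 1);
}

/* Return a newly allocated query string, or NULL if out of memory. */
static char *
unhex_and_quote_sketch (const char *encoded)
{
    const char *delims = " ()";
    char *out = NULL;
    size_t out_len = 0;
    const char *s = encoded;

    if (append_bytes (&out, &out_len, "", 0))
	return NULL;

    while (*s != '\0') {
	size_t tok_len = strcspn (s, delims);
	size_t delim_len = strspn (s + tok_len, delims);
	/* cannot exceed tok_len: delimiters are not lower case letters */
	size_t prefix_len = strspn (s, "abcdefghijklmnopqrstuvwxyz");
	int err;

	if ((tok_len == 1 && s[0] == '*') || prefix_len == tok_len) {
	    /* '*', bare lower case words and empty tokens pass through */
	    err = append_bytes (&out, &out_len, s, tok_len);
	} else if (tok_len - prefix_len >= 2 && s[prefix_len] == ':') {
	    /* copy "prefix:" unquoted, then decode and quote the rest */
	    err = append_bytes (&out, &out_len, s, prefix_len + 1) ||
		append_decoded_quoted (&out, &out_len, s + prefix_len + 1,
				       tok_len - prefix_len - 1);
	} else {
	    err = append_decoded_quoted (&out, &out_len, s, tok_len);
	}

	/* copy the run of delimiters that followed the token */
	if (err || append_bytes (&out, &out_len, s + tok_len, delim_len)) {
	    free (out);
	    return NULL;
	}
	s += tok_len + delim_len;
    }
    return out;
}
```

Under these assumptions, `tag:inbox and (from:foo%20bar or subject:a%22b)` becomes `tag:"inbox" and (from:"foo bar" or subject:"a""b")`. Spaces and parentheses come through unchanged because each token's trailing delimiter run is copied as-is, and a leading '(' survives as the delimiter run after an empty token.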


