[PATCH] tag: Automatically limit to messages whose tags will actually change.

Jani Nikula jani at nikula.org
Wed Nov 9 00:46:02 PST 2011


FWIW, I reviewed this and didn't find any obvious problems. A few
nitpicks below, though.

BR,
Jani.

On Mon,  7 Nov 2011 22:55:23 -0500, Austin Clements <amdragon at MIT.EDU> wrote:
> This optimizes the user's tagging query to exclude messages that won't
> be affected by the tagging operation, saving computation and IO for
> redundant tagging operations.
> 
> For example,
>   notmuch tag +notmuch to:notmuch at notmuchmail.org
> will now use the query
>   ( to:notmuch at notmuchmail.org ) and (not tag:"notmuch")
> 
> In the past, we've often suggested that people do this exact
> transformation by hand for slow tagging operations.  This makes that
> unnecessary.
> ---
> I was about to implement this optimization in my initial tagging
> script, but then I figured, why not just do it in notmuch so we can
> stop telling people to do this by hand?
> 
>  NEWS          |    9 ++++++
>  notmuch-tag.c |   76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+), 0 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index e00452a..9ca5e0c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,6 +16,15 @@ Add search terms to  "notmuch dump"
>    search/show/tag. The output file argument of dump is deprecated in
>    favour of using stdout.
>  
> +Optimizations
> +-------------
> +
> +Automatic tag query optimization
> +
> +  "notmuch tag" now automatically optimizes the user's query to
> +  exclude messages whose tags won't change.  In the past, we've
> +  suggested that people do this by hand; this is no longer necessary.
> +
>  Notmuch 0.9 (2011-10-01)
>  ========================
>  
> diff --git a/notmuch-tag.c b/notmuch-tag.c
> index dded39e..62c4bf1 100644
> --- a/notmuch-tag.c
> +++ b/notmuch-tag.c
> @@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
>      interrupted = 1;
>  }
>  
> +static char *
> +_escape_tag (char *buf, const char *tag)
> +{
> +    const char *in = tag;
> +    char *out = buf;
> +    /* Boolean terms surrounded by double quotes can contain any
> +     * character.  Double quotes are quoted by doubling them. */
> +    *(out++) = '"';
> +    while (*in) {
> +	if (*in == '"')
> +	    *(out++) = '"';
> +	*(out++) = *(in++);
> +    }
> +    *(out++) = '"';

The parenthesis are unnecessary for *p++.

> +    *out = 0;
> +    return buf;
> +}
> +
> +static char *
> +_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
> +		     int *add_tags, int add_tags_count,
> +		     int *remove_tags, int remove_tags_count)
> +{
> +    /* This is subtler than it looks.  Xapian ignores the '-' operator
> +     * at the beginning both queries and parenthesized groups and,
> +     * furthermore, the presence of a '-' operator at the beginning of
> +     * a group can inhibit parsing of the previous operator.  Hence,
> +     * the user-provided query MUST appear first, but it is safe to
> +     * parenthesize and the exclusion part of the query must not use
> +     * the '-' operator (though the NOT operator is fine). */
> +
> +    char *escaped, *query_string;
> +    const char *join = "";
> +    int i;
> +    unsigned int max_tag_len = 0;
> +
> +    /* Allocate a buffer for escaping tags. */
> +    for (i = 0; i < add_tags_count; i++)
> +	if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
> +	    max_tag_len = strlen (argv[add_tags[i]] + 1);
> +    for (i = 0; i < remove_tags_count; i++)
> +	if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
> +	    max_tag_len = strlen (argv[remove_tags[i]] + 1);
> +    escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);

Perhaps a comment here or above _escape_tag() explaining the worst case
memory consumption of strlen(tag) * 2 + 3 for a tag of "s would be in
order.

It's unrelated, but looking at the above also made me check something
I've suspected before: notmuch allows you to have empty or zero length
tags "", which is probably not intentional.

There's no check for talloc failures here or below. But then there are
few checks for that in the cli in general. *shrug*.

> +
> +    /* Build the new query string */
> +    if (strcmp (orig_query_string, "*") == 0)
> +	query_string = talloc_strdup (ctx, "(");
> +    else
> +	query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
> +
> +    for (i = 0; i < add_tags_count; i++) {
> +	query_string = talloc_asprintf_append_buffer (
> +	    query_string, "%snot tag:%s", join,
> +	    _escape_tag (escaped, argv[add_tags[i]] + 1));
> +	join = " or ";
> +    }
> +    for (i = 0; i < remove_tags_count; i++) {
> +	query_string = talloc_asprintf_append_buffer (
> +	    query_string, "%stag:%s", join,
> +	    _escape_tag (escaped, argv[remove_tags[i]] + 1));
> +	join = " or ";
> +    }
> +
> +    query_string = talloc_strdup_append_buffer (query_string, ")");
> +
> +    talloc_free (escaped);
> +    return query_string;
> +}
> +
>  int
>  notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
>  {
> @@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
>  	return 1;
>      }
>  
> +    /* Optimize the query so it excludes messages that already have
> +     * the specified set of tags. */
> +    query_string = _optimize_tag_query (ctx, query_string, argv,
> +					add_tags, add_tags_count,
> +					remove_tags, remove_tags_count);
> +
>      config = notmuch_config_open (ctx, NULL, NULL);
>      if (config == NULL)
>  	return 1;
> -- 
> 1.7.7.1
> 
> _______________________________________________
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch


More information about the notmuch mailing list