[PATCH] tag: Automatically limit to messages whose tags will actually change.
Dmitry Kurochkin
dmitry.kurochkin at gmail.com
Mon Nov 7 20:34:32 PST 2011
Hi Austin.
On Mon, 7 Nov 2011 22:55:23 -0500, Austin Clements <amdragon at MIT.EDU> wrote:
> This optimizes the user's tagging query to exclude messages that won't
> be affected by the tagging operation, saving computation and IO for
> redundant tagging operations.
>
> For example,
> notmuch tag +notmuch to:notmuch at notmuchmail.org
> will now use the query
> ( to:notmuch at notmuchmail.org ) and (not tag:"notmuch")
>
> In the past, we've often suggested that people do this exact
> transformation by hand for slow tagging operations. This makes that
> unnecessary.
Thanks! This is a very useful optimization.
Does it work for multiple tags and tag removal? I.e.:
notmuch tag -inbox -unread +sent from:dmitry.kurochkin at gmail.com
can be converted to:
notmuch tag -inbox -unread +sent from:dmitry.kurochkin at gmail.com and (tag:inbox or tag:unread or (not tag:sent))
Regards,
Dmitry
> ---
> I was about to implement this optimization in my initial tagging
> script, but then I figured, why not just do it in notmuch so we can
> stop telling people to do this by hand?
>
> NEWS | 9 ++++++
> notmuch-tag.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 85 insertions(+), 0 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index e00452a..9ca5e0c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -16,6 +16,15 @@ Add search terms to "notmuch dump"
> search/show/tag. The output file argument of dump is deprecated in
> favour of using stdout.
>
> +Optimizations
> +-------------
> +
> +Automatic tag query optimization
> +
> + "notmuch tag" now automatically optimizes the user's query to
> + exclude messages whose tags won't change. In the past, we've
> + suggested that people do this by hand; this is no longer necessary.
> +
> Notmuch 0.9 (2011-10-01)
> ========================
>
> diff --git a/notmuch-tag.c b/notmuch-tag.c
> index dded39e..62c4bf1 100644
> --- a/notmuch-tag.c
> +++ b/notmuch-tag.c
> @@ -30,6 +30,76 @@ handle_sigint (unused (int sig))
> interrupted = 1;
> }
>
> +static char *
> +_escape_tag (char *buf, const char *tag)
> +{
> + const char *in = tag;
> + char *out = buf;
> + /* Boolean terms surrounded by double quotes can contain any
> + * character. Double quotes are quoted by doubling them. */
> + *(out++) = '"';
> + while (*in) {
> + if (*in == '"')
> + *(out++) = '"';
> + *(out++) = *(in++);
> + }
> + *(out++) = '"';
> + *out = 0;
> + return buf;
> +}
> +
> +static char *
> +_optimize_tag_query (void *ctx, const char *orig_query_string, char *argv[],
> + int *add_tags, int add_tags_count,
> + int *remove_tags, int remove_tags_count)
> +{
> + /* This is subtler than it looks. Xapian ignores the '-' operator
> + * at the beginning both queries and parenthesized groups and,
> + * furthermore, the presence of a '-' operator at the beginning of
> + * a group can inhibit parsing of the previous operator. Hence,
> + * the user-provided query MUST appear first, but it is safe to
> + * parenthesize and the exclusion part of the query must not use
> + * the '-' operator (though the NOT operator is fine). */
> +
> + char *escaped, *query_string;
> + const char *join = "";
> + int i;
> + unsigned int max_tag_len = 0;
> +
> + /* Allocate a buffer for escaping tags. */
> + for (i = 0; i < add_tags_count; i++)
> + if (strlen (argv[add_tags[i]] + 1) > max_tag_len)
> + max_tag_len = strlen (argv[add_tags[i]] + 1);
> + for (i = 0; i < remove_tags_count; i++)
> + if (strlen (argv[remove_tags[i]] + 1) > max_tag_len)
> + max_tag_len = strlen (argv[remove_tags[i]] + 1);
> + escaped = talloc_array(ctx, char, max_tag_len * 2 + 3);
> +
> + /* Build the new query string */
> + if (strcmp (orig_query_string, "*") == 0)
> + query_string = talloc_strdup (ctx, "(");
> + else
> + query_string = talloc_asprintf (ctx, "( %s ) and (", orig_query_string);
> +
> + for (i = 0; i < add_tags_count; i++) {
> + query_string = talloc_asprintf_append_buffer (
> + query_string, "%snot tag:%s", join,
> + _escape_tag (escaped, argv[add_tags[i]] + 1));
> + join = " or ";
> + }
> + for (i = 0; i < remove_tags_count; i++) {
> + query_string = talloc_asprintf_append_buffer (
> + query_string, "%stag:%s", join,
> + _escape_tag (escaped, argv[remove_tags[i]] + 1));
> + join = " or ";
> + }
> +
> + query_string = talloc_strdup_append_buffer (query_string, ")");
> +
> + talloc_free (escaped);
> + return query_string;
> +}
> +
> int
> notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
> {
> @@ -93,6 +163,12 @@ notmuch_tag_command (void *ctx, unused (int argc), unused (char *argv[]))
> return 1;
> }
>
> + /* Optimize the query so it excludes messages that already have
> + * the specified set of tags. */
> + query_string = _optimize_tag_query (ctx, query_string, argv,
> + add_tags, add_tags_count,
> + remove_tags, remove_tags_count);
> +
> config = notmuch_config_open (ctx, NULL, NULL);
> if (config == NULL)
> return 1;
> --
> 1.7.7.1
>
> _______________________________________________
> notmuch mailing list
> notmuch at notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
More information about the notmuch
mailing list