[PATCH v2 2/4] cli: Extend the search command for --output=addresses and similar

Thu Oct 9 03:55:38 PDT 2014

On Mon, Oct 06 2014, Tomi Ollila wrote:
> On Sun, Oct 05 2014, Michal Sojka <sojkam1 at fel.cvut.cz> wrote:
>
>> The new outputs allow printing senders, recipients or both of matching
>> messages.
>>
>> This code based on a patch from Jani Nikula.
>
> OK, IMO...
>
> 1/4 OK
>
> Before 2/4 add support for 'flag' arguments, drop the --output=addresses
> option which is now done as --output=sender --output=recipients

OK

> In deduplication comment did not describe the deduplication at all...
> so I looked a bit into the code now... the Default you described was
> that with "John Doe" <john.doe at example.com> and "John Doe" <JOHN.DOE at EXAMPLE.COM>
> only one was printed (but not which one).

I intentionally didn't want to define which one, but I agree that it
might be useful in same cases. It would depend on --sort option and on
the order of addresses in email headers.

> Secondly, what happens with "Doe, John" <john.doe at example.com> and
> "John Doe" <JOHN.DOE at EXAMPLE.COM>... ah, it is same as *addr* with
> case-insensitive address.
>
> Sorry, but IMO these options are a bit strange.

My impression is that I did bad job describing the deduplication
algorithm, which is why you don't understand it. Maybe, we can also
change the name of the option to --filter-by, or something like this.

When thinking about how to best document such an option, it seems that
the user must be aware that this is implemented as flags that are ORed.
Which means that the default should be what was in the previous patch
--unique=none.

What about the following?

    ``--filter-flag=``\ (**addr**\ \|\ **name**\ \|\ **addrfold**)

        Can be used with ``--output=addresses``, ``--output=sender`` or
        ``--output=recipients`` to filter out duplicate addresses. The
        filtering algorithm receives a sequence of email addresses and
        outputs the same sequence without the addresses that are
        considered a duplicate of a previously output address. What is
        considered a duplicate depends on the flags given:

	**addr** means that the address part is compared.
	Case-sensitivity can be controlled by **addrfold** flag (see
	below). For example, the addresses "John Doe <john at example.com>"
	and "Dr. John Doe <john at example.com>" will be considered
	duplicate.

	**name** means that the name part is compared in case-sensitive
	manner. For example, the addresses "John Doe <john at example.com>"
	and "John Doe <john at doe.name>" will be considered duplicate.

	**addrfold** when used with the **addr** flag, the address
	comparison is performed in case-insensitive manner. For example,
	the addresses "John Doe <john at example.com>" and "Dr. John Doe
	<JOHN at EXAMPLE.COM>" will be considered duplicate.

	To specify multiple flags, this option can be given multiple
	times. For example, ``--filter-flag=name --filter-flag=addr``
	will print unique case-sensitive combinations of both name and
	address parts.

With this, the previously default behavior would now has to be spelled
as "--filter-flag=addr --filter-flag=addrfold".

I'm not sure it is wise present such a low-level interface (flags) to
command-line users, but it is hopefully more understandable now. What do
you think?

> Not to go to choose which one to choose (first, last, most common) instead
> of the suggested options these should be the ones:
>
> 1) "John Doe" <john.doe at example.com> and "John Doe" <JOHN.DOE at EXAMPLE.COM>:
> only one printed, but if either were "Dr. John Doe", both of these are printed
> (this as default).

According to the above, which could be achieved as --filter-flag=name
--filter-flag=addr --filter-flag=addrfold.

> 2) same as above, but only make case-insensitive

case-insensitive is already in 1), you probably mean case-sensitive.

> address match -- i.e. in the 2 above cases in option 1, print only
> one.

This would be --filter-flag=name --filter-flag=addr.

> (and same name but different address to perhaps never been an option...)
>
> I might like to have option that does case-sensitive address match, 

This would be just --filter-flag=addr.

As a side note, it is interesting, that you mentioned your options as an
enumeration even though they are actually combinations of several on/off
flags. I think that it is more natural for human brains to think in
terms of simple lists than in terms of combinations of flags. That's why
I originally implemented --output=addresses as just another keyword,
rather than requiring the user to specify both sender and receivers.

Thanks for the review.
-Michal

> In those cases I don't know the recipient's culture and the email he
> sent to me used format <Foo.Bar at example.org> (and not knowing which
> one is the first and which last name (or whatever names these are) --
> just to reply in same case format in respect...

>
>
> Tomi
>
>
>> ---
>>  completion/notmuch-completion.bash |   2 +-
>>  completion/notmuch-completion.zsh  |   3 +-
>>  doc/man1/notmuch-search.rst        |  22 +++++++-
>>  notmuch-search.c                   | 100 ++++++++++++++++++++++++++++++++++---
>>  test/T090-search-output.sh         |  64 ++++++++++++++++++++++++
>>  5 files changed, 182 insertions(+), 9 deletions(-)
>>
>> diff --git a/completion/notmuch-completion.bash b/completion/notmuch-completion.bash
>> index 0571dc9..c37ddf5 100644
>> --- a/completion/notmuch-completion.bash
>> +++ b/completion/notmuch-completion.bash
>> @@ -294,7 +294,7 @@ _notmuch_search()
>>  	    return
>>  	    ;;
>>  	--output)
>> -	    COMPREPLY=( $( compgen -W "summary threads messages files tags" -- "${cur}" ) )
>> +	    COMPREPLY=( $( compgen -W "summary threads messages files tags sender recipients addresses" -- "${cur}" ) )
>>  	    return
>>  	    ;;
>>  	--sort)
>> diff --git a/completion/notmuch-completion.zsh b/completion/notmuch-completion.zsh
>> index 67a9aba..bff8fd5 100644
>> --- a/completion/notmuch-completion.zsh
>> +++ b/completion/notmuch-completion.zsh
>> @@ -52,7 +52,8 @@ _notmuch_search()
>>    _arguments -s : \
>>      '--max-threads=[display only the first x threads from the search results]:number of threads to show: ' \
>>      '--first=[omit the first x threads from the search results]:number of threads to omit: ' \
>> -    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))'
>> +    '--sort=[sort results]:sorting:((newest-first\:"reverse chronological order" oldest-first\:"chronological order"))' \
>> +    '--output=[select what to output]:output:((summary threads messages files tags sender recipients addresses))'
>>  }
>>
>>  _notmuch()
>> diff --git a/doc/man1/notmuch-search.rst b/doc/man1/notmuch-search.rst
>> index 90160f2..3447820 100644
>> --- a/doc/man1/notmuch-search.rst
>> +++ b/doc/man1/notmuch-search.rst
>> @@ -35,7 +35,7 @@ Supported options for **search** include
>>          intended for programs that invoke **notmuch(1)** internally. If
>>          omitted, the latest supported version will be used.
>>
>> -    ``--output=(summary|threads|messages|files|tags)``
>> +    ``--output=(summary|threads|messages|files|tags|sender|recipients|addresses)``
>>
>>          **summary**
>>              Output a summary of each thread with any message matching
>> @@ -78,6 +78,26 @@ Supported options for **search** include
>>              by null characters (--format=text0), as a JSON array
>>              (--format=json), or as an S-Expression list (--format=sexp).
>>
>> +	**sender**
>> +            Output all addresses from the *From* header that appear on
>> +            any message matching the search terms, either one per line
>> +            (--format=text), separated by null characters
>> +            (--format=text0), as a JSON array (--format=json), or as
>> +            an S-Expression list (--format=sexp).
>> +
>> +	    Note: Searching for **sender** should be much faster than
>> +	    searching for **recipients** or **addresses**, because
>> +	    sender addresses are cached directly in the database
>> +	    whereas other addresses need to be fetched from message
>> +	    files.
>> +
>> +	**recipients**
>> +            Like **sender** but for addresses from *To*, *Cc* and
>> +	    *Bcc* headers.
>> +
>> +	**addresses**
>> +	    Like **sender** and **recipients** together.
>> +
>>      ``--sort=``\ (**newest-first**\ \|\ **oldest-first**)
>>          This option can be used to present results in either
>>          chronological order (**oldest-first**) or reverse chronological
>> diff --git a/notmuch-search.c b/notmuch-search.c
>> index 5ac2a26..0614f10 100644
>> --- a/notmuch-search.c
>> +++ b/notmuch-search.c
>> @@ -23,11 +23,14 @@
>>  #include "string-util.h"
>>
>>  typedef enum {
>> -    OUTPUT_SUMMARY,
>> -    OUTPUT_THREADS,
>> -    OUTPUT_MESSAGES,
>> -    OUTPUT_FILES,
>> -    OUTPUT_TAGS
>> +    OUTPUT_SUMMARY	= 1 << 0,
>> +    OUTPUT_THREADS	= 1 << 1,
>> +    OUTPUT_MESSAGES	= 1 << 2,
>> +    OUTPUT_FILES	= 1 << 3,
>> +    OUTPUT_TAGS		= 1 << 4,
>> +    OUTPUT_SENDER	= 1 << 5,
>> +    OUTPUT_RECIPIENTS	= 1 << 6,
>> +    OUTPUT_ADDRESSES	= OUTPUT_SENDER | OUTPUT_RECIPIENTS,
>>  } output_t;
>>
>>  typedef struct {
>> @@ -220,6 +223,67 @@ do_search_threads (search_options_t *o)
>>      return 0;
>>  }
>>
>> +static void
>> +print_address_list (const search_options_t *o, InternetAddressList *list)
>> +{
>> +    InternetAddress *address;
>> +    int i;
>> +
>> +    for (i = 0; i < internet_address_list_length (list); i++) {
>> +	address = internet_address_list_get_address (list, i);
>> +	if (INTERNET_ADDRESS_IS_GROUP (address)) {
>> +	    InternetAddressGroup *group;
>> +	    InternetAddressList *group_list;
>> +
>> +	    group = INTERNET_ADDRESS_GROUP (address);
>> +	    group_list = internet_address_group_get_members (group);
>> +	    if (group_list == NULL)
>> +		continue;
>> +
>> +	    print_address_list (o, group_list);
>> +	} else {
>> +	    InternetAddressMailbox *mailbox;
>> +	    const char *name;
>> +	    const char *addr;
>> +	    char *full_address;
>> +
>> +	    mailbox = INTERNET_ADDRESS_MAILBOX (address);
>> +
>> +	    name = internet_address_get_name (address);
>> +	    addr = internet_address_mailbox_get_addr (mailbox);
>> +
>> +	    if (name && *name)
>> +		full_address = talloc_asprintf (o->format, "%s <%s>", name, addr);
>> +	    else
>> +		full_address = talloc_strdup (o->format, addr);
>> +
>> +	    if (!full_address) {
>> +		fprintf (stderr, "Error: out of memory\n");
>> +		break;
>> +	    }
>> +	    o->format->string (o->format, full_address);
>> +	    o->format->separator (o->format);
>> +
>> +	    talloc_free (full_address);
>> +	}
>> +    }
>> +}
>> +
>> +static void
>> +print_address_string (const search_options_t *o, const char *recipients)
>> +{
>> +    InternetAddressList *list;
>> +
>> +    if (recipients == NULL)
>> +	return;
>> +
>> +    list = internet_address_list_parse_string (recipients);
>> +    if (list == NULL)
>> +	return;
>> +
>> +    print_address_list (o, list);
>> +}
>> +
>>  static int
>>  do_search_messages (search_options_t *o)
>>  {
>> @@ -266,11 +330,29 @@ do_search_messages (search_options_t *o)
>>
>>  	    notmuch_filenames_destroy( filenames );
>>
>> -	} else { /* output == OUTPUT_MESSAGES */
>> +	} else if (o->output == OUTPUT_MESSAGES) {
>>  	    format->set_prefix (format, "id");
>>  	    format->string (format,
>>  			    notmuch_message_get_message_id (message));
>>  	    format->separator (format);
>> +	} else {
>> +	    if (o->output & OUTPUT_SENDER) {
>> +		const char *addrs;
>> +
>> +		addrs = notmuch_message_get_header (message, "from");
>> +		print_address_string (o, addrs);
>> +	    }
>> +
>> +	    if (o->output & OUTPUT_RECIPIENTS) {
>> +		const char *hdrs[] = { "to", "cc", "bcc" };
>> +		const char *addrs;
>> +		size_t j;
>> +
>> +		for (j = 0; j < ARRAY_SIZE (hdrs); j++) {
>> +		    addrs = notmuch_message_get_header (message, hdrs[j]);
>> +		    print_address_string (o, addrs);
>> +		}
>> +	    }
>>  	}
>>
>>  	notmuch_message_destroy (message);
>> @@ -370,6 +452,9 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>>  	  (notmuch_keyword_t []){ { "summary", OUTPUT_SUMMARY },
>>  				  { "threads", OUTPUT_THREADS },
>>  				  { "messages", OUTPUT_MESSAGES },
>> +				  { "sender", OUTPUT_SENDER },
>> +				  { "recipients", OUTPUT_RECIPIENTS },
>> +				  { "addresses", OUTPUT_ADDRESSES },
>>  				  { "files", OUTPUT_FILES },
>>  				  { "tags", OUTPUT_TAGS },
>>  				  { 0, 0 } } },
>> @@ -461,6 +546,9 @@ notmuch_search_command (notmuch_config_t *config, int argc, char *argv[])
>>  	ret = do_search_threads (&o);
>>  	break;
>>      case OUTPUT_MESSAGES:
>> +    case OUTPUT_SENDER:
>> +    case OUTPUT_RECIPIENTS:
>> +    case OUTPUT_ADDRESSES:
>>      case OUTPUT_FILES:
>>  	ret = do_search_messages (&o);
>>  	break;
>> diff --git a/test/T090-search-output.sh b/test/T090-search-output.sh
>> index 947d572..5458de1 100755
>> --- a/test/T090-search-output.sh
>> +++ b/test/T090-search-output.sh
>> @@ -387,6 +387,70 @@ cat <<EOF >EXPECTED
>>  EOF
>>  test_expect_equal_file OUTPUT EXPECTED
>>
>> +test_begin_subtest "--output=sender"
>> +notmuch search --output=sender '*' | sort | uniq --count >OUTPUT
>> +cat <<EOF >EXPECTED
>> +      1 Adrian Perez de Castro <aperez at igalia.com>
>> +      2 Alex Botero-Lowry <alex.boterolowry at gmail.com>
>> +      4 Alexander Botero-Lowry <alex.boterolowry at gmail.com>
>> +      1 Aron Griffis <agriffis at n01se.net>
>> +     12 Carl Worth <cworth at cworth.org>
>> +      1 Chris Wilson <chris at chris-wilson.co.uk>
>> +      1 François Boulogne <boulogne.f at gmail.com>
>> +      1 Ingmar Vanhassel <ingmar at exherbo.org>
>> +      1 Israel Herraiz <isra at herraiz.org>
>> +      4 Jan Janak <jan at ryngle.com>
>> +      2 Jjgod Jiang <gzjjgod at gmail.com>
>> +      7 Keith Packard <keithp at keithp.com>
>> +      5 Lars Kellogg-Stedman <lars at seas.harvard.edu>
>> +      5 Mikhail Gusarov <dottedmag at dottedmag.net>
>> +      1 Olivier Berger <olivier.berger at it-sudparis.eu>
>> +      1 Rolland Santimano <rollandsantimano at yahoo.com>
>> +      3 Stewart Smith <stewart at flamingspork.com>
>> +EOF
>> +test_expect_equal_file OUTPUT EXPECTED
>> +
>> +test_begin_subtest "--output=recipients"
>> +notmuch search --output=recipients '*' | sort | uniq --count >OUTPUT
>> +cat <<EOF >EXPECTED
>> +      1 Allan McRae <allan at archlinux.org>
>> +      1 Discussion about the Arch User Repository (AUR) <aur-general at archlinux.org>
>> +      1 Keith Packard <keithp at keithp.com>
>> +      1 Mikhail Gusarov <dottedmag at dottedmag.net>
>> +      2 notmuch <notmuch at notmuchmail.org>
>> +     48 notmuch at notmuchmail.org
>> +      1 olivier.berger at it-sudparis.eu
>> +EOF
>> +test_expect_equal_file OUTPUT EXPECTED
>> +
>> +test_begin_subtest "--output=addresses"
>> +notmuch search --output=addresses '*' | sort | uniq --count >OUTPUT
>> +cat <<EOF >EXPECTED
>> +      1 Adrian Perez de Castro <aperez at igalia.com>
>> +      2 Alex Botero-Lowry <alex.boterolowry at gmail.com>
>> +      4 Alexander Botero-Lowry <alex.boterolowry at gmail.com>
>> +      1 Allan McRae <allan at archlinux.org>
>> +      1 Aron Griffis <agriffis at n01se.net>
>> +     12 Carl Worth <cworth at cworth.org>
>> +      1 Chris Wilson <chris at chris-wilson.co.uk>
>> +      1 Discussion about the Arch User Repository (AUR) <aur-general at archlinux.org>
>> +      1 François Boulogne <boulogne.f at gmail.com>
>> +      1 Ingmar Vanhassel <ingmar at exherbo.org>
>> +      1 Israel Herraiz <isra at herraiz.org>
>> +      4 Jan Janak <jan at ryngle.com>
>> +      2 Jjgod Jiang <gzjjgod at gmail.com>
>> +      8 Keith Packard <keithp at keithp.com>
>> +      5 Lars Kellogg-Stedman <lars at seas.harvard.edu>
>> +      6 Mikhail Gusarov <dottedmag at dottedmag.net>
>> +      1 Olivier Berger <olivier.berger at it-sudparis.eu>
>> +      1 Rolland Santimano <rollandsantimano at yahoo.com>
>> +      3 Stewart Smith <stewart at flamingspork.com>
>> +      2 notmuch <notmuch at notmuchmail.org>
>> +     48 notmuch at notmuchmail.org
>> +      1 olivier.berger at it-sudparis.eu
>> +EOF
>> +test_expect_equal_file OUTPUT EXPECTED
>> +
>>  test_begin_subtest "sanitize output for quoted-printable line-breaks in author and subject"
>>  add_message "[subject]='two =?ISO-8859-1?Q?line=0A_subject?=
>>  	headers'"
>> --
>> 2.1.1
>>
>> _______________________________________________
>> notmuch mailing list
>> notmuch at notmuchmail.org
>> http://notmuchmail.org/mailman/listinfo/notmuch