find threads where I and Jian participated but not Dave

Matt Armstrong marmstrong at google.com
Thu Jun 22 17:00:58 PDT 2017


Gaute Hope <eg at gaute.vetsj.com> writes:

> Gaute Hope writes on juni 22, 2017 8:08:
>> Daniel Kahn Gillmor writes on juni 21, 2017 23:30:
>>> On Wed 2017-06-21 13:04:53 -0700, Matt Armstrong wrote:
>>>> For what it is worth, I've found this idea from Daniel intriguing and
>>>> pretty useful in practice:
>>>>
>>>>   "show me threads in which i've participated, where there are some
>>>>    messages flagged with 'inbox'"
>>>>
>>>> I implement it like this in my post-new hook:
>>>>
>>>>     # All messages in threads in which I participate get tag:participated
>>>>     notmuch search --output=threads from:marmstrong | \
>>>>       sed -e 's,^,+participated -- ,' | \
>>>>       notmuch tag --batch
>>> 
>>> cool, thx for the suggestion.
>>> 
>>> the "notmuch search" part of the pipeline alone takes ~19s (wall time,
>>> and actual CPU time) for me though :/  It returns 30504 threads!  how
>>> many threads do you get?
>> 
>> Is there any reason why you do not filter on a tag 'new' as well?
>> 
>>      notmuch search --output=threads from:marmstrong and tag:new | \
>>        sed -e 's,^,+participated -- ,' | \
>>        notmuch tag --batch
>> 
>
> Nevermind, I get it - it might be possible to add a temporary tag 
> new-tag to the whole thread first and not just new messages. That might 
> be faster. As long as all sent messages get the new tag as well.

Gaute, I took this as a challenge and came up with what I think is an
equivalent but more efficient approach.  The disadvantage is that it is
much more complex.  The advantage is that it runs in under 0.2 seconds
to process a day's worth of my "new" mail.

I now have this in my notmuch post-hook.  I believe I could change the
"tag:new OR date:today" query to just "tag:new".  The "OR date:today"
helped during interactive development.

# All threads in which I participate get tag:participated
#  1) Find all threads with a message tagged new
#     (finding all 'today' messages helps during testing,
#     but isn't necessary)
#  2) Run through "xargs -s 2048 echo" to to group threads
#     lines of about 2K in size.
#  3) For each line (2) produces, narrow the threads to
#     those containing a message from me.
#  4) For each such thread, tag every message with +participated.
notmuch search --output=threads tag:new OR date:today | \
  xargs -s 2048 echo | \
  xargs -I '{}' notmuch search \
  --output=threads from:marmstrong AND \( '{}' \) | \
  sed -e 's,^,+participated -- ,' | \
  notmuch tag --batch


The basic idea is that each run of the notmuch post-hook will
incorporate relatively little mail, so the number of unique threads will
be relatively small.  So, we just list them all by thread ID.

Then for each thread with new messages, we figure out which threads have
a message from:marmstrong (it need not be the new message).

We then tag all messages in each of those threads with +participated.

You said "it might be possible to add a temporary tag new-tag to the
whole thread first and not just new messages." -- Yes, and that is
implicitly what I am doing, except that each such thread is instead
tracked in an ephemeral way through the xargs based shell pipeline.

I did try an approach of explicitly labeling all messages in "new"
threads, temporarily, but that was slower.

You said "As long as all sent messages get the new tag as well." --
true, and I'm not sure about that.  My primary use for this is to
discover new activity from others *after* I've participated in a thread,
so I don't much care if a thread that is "participated in" is not tagged
that way until some mail from somebody else arrives.


More information about the notmuch mailing list