query on a subset of messages ?
Austin Clements
amdragon at MIT.EDU
Mon Jul 9 09:30:00 PDT 2012
Quoth Sebastien Binet on Jul 09 at 10:25 am:
>
> hi there,
>
> I was trying to reduce the I/O stress during my usual email
> fetching+tagging by writing a little program using the go bindings to
> notmuch.
>
> ie:
> db, status := notmuch.OpenDatabase(db_path,
>     notmuch.DATABASE_MODE_READ_WRITE)
> query := db.CreateQuery("(tag:new AND tag:inbox)")
> msgs := query.SearchMessages()
> for _, msg := range msgs {
>     tag_msg(msg, tagqueries)
> }
>
>
> where tagqueries is a subquery of the form:
> [
>   {
>     "Cmd": "+to-me",
>     "Query": "(to:sebastien.binet@cern.ch and not tag:to-me)"
>   },
>   {
>     "Cmd": "+sci-notmuch",
>     "Query": "from:notmuch@notmuchmail.org or to:notmuch@notmuchmail.org or subject:notmuch"
>   }
> ]
>
>
> the idea being that I need to crawl through the db only once and
> then iteratively apply tags on those messages (instead of repeatedly
> running "notmuch tag ..." for each and every one of those many
> 'tag-queries')
>
> I couldn't find any C-API to do such a thing using the notmuch library.
> did I overlook something ?
>
> Is it something useful to add ?
>
> -s

Have you tried a more direct translation of the multiple notmuch tag
commands into Go, where you don't worry about subsetting the queries?
Unless you're tagging a huge number of messages, the cost of notmuch
tag is almost certainly dominated by the fsync it does when it closes the
database (which every call to notmuch tag must do). However, in Go,
you can keep the database open across all of the tagging operations
and then close and fsync it just once.
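
To make that concrete, here is a minimal sketch of the "open once, tag many
times, close once" shape. It deliberately does not call the notmuch bindings:
Store, ApplyAll and countingStore are hypothetical names introduced only for
this illustration, and countingStore is an in-memory stand-in that just counts
calls. In a real program, Apply would presumably wrap CreateQuery and
SearchMessages from your snippet plus whatever tag-modifying calls your
version of the bindings offers.

```go
package main

import "fmt"

// TagQuery mirrors one entry of the JSON list in the quoted message.
type TagQuery struct {
	Cmd   string // e.g. "+to-me"
	Query string // notmuch search terms selecting the messages to tag
}

// Store is a hypothetical abstraction over an open database.
// It is NOT part of the notmuch Go bindings.
type Store interface {
	Apply(tq TagQuery) error // run tq.Query and apply tq.Cmd to each match
	Close() error            // flush to disk; this is where the fsync cost lands
}

// ApplyAll runs every tag query against the same open store and closes the
// store exactly once at the end, whether or not an Apply call failed.
func ApplyAll(s Store, tqs []TagQuery) (err error) {
	defer func() {
		// Report a Close error only if nothing failed earlier.
		if cerr := s.Close(); err == nil {
			err = cerr
		}
	}()
	for _, tq := range tqs {
		if err = s.Apply(tq); err != nil {
			return fmt.Errorf("applying %q to %q: %w", tq.Cmd, tq.Query, err)
		}
	}
	return nil
}

// countingStore is a fake Store that only records how often it was used.
type countingStore struct{ applied, closed int }

func (c *countingStore) Apply(TagQuery) error { c.applied++; return nil }
func (c *countingStore) Close() error         { c.closed++; return nil }

func main() {
	s := &countingStore{}
	tqs := []TagQuery{
		{Cmd: "+to-me", Query: "to:user@example.org"}, // placeholder address
		{Cmd: "+sci-notmuch", Query: "subject:notmuch"},
	}
	if err := ApplyAll(s, tqs); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("applied=%d closed=%d\n", s.applied, s.closed)
}
```

The point of the fake is only to show that the number of closes stays at one
no matter how many tag queries are applied.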

Note that there is an important optimization in notmuch tag that you
might have to replicate. It rewrites the original query to exclude
messages that are already in the desired tag state (tags to add already
present, tags to remove already absent), so that they get skipped
very efficiently at the earliest stage possible.
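
A rough sketch of that kind of query rewriting, as plain string manipulation
in Go. This is an illustration of the idea, not the code notmuch tag actually
runs: TagOp, quoteTag and OptimizeQuery are hypothetical names, and the
quoting rule (wrap the tag in double quotes and double any embedded quote) is
my assumption about what the query parser accepts. Your "and not tag:to-me"
clause is the same trick done by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// TagOp is a hypothetical description of one tag change:
// Add is true for "+tag" and false for "-tag".
type TagOp struct {
	Tag string
	Add bool
}

// quoteTag wraps a tag in double quotes, doubling embedded quotes.
// Assumption: this is an acceptable way to quote a term in a query.
func quoteTag(tag string) string {
	return `"` + strings.ReplaceAll(tag, `"`, `""`) + `"`
}

// OptimizeQuery narrows query to messages that at least one op would
// actually change: messages missing a tag that is being added, or
// carrying a tag that is being removed.
func OptimizeQuery(query string, ops []TagOp) string {
	if len(ops) == 0 {
		return query
	}
	clauses := make([]string, 0, len(ops))
	for _, op := range ops {
		if op.Add {
			clauses = append(clauses, "not tag:"+quoteTag(op.Tag))
		} else {
			clauses = append(clauses, "tag:"+quoteTag(op.Tag))
		}
	}
	return "( " + query + " ) and ( " + strings.Join(clauses, " or ") + " )"
}

func main() {
	ops := []TagOp{{Tag: "sci-notmuch", Add: true}}
	fmt.Println(OptimizeQuery("subject:notmuch", ops))
}
```

The rewritten string would then be passed to CreateQuery in place of the
original one, so messages that need no change never reach the tagging loop.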