[PATCH 0/5] Fetch all message metadata in a single pass

Austin Clements amdragon at mit.edu
Sun Mar 20 23:56:05 PDT 2011


Thanks for the thorough review.  My updated patches are on the
eager-metadata-v4 branch (also, for-review/eager-metadata-v4) at
  http://awakening.csail.mit.edu/git/notmuch.git
Responses inline.

On Thu, Mar 10, 2011 at 10:48 PM, Carl Worth <cworth at cworth.org> wrote:
> On Sun, 13 Feb 2011 15:25:48 -0500, Austin Clements <amdragon at mit.edu> wrote:
>> I've rebased this against current master and fixed a few merge
>> conflicts.  The rebased patches are on the eager-metadata-v3 branch of
>>   http://awakening.csail.mit.edu/git/notmuch.git
>
> Hi Austin,
>
> This looks like a great set of optimizations and cleanup. Here is some
> (long-overdue) review.
>
> First, the patch series looks excellent and my review here is quite
> nit-picky. I feel bad reviewing this so late and not just immediately
> merging it, but I will commit to picking up a refreshed series very
> quickly.
>
> One very minor thing is that the word "metadata" might be confused with
> Xapian metadata which we are using already such as:
>
>    version_string = notmuch->xapian_db->get_metadata ("version");
>
> So we might want to come up with a better name here. I don't have any
> concrete suggestion yet, so if you don't think of anything obvious, then
> don't worry about it.

Hmm.  I think this is okay because I always refer to "message
metadata" (and I couldn't think of a better term).

>> +    /* Get tags */
>> +    if (!message->tag_list) {
>> +     message->tag_list =
>> +         _notmuch_get_terms_with_prefix (message, i, end, tag_prefix);
>> +     _notmuch_string_list_sort (message->tag_list);
>> +    }
>
> Looks like the above case is missing the assert to ensure proper prefix
> ordering.

Fixed.

>> +    if (!message->tag_list)
>> +     _notmuch_message_ensure_metadata (message);
>> +
>> +    /* We tell the iterator to add a talloc reference to the
>> +     * underlying list, rather than stealing it.  As a result, it's
>> +     * possible to modify the message tags while iterating because
>> +     * the iterator will keep the current list alive. */
>> +    return _notmuch_tags_create (message, message->tag_list, FALSE);
>
> The behavior here is great, but don't like Boolean function parameters
> being used to change the semantics. The problem with a Boolean argument
> is that it's impossible to know the semantic by just reading the calling
> code---you must also consult the implementation as well.
>
> For any case where it seems natural to implement something with a
> Boolean argument, I sometimes use an approach something like this:
>
>        static void
>        _foo_function_internal (foo_t *, ..., bool_t be_different)
>        {
>                ...;
>        }
>
>        void
>        foo_function (foo_t *, ...)
>        {
>                return _foo_function_internal (foo, ..., FALSE);
>        }
>
>        void
>        foo_function_different (foo_t *, ...)
>        {
>                return _foo_function_internal (foo, ..., TRUE);
>        }
>
> That is, I'm willing to accept the Boolean parameter in the case of a
> static function defined immediately next to all callers. Here, for any
> non-static callers the semantics should be quite clear. (And perhaps the
> function calling with the FALSE case will need a clarifying suffix as
> well---one might have some_function_foo and some_function_bar for the
> two cases).
>
> Of course, my proscription against Boolean parameter doesn't apply to
> something like a function that is setting some Boolean feature to TRUE
> or FALSE. The important criterion is that the function semantics should
> be evident to someone reading code calling the function. So if the
> Boolean argument is obviously tied to some portion of the function name,
> then that can be adequate.
>
> Enough with the generalities, back to _notmuch_tags_create...
>
> The caller above is the exceptional caller. It's the only one using
> passing FALSE, and it also already has a large comment. So it seems that
> the right fix here is to have the extra talloc_reference happen here
> where there's a comment talking about adding a talloc reference.
>
> So it would be great to see something here like:
>
>        tags = _notmuch_tags_create (message, message->tag_list);
>        talloc_reference (message, message->tag_list);
>        return tags;
>
> Of course, that would mean that _notmuch_tags_create can't have done the
> talloc_steal. So perhaps all the other callers should be calling:
>
>        _notmuch_tags_create_with_steal (ctx, tag_list);
>
> What do you think?

Your point about the boolean argument is well taken.  In this case
there's really no need for two difference versions, so I wound up
making _notmuch_tags_create always steal the reference.

I debated for a while whether it should add a reference, thus forcing
most callers to talloc_unlink, or steal the reference, thus forcing
one caller to talloc_reference.  I ultimately decided it was much more
likely that the caller would forget the talloc_unlink, resulting in a
difficult-to-track memory leak (since the list would *eventually* be
cleaned up), than that the steal would cause confusion.  Also, there
is some precedence for internal functions that steal an argument.  So
I made it steal the reference.

This doesn't cause any problems in the one weird case that still needs
the reference to the list.  After the _notmuch_tags_create, the caller
simply adds that reference.

>> -notmuch_tags_t *
>> -_notmuch_convert_tags (void *ctx, Xapian::TermIterator &i,
>> -                    Xapian::TermIterator &end);
>> +notmuch_string_list_t *
>> +_notmuch_get_terms_with_prefix (void *ctx, Xapian::TermIterator &i,
>> +                             Xapian::TermIterator &end,
>> +                             const char *prefix);
>
> I don't really like the fact that _notmuch_get_terms_with_prefix doesn't
> make a clear indication of what file it's defined in, (the old function
> _notmuch_convert_tags had the same problem). It could be named
> _notmuch_database_convert_tags since it's in database.cc, but that would
> be odd in not actually accepting a notmuch_database_t * as the first
> parameter. Any suggestion here?

_notmuch_get_terms_with_prefix is weird because it captures a pattern
that lies squarely in the Xapian world, being all about term
iterators.  Hence, I think the "right" solution (note, not the best
solution), would be to hide the term iterators and make two copies of
the function: one that takes a notmuch_database_t and always considers
all database terms, and one private to message.cc that acts as a
helper to what's now all bundled in _notmuch_message_ensure_metadata.

But that bothers me more than a function that doesn't take a
notmuch_database_t *.  So I just added "database" to the name.

>> index 0000000..d92a0ba
>> --- /dev/null
>> +++ b/lib/strings.c
>> @@ -0,0 +1,94 @@
>> +/* strings.c - Iterator for a list of strings
>
> Similarly, this file might be better as string-list.c.

Done.

>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see http://www.gnu.org/licenses/ .
>> + *
>> + * Author: Carl Worth <cworth at cworth.org>
>
> Hey, wait a second, some of this code is mine, but I think the sort
> function is yours. Please do start annotating the Author tags on all
> files as appropriate. (There are probably lots of files missing your
> Author attribution---I just happened to notice this one here since you
> happened to have an Author line in the patch.)

Heh, fixed.  I suppose I hadn't been thinking about it, since none of
the source in lib/ has other authors listed.

But it raises an interesting question.  When is it kosher to add
oneself to a file's author list?  In this case I wrote about half of
the code in that admittedly short file, so that makes sense  Changing
a few lines presumably isn't enough.  Adding a few functions?

>> -/* XXX: Should write some talloc-friendly list to avoid the need for
>> - * this. */
>
> Hurrah! I love patches that get to remove XXX comments.

]:--8)

>> +    /* Get thread */
>> +    if (!message->thread_id)
>> +     message->thread_id =
>> +         _notmuch_message_get_term (message, i, end, thread_prefix);
>> +
>> +    /* Get id */
>> +    assert (strcmp (thread_prefix, id_prefix) < 0);
>> +    if (!message->message_id)
>> +     message->message_id =
>> +         _notmuch_message_get_term (message, i, end, id_prefix);
>
> Missing asserts on the above two as well?

Fixed.  (Actually, I think there was only one assert missing in total,
but it had propagated through the history.)

> -Carl


More information about the notmuch mailing list