[PATCH 0/5] Fetch all message metadata in a single pass
Austin Clements
amdragon at mit.edu
Sun Mar 20 23:56:05 PDT 2011
Thanks for the thorough review. My updated patches are on the
eager-metadata-v4 branch (also, for-review/eager-metadata-v4) at
http://awakening.csail.mit.edu/git/notmuch.git
Responses inline.
On Thu, Mar 10, 2011 at 10:48 PM, Carl Worth <cworth at cworth.org> wrote:
> On Sun, 13 Feb 2011 15:25:48 -0500, Austin Clements <amdragon at mit.edu> wrote:
>> I've rebased this against current master and fixed a few merge
>> conflicts. The rebased patches are on the eager-metadata-v3 branch of
>> http://awakening.csail.mit.edu/git/notmuch.git
>
> Hi Austin,
>
> This looks like a great set of optimizations and cleanup. Here is some
> (long-overdue) review.
>
> First, the patch series looks excellent and my review here is quite
> nit-picky. I feel bad reviewing this so late and not just immediately
> merging it, but I will commit to picking up a refreshed series very
> quickly.
>
> One very minor thing is that the word "metadata" might be confused with
> Xapian metadata which we are using already such as:
>
> version_string = notmuch->xapian_db->get_metadata ("version");
>
> So we might want to come up with a better name here. I don't have any
> concrete suggestion yet, so if you don't think of anything obvious, then
> don't worry about it.
Hmm. I think this is okay because I always refer to "message
metadata" (and I couldn't think of a better term).
>> + /* Get tags */
>> + if (!message->tag_list) {
>> + message->tag_list =
>> + _notmuch_get_terms_with_prefix (message, i, end, tag_prefix);
>> + _notmuch_string_list_sort (message->tag_list);
>> + }
>
> Looks like the above case is missing the assert to ensure proper prefix
> ordering.
Fixed.
>> + if (!message->tag_list)
>> + _notmuch_message_ensure_metadata (message);
>> +
>> + /* We tell the iterator to add a talloc reference to the
>> + * underlying list, rather than stealing it. As a result, it's
>> + * possible to modify the message tags while iterating because
>> + * the iterator will keep the current list alive. */
>> + return _notmuch_tags_create (message, message->tag_list, FALSE);
>
> The behavior here is great, but don't like Boolean function parameters
> being used to change the semantics. The problem with a Boolean argument
> is that it's impossible to know the semantic by just reading the calling
> code---you must also consult the implementation as well.
>
> For any case where it seems natural to implement something with a
> Boolean argument, I sometimes use an approach something like this:
>
> static void
> _foo_function_internal (foo_t *, ..., bool_t be_different)
> {
> ...;
> }
>
> void
> foo_function (foo_t *, ...)
> {
> return _foo_function_internal (foo, ..., FALSE);
> }
>
> void
> foo_function_different (foo_t *, ...)
> {
> return _foo_function_internal (foo, ..., TRUE);
> }
>
> That is, I'm willing to accept the Boolean parameter in the case of a
> static function defined immediately next to all callers. Here, for any
> non-static callers the semantics should be quite clear. (And perhaps the
> function calling with the FALSE case will need a clarifying suffix as
> well---one might have some_function_foo and some_function_bar for the
> two cases).
>
> Of course, my proscription against Boolean parameter doesn't apply to
> something like a function that is setting some Boolean feature to TRUE
> or FALSE. The important criterion is that the function semantics should
> be evident to someone reading code calling the function. So if the
> Boolean argument is obviously tied to some portion of the function name,
> then that can be adequate.
>
> Enough with the generalities, back to _notmuch_tags_create...
>
> The caller above is the exceptional caller. It's the only one using
> passing FALSE, and it also already has a large comment. So it seems that
> the right fix here is to have the extra talloc_reference happen here
> where there's a comment talking about adding a talloc reference.
>
> So it would be great to see something here like:
>
> tags = _notmuch_tags_create (message, message->tag_list);
> talloc_reference (message, message->tag_list);
> return tags;
>
> Of course, that would mean that _notmuch_tags_create can't have done the
> talloc_steal. So perhaps all the other callers should be calling:
>
> _notmuch_tags_create_with_steal (ctx, tag_list);
>
> What do you think?
Your point about the boolean argument is well taken. In this case
there's really no need for two difference versions, so I wound up
making _notmuch_tags_create always steal the reference.
I debated for a while whether it should add a reference, thus forcing
most callers to talloc_unlink, or steal the reference, thus forcing
one caller to talloc_reference. I ultimately decided it was much more
likely that the caller would forget the talloc_unlink, resulting in a
difficult-to-track memory leak (since the list would *eventually* be
cleaned up), than that the steal would cause confusion. Also, there
is some precedence for internal functions that steal an argument. So
I made it steal the reference.
This doesn't cause any problems in the one weird case that still needs
the reference to the list. After the _notmuch_tags_create, the caller
simply adds that reference.
>> -notmuch_tags_t *
>> -_notmuch_convert_tags (void *ctx, Xapian::TermIterator &i,
>> - Xapian::TermIterator &end);
>> +notmuch_string_list_t *
>> +_notmuch_get_terms_with_prefix (void *ctx, Xapian::TermIterator &i,
>> + Xapian::TermIterator &end,
>> + const char *prefix);
>
> I don't really like the fact that _notmuch_get_terms_with_prefix doesn't
> make a clear indication of what file it's defined in, (the old function
> _notmuch_convert_tags had the same problem). It could be named
> _notmuch_database_convert_tags since it's in database.cc, but that would
> be odd in not actually accepting a notmuch_database_t * as the first
> parameter. Any suggestion here?
_notmuch_get_terms_with_prefix is weird because it captures a pattern
that lies squarely in the Xapian world, being all about term
iterators. Hence, I think the "right" solution (note, not the best
solution), would be to hide the term iterators and make two copies of
the function: one that takes a notmuch_database_t and always considers
all database terms, and one private to message.cc that acts as a
helper to what's now all bundled in _notmuch_message_ensure_metadata.
But that bothers me more than a function that doesn't take a
notmuch_database_t *. So I just added "database" to the name.
>> index 0000000..d92a0ba
>> --- /dev/null
>> +++ b/lib/strings.c
>> @@ -0,0 +1,94 @@
>> +/* strings.c - Iterator for a list of strings
>
> Similarly, this file might be better as string-list.c.
Done.
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program. If not, see http://www.gnu.org/licenses/ .
>> + *
>> + * Author: Carl Worth <cworth at cworth.org>
>
> Hey, wait a second, some of this code is mine, but I think the sort
> function is yours. Please do start annotating the Author tags on all
> files as appropriate. (There are probably lots of files missing your
> Author attribution---I just happened to notice this one here since you
> happened to have an Author line in the patch.)
Heh, fixed. I suppose I hadn't been thinking about it, since none of
the source in lib/ has other authors listed.
But it raises an interesting question. When is it kosher to add
oneself to a file's author list? In this case I wrote about half of
the code in that admittedly short file, so that makes sense Changing
a few lines presumably isn't enough. Adding a few functions?
>> -/* XXX: Should write some talloc-friendly list to avoid the need for
>> - * this. */
>
> Hurrah! I love patches that get to remove XXX comments.
]:--8)
>> + /* Get thread */
>> + if (!message->thread_id)
>> + message->thread_id =
>> + _notmuch_message_get_term (message, i, end, thread_prefix);
>> +
>> + /* Get id */
>> + assert (strcmp (thread_prefix, id_prefix) < 0);
>> + if (!message->message_id)
>> + message->message_id =
>> + _notmuch_message_get_term (message, i, end, id_prefix);
>
> Missing asserts on the above two as well?
Fixed. (Actually, I think there was only one assert missing in total,
but it had propagated through the history.)
> -Carl
More information about the notmuch
mailing list