[PATCH v2 01/11] lib: message: index message file sizes

David Bremner david at tethera.net
Thu Jun 8 04:39:16 PDT 2017


As a preliminary note, I think this series will most likely need to be
adapted to the reindexing series
id:20170604123235.24466-2-david at tethera.net, since I think they touch
the same parts of the code.  You might want to wait for that series to go in
(or be cancelled) before reworking yours.

Ioan-Adrian Ratiu <adi at adirat.com> writes:

> Parse & store the file sizes inside notmuch_message_t objects
> while indexing.

That does not actually seem to be true, since notmuch_message_t has no
member that stores the file size. It's also a bit confusing, since
indexing is about updating the database, not the in-memory data
structures.

> +    filesize = _notmuch_message_file_get_size (message_file);
> +    filesize_str = talloc_asprintf(NULL, "%lu", filesize);
> +    if (! filesize_str)
> +	return NOTMUCH_STATUS_OUT_OF_MEMORY;
> +
> +    _notmuch_message_add_term (message, "filesize", filesize_str);
> +    talloc_free (filesize_str);
> +

As I mentioned in a previous message,
   1) this crashes, because you have no prefix for filesize yet.
   2) there seems to be no point in adding this term, since you search
   on the value slot anyway.
   Presumably you want to replace it with a call to _notmuch_message_add_filesize.
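To illustrate point 2: a value slot is what makes size ranges searchable, because Xapian compares values as byte strings, so the stored encoding has to sort in numeric order. The "%lu" decimal string used for the term does not. The sketch below is only an illustration: it uses a hypothetical fixed-width big-endian encoding as a stand-in for Xapian::sortable_serialise (whose real format is different), and encode_sortable, decimal_less and sortable_less are made-up names.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for an order-preserving encoding (NOT Xapian's
// actual sortable_serialise format): 8 bytes, most significant first.
// Byte-wise comparison of the results matches numeric order.
std::string encode_sortable(std::uint64_t n) {
    std::string out(8, '\0');
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<char>(n & 0xff);
        n >>= 8;
    }
    return out;
}

// Decimal text, as "%lu" would produce: string order is NOT numeric order
// (e.g. "9" sorts after "10").
bool decimal_less(std::uint64_t a, std::uint64_t b) {
    return std::to_string(a) < std::to_string(b);
}

// Comparing the fixed-width encodings gives the numeric answer.
bool sortable_less(std::uint64_t a, std::uint64_t b) {
    return encode_sortable(a) < encode_sortable(b);
}
```

A decimal term like the one in the patch is only useful for exact matches, which is another reason the value slot alone should be enough.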

I did manage to do a little benchmarking after applying the next patch:
database size and initial indexing time both increased by about 0.5% on
the notmuch performance test suite (large version). This seems
acceptable to me, and I would hope the overhead only shrinks (or at
least doesn't grow) once the redundant terms are dropped.

> +    /* filesize defaults to zero which is ignored */

Which filesize are you referring to here? I'm also a bit on the fence
about pervasively treating a zero filesize as an error, since an empty
file genuinely has size zero.

> +    ret = g_stat(message->filename, &statResult);
> +    if (! ret)
> +	message->filesize = statResult.st_size;
> +

Why are you using g_stat instead of plain stat? g_stat seems mainly to
add Windows compatibility (and confusion, since it's less familiar).
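On both the zero-filesize and the g_stat points, a minimal sketch using plain POSIX stat. It assumes a hypothetical helper, get_file_size(), that is not in the patch: it reports failure through its return value instead of overloading 0, so an empty file (size 0) stays distinguishable from a failed stat.

```cpp
#include <sys/stat.h>
#include <cerrno>

// Hypothetical helper (not from the patch). Returns 0 on success and
// stores the size in *size_out; on failure returns errno and leaves
// *size_out untouched. A zero-byte file is a success with size 0.
int get_file_size(const char *path, unsigned long *size_out) {
    struct stat st;
    if (stat(path, &st) != 0)
        return errno;
    *size_out = static_cast<unsigned long>(st.st_size);
    return 0;
}
```

With this shape the caller decides what a failed stat means, and nothing forces 0 to double as an error marker.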

> +unsigned long
> +notmuch_message_get_filesize (notmuch_message_t *message)
> +{
> +    std::string value;
> +
> +    try {
> +	value = message->doc.get_value (NOTMUCH_VALUE_FILESIZE);

I wondered whether going straight to the database without caching was
wasteful, but apparently we already do that for from, subject, and
message-id.

> +    } catch (Xapian::Error &error) {
> +	_notmuch_database_log(_notmuch_message_database (message), "A Xapian exception occurred when reading filesize: %s\n",
> +		 error.get_msg().c_str());
> +	message->notmuch->exception_reported = TRUE;
> +	return 0;
> +    }
> +    if (value.empty ())
> +	/* sortable_unserialise is undefined on empty string */
> +	return 0;
> +    return Xapian::sortable_unserialise (value);
> +}

I'm not sure about this error handling. Do we want an API where callers
can't tell the difference between a missing value, an empty file, and a
transient Xapian exception? OTOH, I do see that a status return plus an
output pointer is a bit clunky here.
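One way to keep those three cases apart, sketched under assumptions: the hypothetical filesize_status_t codes stand in for notmuch_status_t, fake_doc stands in for the Xapian document, and decimal decoding stands in for sortable_unserialise. None of these names come from the patch.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status codes; real code would presumably use notmuch_status_t.
enum filesize_status_t {
    FILESIZE_OK,
    FILESIZE_MISSING,   // value slot never set
    FILESIZE_DB_ERROR   // backend threw while reading the value
};

// Mock document: get_value() returns "" for an unset slot and can be
// made to throw, like a Xapian call.
struct fake_doc {
    std::string stored;
    bool fail = false;
    std::string get_value() const {
        if (fail)
            throw std::runtime_error("simulated backend error");
        return stored;
    }
};

// Status return + output pointer: *filesize is written only on FILESIZE_OK,
// so size 0, a missing value and a backend error are all distinguishable.
filesize_status_t get_filesize(const fake_doc &doc, unsigned long *filesize) {
    std::string value;
    try {
        value = doc.get_value();
    } catch (const std::exception &) {
        // Real code would catch Xapian::Error (not derived from
        // std::exception) and log it, as the patch already does.
        return FILESIZE_DB_ERROR;
    }
    if (value.empty())
        return FILESIZE_MISSING;
    *filesize = std::stoul(value);  // toy decoding, not sortable_unserialise
    return FILESIZE_OK;
}
```

The cost is the clunkiness mentioned above: every caller needs a local variable and a status check.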
>  
> +void
> +_notmuch_message_add_filesize (notmuch_message_t *message,
> +			       notmuch_message_file_t *message_file)
> +{
> +    unsigned long filesize = _notmuch_message_file_get_size(message_file);
> +    message->doc.add_value (NOTMUCH_VALUE_FILESIZE,
> +			    Xapian::sortable_serialise (filesize));
> +}

Shouldn't this have some exception handling (and probably an error
return)? Basically any Xapian operation can throw an exception.
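A sketch of the exception-safe shape, assuming hypothetical names (add_status_t, fake_writable_doc, add_filesize) and a mock document whose add_value can be made to throw. In the real code the catch clause would be for Xapian::Error and would log via _notmuch_database_log, as notmuch_message_get_filesize above already does.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status codes; real code would presumably return notmuch_status_t.
enum add_status_t { ADD_OK, ADD_DB_ERROR };

// Mock for a Xapian document: add_value() may throw, like most Xapian calls.
struct fake_writable_doc {
    std::string slot;
    bool fail = false;
    void add_value(const std::string &v) {
        if (fail)
            throw std::runtime_error("simulated backend error");
        slot = v;
    }
};

// Converts a backend exception into an error return instead of letting it
// propagate out of the library.
add_status_t add_filesize(fake_writable_doc &doc, unsigned long filesize) {
    try {
        doc.add_value(std::to_string(filesize));  // toy encoding
    } catch (const std::exception &) {
        // Real code: catch Xapian::Error, log it, set exception_reported.
        return ADD_DB_ERROR;
    }
    return ADD_OK;
}
```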

>  /**
> + * Get the filesize in bytes of 'message'.
> + */
> +unsigned long
> +notmuch_message_get_filesize  (notmuch_message_t *message);
> +
> +/**

Please document the error conditions and return values of any public API calls you add.

