[PATCH 1/3] database: Add notmuch_database_compact_close

Ben Gamari bgamari.foss at gmail.com
Tue Sep 3 07:32:20 PDT 2013


Jani Nikula <jani at nikula.org> writes:

> On Sat, 24 Aug 2013, Ben Gamari <bgamari.foss at gmail.com> wrote:
>> This function uses Xapian's Compactor machinery to compact the notmuch
>> database. The compacted database is built in a temporary directory and
>> later moved into place while the original uncompacted database is
>> preserved.
>>

snip

>>  
>> +class NotmuchCompactor : public Xapian::Compactor
>> +{
>> +public:
>> +    virtual void
>> +    set_status (const std::string &table, const std::string &status)
>> +    {
>> +	if (status.length() == 0)
>> +	    printf ("compacting table %s:\n", table.c_str());
>> +	else
>> +	    printf ("     %s\n", status.c_str());
>> +    }
>
> We're trying to reduce the amount of prints directly from libnotmuch,
> not increase. This applies here as well as below.
>
Fair enough. That being said, I think that status updates are fairly
important given that the compaction process can be rather long.  Would
the preferred interface be to provide notmuch_database_compact_close
with a progress callback?

>> +};
>> +
>> +#if HAVE_XAPIAN_COMPACT
>> +notmuch_status_t
>> +notmuch_database_compact_close (notmuch_database_t *notmuch)
>> +{
>> +    void *local = talloc_new (NULL);
>> +    NotmuchCompactor compactor;
>> +    char *notmuch_path, *xapian_path, *compact_xapian_path, *old_xapian_path;
>> +    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
>> +
>> +    if (! (notmuch_path = talloc_asprintf (local, "%s/%s", notmuch->path, ".notmuch"))) {
>> +	ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
>> +	goto DONE;
>> +    }
>> +
>> +    if (! (xapian_path = talloc_asprintf (local, "%s/%s", notmuch_path, "xapian"))) {
>> +	ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
>> +	goto DONE;
>> +    }
>> +
>> +    if (! (compact_xapian_path = talloc_asprintf (local, "%s.compact", xapian_path))) {
>> +	ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
>> +	goto DONE;
>> +    }
>> +
>> +    if (! (old_xapian_path = talloc_asprintf (local, "%s.old", xapian_path))) {
>> +	ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
>> +	goto DONE;
>> +    }
>> +
>> +    try {
>> +	compactor.set_renumber(false);
>> +	compactor.add_source(xapian_path);
>> +	compactor.set_destdir(compact_xapian_path);
>> +	compactor.compact();
>> +
>> +	if (rename(xapian_path, old_xapian_path)) {
>> +	    fprintf (stderr, "Error moving old database out of the way\n");
>> +	    ret = NOTMUCH_STATUS_FILE_ERROR;
>> +	    goto DONE;
>> +	}
>
> This fails if old_xapian_path exists.
>
Ouch, yes, you are right. I suspect the right way forward here will be
to check whether old_xapian_path exists before beginning compaction,
allowing the user to fix the situation before it fails after finishing
what might be a pretty long process.

>> +
>> +	if (rename(compact_xapian_path, xapian_path)) {
>> +	    fprintf (stderr, "Error moving compacted database\n");
>> +	    ret = NOTMUCH_STATUS_FILE_ERROR;
>> +	    goto DONE;
>> +	}
>> +    } catch (Xapian::InvalidArgumentError e) {
>> +	fprintf (stderr, "Error while compacting: %s", e.get_msg().c_str());
>> +	ret = NOTMUCH_STATUS_XAPIAN_EXCEPTION;
>> +	goto DONE;
>> +    }
>> +
>> +    fprintf (stderr, "\n");
>> +    fprintf (stderr, "\n");
>> +    fprintf (stderr, "Old database has been moved to %s", old_xapian_path);
>> +    fprintf (stderr, "\n");
>> +    fprintf (stderr, "To delete run,\n");
>> +    fprintf (stderr, "\n");
>> +    fprintf (stderr, "    rm -R %s\n", old_xapian_path);
>> +    fprintf (stderr, "\n");
>> +
>> +    notmuch_database_close(notmuch);
>> +
>> +DONE:
>> +    talloc_free(local);
>
> The database does not get closed on errors. If that's intentional, it
> should be documented.
>
I had reasons for this but they have long fled my memory. Regardless of
what it does, this behavior should be documented. I'll take care of this.

>> +    return ret;
>> +}
>> +#else
>> +notmuch_status_t
>> +notmuch_database_compact_close (unused (notmuch_database_t *notmuch))
>> +{
>> +    fprintf (stderr, "notmuch was compiled against a xapian version lacking compaction support.\n");
>> +    return NOTMUCH_STATUS_UNSUPPORTED_OPERATION;
>> +}
>> +#endif
>> +
>>  void
>>  notmuch_database_destroy (notmuch_database_t *notmuch)
>>  {
>> diff --git a/lib/notmuch.h b/lib/notmuch.h
>> index 998a4ae..e9abd90 100644
>> --- a/lib/notmuch.h
>> +++ b/lib/notmuch.h
>> @@ -101,6 +101,7 @@ typedef enum _notmuch_status {
>>      NOTMUCH_STATUS_TAG_TOO_LONG,
>>      NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW,
>>      NOTMUCH_STATUS_UNBALANCED_ATOMIC,
>> +    NOTMUCH_STATUS_UNSUPPORTED_OPERATION,
>>  
>>      NOTMUCH_STATUS_LAST_STATUS
>>  } notmuch_status_t;
>> @@ -215,6 +216,20 @@ notmuch_database_open (const char *path,
>>  void
>>  notmuch_database_close (notmuch_database_t *database);
>>  
>> +/* Close the given notmuch database and then compact it.
>
> The implementation first compacts then closes.
>

>> + * After notmuch_database_close_compact has been called, calls to
>> + * other functions on objects derived from this database may either
>> + * behave as if the database had not been closed (e.g., if the
>> + * required data has been cached) or may fail with a
>> + * NOTMUCH_STATUS_XAPIAN_EXCEPTION.
>> + *
>> + * notmuch_database_close_compact can be called multiple times.  Later
>> + * calls have no effect.
>
> This is not true. The Xapian compactor does not require the database to
> be open. It will happily open the database read-only and compact the
> database again if database has been closed.
>
>> + */
>> +notmuch_status_t
>> +notmuch_database_compact_close (notmuch_database_t *notmuch);
>
> I'm afraid we really need to re-think the API.
>
It seems you are right. When writing this interface it was clear that
there would be a number of opportunities for misuse. I was hoping by
combining compact and close some of these would be eliminated but
clearly this isn't enough.

> I see that your CLI 'notmuch compact' command opens the database
> read-write, I assume to ensure there are no other writers, so that stuff
> doesn't get lost. However if you pass a read-write database that has
> been modified, I believe the changes will get lost (as Xapian opens the
> database read-only). We shouldn't let the API users shoot themselves in
> the foot so easily.
>
That is correct; the read-write database was an attempt to force the
user to exclusively lock the database they are trying to compact. It
seems that things can go quite wrong[1] when a database is modified
during compaction.  There was a suggestion in that thread to add an
option to lock the database during compaction. Perhaps it might be worth
bringing this up again with Xapian upstream. I think we agree that it
would be a poor idea to merge compaction functionality without having a
mechanism for ensuring data integrity, especially since many users
invoke notmuch in a cron job.

> I think I'd go for something like:
>
> notmuch_status_t
> notmuch_database_compact (const char *path);
>
> or
>
> notmuch_status_t
> notmuch_database_compact (const char *path, const char *backup);
>
> which would internally open the database as read-write to ensure no
> modifications, compact, and close. If backup != NULL, it would save the
> old database there (same mounted file system as the database is a fine
> limitation), otherwise remove.
>
This sounds fine to me.

> Even then I'm not completely sure what Xapian WritableDatabase will do
> on close when the database has been switched underneath it. But moving
> the database after it has been closed has a race condition too.
>
Good points. Not sure what the least evil way about this is. Hopefully
Xapian's close operation really does just close file handles.

Cheers,

- Ben


[1] http://lists.xapian.org/pipermail/xapian-discuss/2011-July/008310.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 489 bytes
Desc: not available
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20130903/522f0d4a/attachment-0001.pgp>


More information about the notmuch mailing list