[PATCH 1/3] database: Add notmuch_database_compact_close
Ben Gamari
bgamari.foss at gmail.com
Tue Sep 3 07:32:20 PDT 2013
Jani Nikula <jani at nikula.org> writes:
> On Sat, 24 Aug 2013, Ben Gamari <bgamari.foss at gmail.com> wrote:
>> This function uses Xapian's Compactor machinery to compact the notmuch
>> database. The compacted database is built in a temporary directory and
>> later moved into place while the original uncompacted database is
>> preserved.
>>
snip
>>
>> +class NotmuchCompactor : public Xapian::Compactor
>> +{
>> +public:
>> + virtual void
>> + set_status (const std::string &table, const std::string &status)
>> + {
>> + if (status.length() == 0)
>> + printf ("compacting table %s:\n", table.c_str());
>> + else
>> + printf (" %s\n", status.c_str());
>> + }
>
> We're trying to reduce the amount of prints directly from libnotmuch,
> not increase. This applies here as well as below.
>
Fair enough. That being said, I think that status updates are fairly
important given that the compaction process can be rather long. Would
the preferred interface be to provide notmuch_database_compact_close
with a progress callback?
>> +};
>> +
>> +#if HAVE_XAPIAN_COMPACT
>> +notmuch_status_t
>> +notmuch_database_compact_close (notmuch_database_t *notmuch)
>> +{
>> + void *local = talloc_new (NULL);
>> + NotmuchCompactor compactor;
>> + char *notmuch_path, *xapian_path, *compact_xapian_path, *old_xapian_path;
>> + notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS;
>> +
>> + if (! (notmuch_path = talloc_asprintf (local, "%s/%s", notmuch->path, ".notmuch"))) {
>> + ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
>> + goto DONE;
>> + }
>> +
>> + if (! (xapian_path = talloc_asprintf (local, "%s/%s", notmuch_path, "xapian"))) {
>> + ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
>> + goto DONE;
>> + }
>> +
>> + if (! (compact_xapian_path = talloc_asprintf (local, "%s.compact", xapian_path))) {
>> + ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
>> + goto DONE;
>> + }
>> +
>> + if (! (old_xapian_path = talloc_asprintf (local, "%s.old", xapian_path))) {
>> + ret = NOTMUCH_STATUS_OUT_OF_MEMORY;
>> + goto DONE;
>> + }
>> +
>> + try {
>> + compactor.set_renumber(false);
>> + compactor.add_source(xapian_path);
>> + compactor.set_destdir(compact_xapian_path);
>> + compactor.compact();
>> +
>> + if (rename(xapian_path, old_xapian_path)) {
>> + fprintf (stderr, "Error moving old database out of the way\n");
>> + ret = NOTMUCH_STATUS_FILE_ERROR;
>> + goto DONE;
>> + }
>
> This fails if old_xapian_path exists.
>
Ouch, yes, you are right. I suspect the right way forward here will be
to check whether old_xapian_path exists before beginning compaction,
allowing the user to fix the situation before it fails after finishing
what might be a pretty long process.
>> +
>> + if (rename(compact_xapian_path, xapian_path)) {
>> + fprintf (stderr, "Error moving compacted database\n");
>> + ret = NOTMUCH_STATUS_FILE_ERROR;
>> + goto DONE;
>> + }
>> + } catch (Xapian::InvalidArgumentError e) {
>> + fprintf (stderr, "Error while compacting: %s", e.get_msg().c_str());
>> + ret = NOTMUCH_STATUS_XAPIAN_EXCEPTION;
>> + goto DONE;
>> + }
>> +
>> + fprintf (stderr, "\n");
>> + fprintf (stderr, "\n");
>> + fprintf (stderr, "Old database has been moved to %s", old_xapian_path);
>> + fprintf (stderr, "\n");
>> + fprintf (stderr, "To delete run,\n");
>> + fprintf (stderr, "\n");
>> + fprintf (stderr, " rm -R %s\n", old_xapian_path);
>> + fprintf (stderr, "\n");
>> +
>> + notmuch_database_close(notmuch);
>> +
>> +DONE:
>> + talloc_free(local);
>
> The database does not get closed on errors. If that's intentional, it
> should be documented.
>
I had reasons for this but they have long fled my memory. Regardless of
what it does, this behavior should be documented. I'll take care of this.
>> + return ret;
>> +}
>> +#else
>> +notmuch_status_t
>> +notmuch_database_compact_close (unused (notmuch_database_t *notmuch))
>> +{
>> + fprintf (stderr, "notmuch was compiled against a xapian version lacking compaction support.\n");
>> + return NOTMUCH_STATUS_UNSUPPORTED_OPERATION;
>> +}
>> +#endif
>> +
>> void
>> notmuch_database_destroy (notmuch_database_t *notmuch)
>> {
>> diff --git a/lib/notmuch.h b/lib/notmuch.h
>> index 998a4ae..e9abd90 100644
>> --- a/lib/notmuch.h
>> +++ b/lib/notmuch.h
>> @@ -101,6 +101,7 @@ typedef enum _notmuch_status {
>> NOTMUCH_STATUS_TAG_TOO_LONG,
>> NOTMUCH_STATUS_UNBALANCED_FREEZE_THAW,
>> NOTMUCH_STATUS_UNBALANCED_ATOMIC,
>> + NOTMUCH_STATUS_UNSUPPORTED_OPERATION,
>>
>> NOTMUCH_STATUS_LAST_STATUS
>> } notmuch_status_t;
>> @@ -215,6 +216,20 @@ notmuch_database_open (const char *path,
>> void
>> notmuch_database_close (notmuch_database_t *database);
>>
>> +/* Close the given notmuch database and then compact it.
>
> The implementation first compacts then closes.
>
>> + * After notmuch_database_close_compact has been called, calls to
>> + * other functions on objects derived from this database may either
>> + * behave as if the database had not been closed (e.g., if the
>> + * required data has been cached) or may fail with a
>> + * NOTMUCH_STATUS_XAPIAN_EXCEPTION.
>> + *
>> + * notmuch_database_close_compact can be called multiple times. Later
>> + * calls have no effect.
>
> This is not true. The Xapian compactor does not require the database to
> be open. It will happily open the database read-only and compact the
> database again if database has been closed.
>
>> + */
>> +notmuch_status_t
>> +notmuch_database_compact_close (notmuch_database_t *notmuch);
>
> I'm afraid we really need to re-think the API.
>
It seems you are right. When writing this interface it was clear that
there would be a number of opportunities for misuse. I was hoping by
combining compact and close some of these would be eliminated but
clearly this isn't enough.
> I see that your CLI 'notmuch compact' command opens the database
> read-write, I assume to ensure there are no other writers, so that stuff
> doesn't get lost. However if you pass a read-write database that has
> been modified, I believe the changes will get lost (as Xapian opens the
> database read-only). We shouldn't let the API users shoot themselves in
> the foot so easily.
>
That is correct; the read-write database was an attempt to force the
user to exclusively lock the database they are trying to compact. It
seems that things can go quite wrong[1] when a database is modified
during compaction. There was a suggestion in that thread to add an
option to lock the database during compaction. Perhaps it might be worth
bringing this up again with Xapian upstream. I think we agree that it
would be a poor idea to merge compaction functionality without having a
mechanism for ensuring data integrity, especially since many users
invoke notmuch in a cron job.
> I think I'd go for something like:
>
> notmuch_status_t
> notmuch_database_compact (const char *path);
>
> or
>
> notmuch_status_t
> notmuch_database_compact (const char *path, const char *backup);
>
> which would internally open the database as read-write to ensure no
> modifications, compact, and close. If backup != NULL, it would save the
> old database there (same mounted file system as the database is a fine
> limitation), otherwise remove.
>
This sounds fine to me.
> Even then I'm not completely sure what Xapian WritableDatabase will do
> on close when the database has been switched underneath it. But moving
> the database after it has been closed has a race condition too.
>
Good points. Not sure what the least evil way about this is. Hopefully
Xapian's close operation really does just close file handles.
Cheers,
- Ben
[1] http://lists.xapian.org/pipermail/xapian-discuss/2011-July/008310.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 489 bytes
Desc: not available
URL: <http://notmuchmail.org/pipermail/notmuch/attachments/20130903/522f0d4a/attachment-0001.pgp>
More information about the notmuch
mailing list