[PATCH v3 04/13] lib: Database version 3: Introduce fine-grained "features"

Austin Clements amdragon at MIT.EDU
Sat Aug 23 17:57:40 PDT 2014


Quoth Jani Nikula on Aug 23 at  7:02 pm:
> On Fri, 01 Aug 2014, Austin Clements <amdragon at MIT.EDU> wrote:
> > Previously, our database schema was versioned by a single number.
> > Each database schema change had to occur "atomically" in Notmuch's
> > development history: before some commit, Notmuch used version N, after
> > that commit, it used version N+1.  Hence, each new schema version
> > could introduce only one change, the task of developing a schema
> > change fell on a single person, and it all had to happen and be
> > perfect in a single commit series.  This made introducing a new schema
> > version hard.  We've seen only two schema changes in the history of
> > Notmuch.
> >
> > This commit introduces database schema version 3; hopefully the last
> > schema version we'll need for a while.  With this version, we switch
> > from a single version number to "features": a set of named,
> > independent aspects of the database schema.
> >
> > Features should make backwards compatibility easier.  For many things,
> > it should be easy to support databases both with and without a
> > feature, which will allow us to make upgrades optional and will enable
> > "unstable" features that can be developed and tested over time.
> >
> > Features also make forwards compatibility easier.  The features
> > recorded in a database include "compatibility flags," which can
> > indicate to an older version of Notmuch when it must support a given
> > feature to open the database for read or for write.  This lets us
> > replace the old vague "I don't recognize this version, so something
> > might go wrong, but I promise to try my best" warnings upon opening a
> > database with an unknown version with precise errors.  If a database
> > is safe to open for read/write despite unknown features, an older
> > version will know that and issue no message at all.  If the database
> > is not safe to open for read/write because of unknown features, an
> > older version will know that, too, and can tell the user exactly which
> > required features it lacks support for.
> 
> I agree with the change overall; it might be useful to have another set
> of eyes on the patch though.
> 
> > ---
> >  lib/database-private.h |  57 ++++++++++++++-
> >  lib/database.cc        | 190 ++++++++++++++++++++++++++++++++++++++++---------
> >  2 files changed, 213 insertions(+), 34 deletions(-)
> >
> > diff --git a/lib/database-private.h b/lib/database-private.h
> > index d3e65fd..2ffab33 100644
> > --- a/lib/database-private.h
> > +++ b/lib/database-private.h
> > @@ -41,11 +41,15 @@ struct _notmuch_database {
> >  
> >      char *path;
> >  
> > -    notmuch_bool_t needs_upgrade;
> >      notmuch_database_mode_t mode;
> >      int atomic_nesting;
> >      Xapian::Database *xapian_db;
> >  
> > +    /* Bit mask of features used by this database.  Features are
> > +     * named, independent aspects of the database schema.  This is a
> > +     * bitwise-OR of NOTMUCH_FEATURE_* values (below). */
> > +    unsigned int features;
> > +
> >      unsigned int last_doc_id;
> >      uint64_t last_thread_id;
> >  
> > @@ -55,6 +59,57 @@ struct _notmuch_database {
> >      Xapian::ValueRangeProcessor *date_range_processor;
> >  };
> >  
> > +/* Bit masks for _notmuch_database::features. */
> > +enum {
> > +    /* If set, file names are stored in "file-direntry" terms.  If
> > +     * unset, file names are stored in document data.
> > +     *
> > +     * Introduced: version 1.  Implementation support: both for read;
> > +     * required for write. */
> 
> Maybe I'm dense, but the implementation support comments could be
> clearer.

No, this was too terse.  The intent was to state what the current
library implementation supports: e.g., the current implementation
supports reading from databases both with and without
NOTMUCH_FEATURE_FILE_TERMS, but if you attempt to write a database
without this feature, some operations may return
NOTMUCH_STATUS_UPGRADE_REQUIRED (introduced in a later patch).

I wonder, though, if the "implementation support" bits should just be
omitted.  These comments are inevitably going to get out of sync with
the real implementation, and in this case an incorrect comment may be
worse than no comment.  I could put a comment over the whole enum
giving a more general description:

/* Bit masks for _notmuch_database::features.  Features are named,
 * independent aspects of the database schema.
 *
 * A database stores the set of features that it "uses" (implicitly
 * before database version 3 and explicitly as of version 3).
 *
 * A given library version will "recognize" a particular set of
 * features; if a database uses a feature that the library does not
 * recognize, the library will refuse to open it.  It is assumed the
 * set of recognized features grows monotonically over time.  A
 * library version will "implement" some subset of the recognized
 * features: some operations may require that the database use (or not
 * use) some feature, while other operations may support both
 * databases that use and that don't use some feature.
 *
 * On disk, the database stores string names for these features (see
 * the feature_names array).  These enum bit values are never
 * persisted to disk and may change freely.
 */

> > +    NOTMUCH_FEATURE_FILE_TERMS = 1 << 0,
> > +
> > +    /* If set, directory timestamps are stored in documents with
> > +     * XDIRECTORY terms and relative paths.  If unset, directory
> > +     * timestamps are stored in documents with XTIMESTAMP terms and
> > +     * absolute paths.
> > +     *
> > +     * Introduced: version 1.  Implementation support: required. */
> > +    NOTMUCH_FEATURE_DIRECTORY_DOCS = 1 << 1,
> > +
> > +    /* If set, the from, subject, and message-id headers are stored in
> > +     * message document values.  If unset, message documents *may*
> > +     * have these values, but if the value is empty, it must be
> > +     * retrieved from the message file.
> > +     *
> > +     * Introduced: optional in version 1, required as of version 3.
> > +     * Implementation support: both.
> > +     */
> > +    NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES = 1 << 2,
> > +
> > +    /* If set, folder terms are boolean and path terms exist.  If
> > +     * unset, folder terms are probabilistic and stemmed and path
> > +     * terms do not exist.
> > +     *
> > +     * Introduced: version 2.  Implementation support: required. */
> > +    NOTMUCH_FEATURE_BOOL_FOLDER = 1 << 3,
> > +};
> > +
> > +/* Prior to database version 3, features were implied by the database
> > + * version number, so hard-code them for earlier versions. */
> > +#define NOTMUCH_FEATURES_V0 (0)
> > +#define NOTMUCH_FEATURES_V1 (NOTMUCH_FEATURES_V0 | NOTMUCH_FEATURE_FILE_TERMS | \
> > +			     NOTMUCH_FEATURE_DIRECTORY_DOCS)
> > +#define NOTMUCH_FEATURES_V2 (NOTMUCH_FEATURES_V1 | NOTMUCH_FEATURE_BOOL_FOLDER)
> > +
> > +/* Current database features.  If any of these are missing from a
> > + * database, request an upgrade.
> > + * NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES is not included because
> > + * upgrade doesn't currently introduce the feature (though brand new
> > + * databases will have it). */
> > +#define NOTMUCH_FEATURES_CURRENT \
> > +    (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
> > +     NOTMUCH_FEATURE_BOOL_FOLDER)
> > +
> >  /* Return the list of terms from the given iterator matching a prefix.
> >   * The prefix will be stripped from the strings in the returned list.
> >   * The list will be allocated using ctx as the talloc context.
> > diff --git a/lib/database.cc b/lib/database.cc
> > index 45c4260..29a56db 100644
> > --- a/lib/database.cc
> > +++ b/lib/database.cc
> > @@ -20,6 +20,7 @@
> >  
> >  #include "database-private.h"
> >  #include "parse-time-vrp.h"
> > +#include "string-util.h"
> >  
> >  #include <iostream>
> >  
> > @@ -42,7 +43,7 @@ typedef struct {
> >      const char *prefix;
> >  } prefix_t;
> >  
> > -#define NOTMUCH_DATABASE_VERSION 2
> > +#define NOTMUCH_DATABASE_VERSION 3
> >  
> >  #define STRINGIFY(s) _SUB_STRINGIFY(s)
> >  #define _SUB_STRINGIFY(s) #s
> > @@ -151,6 +152,17 @@ typedef struct {
> >   *			changes are made to the database (such as by
> >   *			indexing new fields).
> >   *
> > + *	features	The set of features supported by this
> > + *			database. This consists of a set of
> > + *			'\n'-separated lines, where each is a feature
> > + *			name, a '\t', and compatibility flags.  If the
> > + *			compatibility flags contain 'w', then the
> > + *			opener must support this feature to safely
> > + *			write this database.  If the compatibility
> > + *			flags contain 'r', then the opener must
> > + *			support this feature to read this database.
> > + *			Introduced in database version 3.
> > + *
> >   *	last_thread_id	The last thread ID generated. This is stored
> >   *			as a 16-byte hexadecimal ASCII representation
> >   *			of a 64-bit unsigned integer. The first ID
> > @@ -251,6 +263,25 @@ _find_prefix (const char *name)
> >      return "";
> >  }
> >  
> > +static const struct
> > +{
> 
> Brace should be at the end of the previous line.

Fixed.

> > +    /* NOTMUCH_FEATURE_* value. */
> > +    unsigned int value;
> > +    /* Feature name as it appears in the database.  This name should
> > +     * be appropriate for displaying to the user if an older version
> > +     * of notmuch doesn't support this feature. */
> > +    const char *name;
> > +    /* Compatibility flags when this feature is declared. */
> > +    const char *flags;
> > +} feature_names[] = {
> > +    {NOTMUCH_FEATURE_FILE_TERMS,             "multiple paths per message", "rw"},
> > +    {NOTMUCH_FEATURE_DIRECTORY_DOCS,         "relative directory paths", "rw"},
> > +    /* Header values are not required for reading a database because a
> > +     * reader can just refer to the message file. */
> > +    {NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES, "from/subject/message-ID in database", "w"},
> > +    {NOTMUCH_FEATURE_BOOL_FOLDER,            "exact folder:/path: search", "rw"},
> 
> Spaces missing after the opening braces and before the closing braces.

Fixed.  I also wrapped all of the entries between the enum name and
the string name since one of the lines was already too long and this
pushed two more over.

> > +};
> > +
> >  const char *
> >  notmuch_status_to_string (notmuch_status_t status)
> >  {
> > @@ -591,6 +622,11 @@ notmuch_database_create (const char *path, notmuch_database_t **database)
> >  				    &notmuch);
> >      if (status)
> >  	goto DONE;
> > +
> > +    /* Upgrade doesn't add this feature to existing databases, but new
> > +     * databases have it. */
> > +    notmuch->features |= NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES;
> > +
> >      status = notmuch_database_upgrade (notmuch, NULL, NULL);
> >      if (status) {
> >  	notmuch_database_close(notmuch);
> > @@ -619,6 +655,80 @@ _notmuch_database_ensure_writable (notmuch_database_t *notmuch)
> >      return NOTMUCH_STATUS_SUCCESS;
> >  }
> >  
> > +/* Parse a database features string from the given database version.
> > + *
> > + * For version < 3, this ignores the features string and returns a
> > + * hard-coded set of features.
> > + *
> > + * If there are unrecognized features that are required to open the
> > + * database in mode (which should be 'r' or 'w'), return a
> > + * comma-separated list of unrecognized but required features in
> > + * *incompat_out, which will be allocated from ctx.
> 
> Please describe the actual return value.

Fixed.  I also gave the enum type a name (in response to David's
review) and used it here, so this is more self-documenting.

> > + */
> > +static unsigned int
> > +_parse_features (const void *ctx, const char *features, unsigned int version,
> > +		 char mode, char **incompat_out)
> > +{
> > +    unsigned int res = 0, namelen, i;
> > +    size_t llen = 0;
> > +    const char *flags;
> > +
> > +    /* Prior to database version 3, features were implied by the
> > +     * version number. */
> > +    if (version == 0)
> > +	return NOTMUCH_FEATURES_V0;
> > +    else if (version == 1)
> > +	return NOTMUCH_FEATURES_V1;
> > +    else if (version == 2)
> > +	return NOTMUCH_FEATURES_V2;
> > +
> > +    /* Parse the features string */
> > +    while ((features = strtok_len_c (features + llen, "\n", &llen)) != NULL) {
> > +	flags = strchr (features, '\t');
> > +	if (! flags || flags > features + llen)
> > +	    continue;
> > +	namelen = flags - features;
> > +
> > +	for (i = 0; i < ARRAY_SIZE (feature_names); ++i) {
> > +	    if (strlen (feature_names[i].name) == namelen &&
> > +		strncmp (feature_names[i].name, features, namelen) == 0) {
> > +		res |= feature_names[i].value;
> > +		break;
> > +	    }
> > +	}
> > +
> > +	if (i == ARRAY_SIZE (feature_names)) {
> > +	    /* Unrecognized feature */
> > +	    const char *have = strchr (flags, mode);
> > +	    if (have && have < features + llen) {
> > +		/* This feature is required to access this database in
> > +		 * 'mode', but we don't understand it. */
> > +		if (! *incompat_out)
> > +		    *incompat_out = talloc_strdup (ctx, "");
> > +		*incompat_out = talloc_asprintf_append_buffer (
> > +		    *incompat_out, "%s%.*s", **incompat_out ? ", " : "",
> > +		    namelen, features);
> 
> Do we intend the lib user to be able to parse the features? Perhaps we
> should use '\n' as separator here too? (Although looks like *currently*
> this is only for printing the message from within the lib.)

I'd intended this only for user presentation.  I updated the comment
to mention this.  If it turns out we need this information in a
computer-friendly form in the future, it should be easy to add.

> > +	    }
> > +	}
> > +    }
> > +
> > +    return res;
> > +}
> > +
> > +static char *
> > +_print_features (const void *ctx, unsigned int features)
> > +{
> > +    unsigned int i;
> > +    char *res = talloc_strdup (ctx, "");
> > +
> > +    for (i = 0; i < ARRAY_SIZE (feature_names); ++i)
> > +	if (features & feature_names[i].value)
> > +	    res = talloc_asprintf_append_buffer (
> > +		res, "%s\t%s\n", feature_names[i].name, feature_names[i].flags);
> > +
> > +    return res;
> > +}
> > +
> >  notmuch_status_t
> >  notmuch_database_open (const char *path,
> >  		       notmuch_database_mode_t mode,
> > @@ -627,7 +737,7 @@ notmuch_database_open (const char *path,
> >      notmuch_status_t status = NOTMUCH_STATUS_SUCCESS;
> >      void *local = talloc_new (NULL);
> >      notmuch_database_t *notmuch = NULL;
> > -    char *notmuch_path, *xapian_path;
> > +    char *notmuch_path, *xapian_path, *incompat_features;
> >      struct stat st;
> >      int err;
> >      unsigned int i, version;
> > @@ -677,7 +787,6 @@ notmuch_database_open (const char *path,
> >      if (notmuch->path[strlen (notmuch->path) - 1] == '/')
> >  	notmuch->path[strlen (notmuch->path) - 1] = '\0';
> >  
> > -    notmuch->needs_upgrade = FALSE;
> >      notmuch->mode = mode;
> >      notmuch->atomic_nesting = 0;
> >      try {
> > @@ -686,37 +795,44 @@ notmuch_database_open (const char *path,
> >  	if (mode == NOTMUCH_DATABASE_MODE_READ_WRITE) {
> >  	    notmuch->xapian_db = new Xapian::WritableDatabase (xapian_path,
> >  							       Xapian::DB_CREATE_OR_OPEN);
> > -	    version = notmuch_database_get_version (notmuch);
> > -
> > -	    if (version > NOTMUCH_DATABASE_VERSION) {
> > -		fprintf (stderr,
> > -			 "Error: Notmuch database at %s\n"
> > -			 "       has a newer database format version (%u) than supported by this\n"
> > -			 "       version of notmuch (%u). Refusing to open this database in\n"
> > -			 "       read-write mode.\n",
> > -			 notmuch_path, version, NOTMUCH_DATABASE_VERSION);
> > -		notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY;
> > -		notmuch_database_destroy (notmuch);
> > -		notmuch = NULL;
> > -		status = NOTMUCH_STATUS_FILE_ERROR;
> > -		goto DONE;
> > -	    }
> > -
> > -	    if (version < NOTMUCH_DATABASE_VERSION)
> > -		notmuch->needs_upgrade = TRUE;
> >  	} else {
> >  	    notmuch->xapian_db = new Xapian::Database (xapian_path);
> > -	    version = notmuch_database_get_version (notmuch);
> > -	    if (version > NOTMUCH_DATABASE_VERSION)
> > -	    {
> > -		fprintf (stderr,
> > -			 "Warning: Notmuch database at %s\n"
> > -			 "         has a newer database format version (%u) than supported by this\n"
> > -			 "         version of notmuch (%u). Some operations may behave incorrectly,\n"
> > -			 "         (but the database will not be harmed since it is being opened\n"
> > -			 "         in read-only mode).\n",
> > -			 notmuch_path, version, NOTMUCH_DATABASE_VERSION);
> > -	    }
> > +	}
> > +
> > +	/* Check version.  As of database version 3, we represent
> > +	 * changes in terms of features, so assume a version bump
> > +	 * means a dramatically incompatible change. */
> > +	version = notmuch_database_get_version (notmuch);
> > +	if (version > NOTMUCH_DATABASE_VERSION) {
> > +	    fprintf (stderr,
> > +		     "Error: Notmuch database at %s\n"
> > +		     "       has a newer database format version (%u) than supported by this\n"
> > +		     "       version of notmuch (%u).\n",
> > +		     notmuch_path, version, NOTMUCH_DATABASE_VERSION);
> > +	    notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY;
> > +	    notmuch_database_destroy (notmuch);
> > +	    notmuch = NULL;
> > +	    status = NOTMUCH_STATUS_FILE_ERROR;
> > +	    goto DONE;
> > +	}
> > +
> > +	/* Check features. */
> > +	incompat_features = NULL;
> > +	notmuch->features = _parse_features (
> > +	    local, notmuch->xapian_db->get_metadata ("features").c_str (),
> > +	    version, mode == NOTMUCH_DATABASE_MODE_READ_WRITE ? 'w' : 'r',
> 
> Makes me think _parse_features could use notmuch_database_mode_t mode
> instead of char mode. *shrug*.

In principle there could be other compatibility modes, though I can't
imagine what they would be.

> > +	    &incompat_features);
> > +	if (incompat_features) {
> > +	    fprintf (stderr,
> > +		     "Error: Notmuch database at %s\n"
> > +		     "       requires features (%s)\n"
> > +		     "       not supported by this version of notmuch.\n",
> > +		     notmuch_path, incompat_features);
> > +	    notmuch->mode = NOTMUCH_DATABASE_MODE_READ_ONLY;
> > +	    notmuch_database_destroy (notmuch);
> > +	    notmuch = NULL;
> > +	    status = NOTMUCH_STATUS_FILE_ERROR;
> > +	    goto DONE;
> >  	}
> >  
> >  	notmuch->last_doc_id = notmuch->xapian_db->get_lastdocid ();
> > @@ -1048,7 +1164,8 @@ notmuch_database_get_version (notmuch_database_t *notmuch)
> >  notmuch_bool_t
> >  notmuch_database_needs_upgrade (notmuch_database_t *notmuch)
> >  {
> > -    return notmuch->needs_upgrade;
> > +    return notmuch->mode == NOTMUCH_DATABASE_MODE_READ_WRITE &&
> > +	(NOTMUCH_FEATURES_CURRENT & ~notmuch->features);
> >  }
> 
> Am I correct that this does not lead to a database upgrade from v2 to v3
> until we actually change the features?

Yep, that's true (though new databases will start with v3).
Conveniently, I have a bunch of patches that introduce some new
features pending right behind this series.  :)

> Aside, why don't we return a suitable status code from
> notmuch_database_open when an upgrade is required?

That may make sense at this point in the series, but later patches
make upgrades no longer "required" and instead enforce database
feature requirements on a per-operation basis.  At that point, this
function becomes effectively optional; more "wants_upgrade" than
"needs_upgrade".

> >  static volatile sig_atomic_t do_progress_notify = 0;
> > @@ -1077,6 +1194,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
> >  						   double progress),
> >  			  void *closure)
> >  {
> > +    void *local = talloc_new (NULL);
> >      Xapian::WritableDatabase *db;
> >      struct sigaction action;
> >      struct itimerval timerval;
> > @@ -1114,6 +1232,10 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
> >  	timer_is_active = TRUE;
> >      }
> >  
> > +    /* Set the target features so we write out changes in the desired
> > +     * format. */
> > +    notmuch->features |= NOTMUCH_FEATURES_CURRENT;
> > +
> >      /* Before version 1, each message document had its filename in the
> >       * data field. Copy that into the new format by calling
> >       * notmuch_message_add_filename.
> > @@ -1226,6 +1348,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
> >  	notmuch_query_destroy (query);
> >      }
> >  
> > +    db->set_metadata ("features", _print_features (local, notmuch->features));
> >      db->set_metadata ("version", STRINGIFY (NOTMUCH_DATABASE_VERSION));
> >      db->flush ();
> >  
> > @@ -1302,6 +1425,7 @@ notmuch_database_upgrade (notmuch_database_t *notmuch,
> >  	sigaction (SIGALRM, &action, NULL);
> >      }
> >  
> > +    talloc_free (local);
> >      return NOTMUCH_STATUS_SUCCESS;
> >  }
> >  


More information about the notmuch mailing list