[PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch

Austin Clements amdragon at MIT.EDU
Mon Oct 22 01:14:44 PDT 2012


Overall this looks pretty good to me, and I must say, this parser is
amazingly flexible and copes well with a remarkably hostile grammar.

A lot of little comments below (sorry if any of this ground has
already been covered in the previous four versions).

I do have one broad comment.  While I'm all for ad hoc parsers for ad
hoc grammars like dates, there is one piece of the literature I think
this parser suffers for by ignoring: tokenizing.  I think it would
simplify a lot of this code if it did a tokenizing pass before the
parsing pass.  It doesn't have to be a serious tokenizer with
streaming and keywords and token types and junk; just something that
first splits the input into substrings, possibly just non-overlapping
matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.].  This would simplify the
handling of postponed numbers because, with trivial lookahead in the
token stream, you wouldn't have to postpone them.  Likewise, it would
eliminate last_field.  It would simplify keyword matching because you
wouldn't have to worry about matching substrings (I spent a long time
staring at that code before I figured out what it would and wouldn't
accept).  Most important, I think it would make the parser more
predictable for users; for example, the parser currently accepts
things like "saturtoday" because it's aggressively single-pass.
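
To make that concrete, here is a rough sketch of such a pass. struct token, tokenize and MAX_TOKENS are names I've made up, and nothing like this exists in the patch. It keeps tokens as (pointer, length) slices of the input and treats anything outside the pattern as a separator:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token type: a slice of the input, not NUL-terminated. */
struct token {
    const char *s;
    size_t len;
};

#define MAX_TOKENS 64   /* arbitrary limit for this sketch */

/*
 * Split s into non-overlapping matches of
 * [[:digit:]]+|[[:alpha:]]+|[-+:/.] and store them in toks.  Anything
 * else (whitespace, commas, ...) only separates tokens.  Return the
 * number of tokens, or -1 if there are more than max.
 */
static int
tokenize (const char *s, struct token *toks, int max)
{
    int n = 0;

    assert (max >= 0);

    while (*s) {
        const char *start = s;

        if (isdigit ((unsigned char) *s)) {
            while (isdigit ((unsigned char) *s))
                s++;
        } else if (isalpha ((unsigned char) *s)) {
            while (isalpha ((unsigned char) *s))
                s++;
        } else if (strchr ("-+:/.", *s)) {
            s++;                /* single punctuation token */
        } else {
            s++;                /* not part of any token */
            continue;
        }

        if (n == max)
            return -1;
        toks[n].s = start;
        toks[n].len = (size_t) (s - start);
        n++;
    }

    return n;
}
```

With that, "20 May" is just a number token followed by a month token, so the number's handler can peek at the next token instead of postponing, and "saturtoday" is a single token that no keyword matches.  Keywords with embedded punctuation such as "a.m." would need either special handling or a tweak to the pattern.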

Quoth Jani Nikula on Oct 22 at 12:22 am:
> Add a date/time parser to notmuch, to be used for adding date range
> query support for notmuch lib later on. Add the parser to a directory
> of its own to make it independent of the rest of the notmuch code
> base.
> 
> Signed-off-by: Jani Nikula <jani at nikula.org>
> ---
>  Makefile                              |    2 +-
>  parse-time-string/Makefile            |    5 +
>  parse-time-string/Makefile.local      |   12 +
>  parse-time-string/README              |    9 +
>  parse-time-string/parse-time-string.c | 1477 +++++++++++++++++++++++++++++++++
>  parse-time-string/parse-time-string.h |  102 +++
>  6 files changed, 1606 insertions(+), 1 deletion(-)
>  create mode 100644 parse-time-string/Makefile
>  create mode 100644 parse-time-string/Makefile.local
>  create mode 100644 parse-time-string/README
>  create mode 100644 parse-time-string/parse-time-string.c
>  create mode 100644 parse-time-string/parse-time-string.h
> 
> diff --git a/Makefile b/Makefile
> index e5e2e3a..bb9c316 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -3,7 +3,7 @@
>  all:
>  
>  # List all subdirectories here. Each contains its own Makefile.local
> -subdirs = compat completion emacs lib man util test
> +subdirs = compat completion emacs lib man parse-time-string util test
>  
>  # We make all targets depend on the Makefiles themselves.
>  global_deps = Makefile Makefile.config Makefile.local \
> diff --git a/parse-time-string/Makefile b/parse-time-string/Makefile
> new file mode 100644
> index 0000000..fa25832
> --- /dev/null
> +++ b/parse-time-string/Makefile
> @@ -0,0 +1,5 @@
> +all:
> +	$(MAKE) -C .. all
> +
> +.DEFAULT:
> +	$(MAKE) -C .. $@
> diff --git a/parse-time-string/Makefile.local b/parse-time-string/Makefile.local
> new file mode 100644
> index 0000000..53534f3
> --- /dev/null
> +++ b/parse-time-string/Makefile.local
> @@ -0,0 +1,12 @@
> +dir := parse-time-string
> +extra_cflags += -I$(srcdir)/$(dir)
> +
> +libparse-time-string_c_srcs := $(dir)/parse-time-string.c
> +
> +libparse-time-string_modules := $(libparse-time-string_c_srcs:.c=.o)
> +
> +$(dir)/libparse-time-string.a: $(libparse-time-string_modules)
> +	$(call quiet,AR) rcs $@ $^
> +
> +SRCS := $(SRCS) $(libparse-time-string_c_srcs)
> +CLEAN := $(CLEAN) $(libparse-time-string_modules) $(dir)/libparse-time-string.a
> diff --git a/parse-time-string/README b/parse-time-string/README
> new file mode 100644
> index 0000000..300ff1f
> --- /dev/null
> +++ b/parse-time-string/README
> @@ -0,0 +1,9 @@
> +PARSE TIME STRING
> +=================
> +
> +parse_time_string() is a date/time parser originally written for
> +notmuch by Jani Nikula <jani at nikula.org>. However, there is nothing
> +notmuch specific in it, and it should be kept reusable for other
> +projects, and ready to be packaged on its own as needed. Please do not
> +add dependencies on or references to anything notmuch specific. The
> +parser should only depend on the C library.
> diff --git a/parse-time-string/parse-time-string.c b/parse-time-string/parse-time-string.c
> new file mode 100644
> index 0000000..942041a
> --- /dev/null
> +++ b/parse-time-string/parse-time-string.c
> @@ -0,0 +1,1477 @@
> +/*
> + * parse time string - user friendly date and time parser
> + * Copyright © 2012 Jani Nikula
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Jani Nikula <jani at nikula.org>
> + */
> +
> +#include <assert.h>
> +#include <ctype.h>
> +#include <errno.h>
> +#include <limits.h>
> +#include <stdio.h>
> +#include <stdarg.h>
> +#include <stdbool.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <strings.h>
> +#include <time.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +
> +#include "parse-time-string.h"
> +
> +/*
> + * IMPLEMENTATION DETAILS
> + *
> + * At a high level, the parsing is done in two phases: 1) actual
> + * parsing of the input string and storing the parsed data into
> + * 'struct state', and 2) processing of the data in 'struct state'
> + * according to current time (or provided reference time) and
> + * rounding. This is evident in the main entry point function
> + * parse_time_string().
> + *
> + * 1) The parsing phase - parse_input()
> + *
> + * Parsing is greedy and happens from left to right. The parsing is as
> + * unambiguous as possible; only unambiguous date/time formats are
> + * accepted. Redundant or contradictory absolute date/time in the
> + * input (e.g. date specified multiple times/ways) is not
> + * accepted. Relative date/time on the other hand just accumulates if
> + * present multiple times (e.g. "5 days 5 days" just turns into 10
> + * days).
> + *
> + * Parsing decisions are made on the input format, not value. For
> + * example, "20/5/2005" fails because the recognized format here is
> + * MM/D/YYYY, even though the values would suggest DD/M/YYYY.
> + *
> + * Parsing is mostly stateless in the sense that parsing decisions are
> + * not made based on the values of previously parsed data, or whether
> + * certain data is present in the first place. (There are a few
> + * exceptions to the latter part, though, such as parsing of time zone
> + * that would otherwise look like plain time.)
> + *
> + * When the parser encounters a number that is not greedily parsed as
> + * part of a format, the interpretation is postponed until the next
> + * token is parsed. The parser for the next token may consume the
> + * previously postponed number. For example, when parsing "20 May" the
> + * meaning of "20" is not known until "May" is parsed. If the parser
> + * for the next token does not consume the postponed number, the
> + * number is handled as a "lone" number before parser for the next
> + * token finishes.
> + *
> + * 2) The processing phase - create_output()
> + *
> + * Once the parser in phase 1 has finished, 'struct state' contains
> + * all the information from the input string, and it's no longer
> + * needed. Since the parser does not even handle the concept of "now",
> + * the processing initializes the fields referring to the current
> + * date/time.
> + *
> + * If requested, the result is rounded towards past or future. The
> + * idea behind rounding is to support parsing date/time ranges in an
> + * obvious way. For example, for a range defined as two dates (without
> + * time), one would typically want to have an inclusive range from the
> + * beginning of start date to the end of the end date. The caller
> + * would use rounding towards past in the start date, and towards
> + * future in the end date.
> + *
> + * The absolute date and time is shifted by the relative date and
> + * time, and time zone adjustments are made. Daylight saving time
> + * (DST) is specifically *not* handled at all.
> + *
> + * Finally, the result is stored to time_t.
> + */
> +
> +#define unused(x) x __attribute__ ((unused))
> +
> +/* XXX: Redefine these to add i18n support. The keyword table uses
> + * N_() to mark strings to be translated; they are accessed
> + * dynamically using _(). */
> +#define _(s) (s)	/* i18n: define as gettext (s) */
> +#define N_(s) (s)	/* i18n: define as gettext_noop (s) */
> +
> +#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0]))
> +
> +/*
> + * Field indices in the tm and set arrays of struct state.
> + *
> + * NOTE: There's some code that depends on the ordering of this enum.
> + */
> +enum field {
> +    /* Keep SEC...YEAR in this order. */
> +    TM_ABS_SEC,		/* seconds */
> +    TM_ABS_MIN,		/* minutes */
> +    TM_ABS_HOUR,	/* hours */
> +    TM_ABS_MDAY,	/* day of the month */
> +    TM_ABS_MON,		/* month */
> +    TM_ABS_YEAR,	/* year */
> +
> +    TM_ABS_WDAY,	/* day of the week. special: may be relative */

Given that this may be relative, should it really be called
TM_ABS_WDAY?

> +    TM_ABS_ISDST,	/* daylight saving time */
> +
> +    TM_AMPM,		/* am vs. pm */
> +    TM_TZ,		/* timezone in minutes */
> +
> +    /* Keep SEC...YEAR in this order. */
> +    TM_REL_SEC,		/* seconds relative to absolute or reference time */
> +    TM_REL_MIN,		/* minutes ... */
> +    TM_REL_HOUR,	/* hours ... */
> +    TM_REL_DAY,		/* days ... */
> +    TM_REL_MON,		/* months ... */
> +    TM_REL_YEAR,	/* years ... */
> +    TM_REL_WEEK,	/* weeks ... */
> +
> +    TM_NONE,		/* not a field */
> +
> +    TM_SIZE = TM_NONE,
> +    TM_FIRST_ABS = TM_ABS_SEC,
> +    TM_FIRST_REL = TM_REL_SEC,
> +};
> +
> +/* Values for the set array of struct state. */
> +enum field_set {
> +    FIELD_UNSET,	/* The field has not been touched by parser. */
> +    FIELD_SET,		/* The field has been set by parser. */
> +    FIELD_NOW,		/* The field will be set to reference time. */
> +};
> +
> +static enum field
> +next_abs_field (enum field field)
> +{
> +    /* NOTE: Depends on the enum ordering. */
> +    return field < TM_ABS_YEAR ? field + 1 : TM_NONE;
> +}
> +
> +static enum field
> +abs_to_rel_field (enum field field)
> +{
> +    assert (field <= TM_ABS_YEAR);
> +
> +    /* NOTE: Depends on the enum ordering. */
> +    return field + (TM_FIRST_REL - TM_FIRST_ABS);
> +}
> +
> +/* Get epoch value for field. */

Explain what an "epoch value" for a field is.

> +static int
> +field_epoch (enum field field)
> +{
> +    if (field == TM_ABS_MDAY || field == TM_ABS_MON)
> +	return 1;
> +    else if (field == TM_ABS_YEAR)
> +	return 1970;
> +    else
> +	return 0;
> +}
> +
> +/* The parsing state. */
> +struct state {
> +    int tm[TM_SIZE];			/* parsed date and time */
> +    enum field_set set[TM_SIZE];	/* set status of tm */
> +
> +    enum field last_field;	/* Previously set field. */
> +    char delim;
> +
> +    int postponed_length;	/* Number of digits in postponed value. */
> +    int postponed_value;
> +    char postponed_delim;	/* The delimiter preceding postponed number. */
> +};
> +
> +/*
> + * Helpers for postponed numbers.
> + *
> + * postponed_length is the number of digits in postponed value. 0
> + * means there is no postponed number. -1 means there is a postponed
> + * number, but it comes from a keyword, and it doesn't have digits.
> + */
> +static int
> +get_postponed_length (struct state *state)
> +{
> +    return state->postponed_length;
> +}
> +
> +/*
> + * Consume a previously postponed number. Return true if a number was
> + * in fact postponed, false otherwise. Store the postponed number's
> + * value in *v, length in the input string in *n (or -1 if the number
> + * was written out and parsed as a keyword), and the preceding
> + * delimiter to *d.

Mention that v, n, and d are left unchanged if no number is postponed?
You exploit this for default values elsewhere in the code (e.g. the
multiplier in kw_set_rel).

> + */
> +static bool
> +get_postponed_number (struct state *state, int *v, int *n, char *d)

Maybe "consume_postponed_number" to emphasize that this function has
side-effects (and isn't simply a "getter")?

> +{
> +    if (!state->postponed_length)
> +	return false;
> +
> +    if (n)
> +	*n = state->postponed_length;
> +
> +    if (v)
> +	*v = state->postponed_value;
> +
> +    if (d)
> +	*d = state->postponed_delim;
> +
> +    state->postponed_length = 0;
> +    state->postponed_value = 0;
> +    state->postponed_delim = 0;
> +
> +    return true;
> +}
> +
> +static int parse_postponed_number (struct state *state, enum field next_field);
> +
> +/*
> + * Postpone a number to be handled later. If one exists already,
> + * handle it first. n may be -1 to indicate a keyword that has no
> + * number length.
> + */
> +static int
> +set_postponed_number (struct state *state, int v, int n)
> +{
> +    int r;
> +    char d = state->delim;
> +
> +    /* Parse a previously postponed number, if any. */
> +    r = parse_postponed_number (state, TM_NONE);
> +    if (r)
> +	return r;
> +
> +    state->postponed_length = n;
> +    state->postponed_value = v;
> +    state->postponed_delim = d;
> +
> +    return 0;
> +}
> +
> +static void
> +set_delim (struct state *state, char delim)
> +{
> +    state->delim = delim;
> +}
> +
> +static void
> +unset_delim (struct state *state)
> +{
> +    state->delim = 0;
> +}
> +
> +/*
> + * Field set/get/mod helpers.
> + */
> +
> +/* Return true if field has been set. */
> +static bool
> +is_field_set (struct state *state, enum field field)
> +{
> +    assert (field < ARRAY_SIZE (state->tm));
> +
> +    return field < ARRAY_SIZE (state->set) &&

state->tm and state->set are the same size, so this will always be
true given that the assert hasn't fired.  Is this just defensive
programming?

> +	   state->set[field] != FIELD_UNSET;
> +}
> +
> +static void
> +unset_field (struct state *state, enum field field)
> +{
> +    assert (field < ARRAY_SIZE (state->tm));
> +
> +    state->set[field] = FIELD_UNSET;
> +    state->tm[field] = 0;
> +}
> +
> +/*
> + * Set field to value. A field can only be set once to ensure the
> + * input does not contain redundant and potentially conflicting data.
> + */
> +static int
> +set_field (struct state *state, enum field field, int value)
> +{
> +    int r;
> +
> +    assert (field < ARRAY_SIZE (state->tm));
> +
> +    /* Fields can only be set once. */
> +    if (field < ARRAY_SIZE (state->set) && state->set[field] != FIELD_UNSET)

Same comment about array sizes.  Also, this should probably call
is_field_set instead of open-coding it (which would make the array
size check even more redundant!)
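
Roughly like this, with a cut-down struct state and the postponed-number handling left out.  The enums, the struct, and the PARSE_TIME_ERR_ALREADYSET value here are stand-ins, not the definitions from parse-time-string.h:

```c
#include <assert.h>
#include <stdbool.h>

#define ARRAY_SIZE(a) (sizeof (a) / sizeof (a[0]))
#define PARSE_TIME_ERR_ALREADYSET 1   /* placeholder value for this sketch */

/* Cut-down stand-ins for the patch's enums and parsing state. */
enum field { TM_ABS_SEC, TM_ABS_MIN, TM_ABS_HOUR, TM_SIZE };
enum field_set { FIELD_UNSET, FIELD_SET, FIELD_NOW };

struct state {
    int tm[TM_SIZE];
    enum field_set set[TM_SIZE];
};

/* tm and set have the same size, so one assert covers both arrays. */
static bool
is_field_set (struct state *state, enum field field)
{
    assert (field < ARRAY_SIZE (state->tm));

    return state->set[field] != FIELD_UNSET;
}

/* Set field to value.  A field can only be set once. */
static int
set_field (struct state *state, enum field field, int value)
{
    if (is_field_set (state, field))
        return -PARSE_TIME_ERR_ALREADYSET;

    state->set[field] = FIELD_SET;
    /* The real set_field also parses any postponed number here. */
    state->tm[field] = value;

    return 0;
}
```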

> +	return -PARSE_TIME_ERR_ALREADYSET;
> +
> +    state->set[field] = FIELD_SET;
> +
> +    /* Parse a previously postponed number, if any. */
> +    r = parse_postponed_number (state, field);

I don't understand the big picture with postponed number handling yet,
but is it worth mentioning in this function's doc comment that it
processes postponed numbers?

> +    if (r)
> +	return r;
> +
> +    unset_delim (state);
> +
> +    state->tm[field] = value;
> +    state->last_field = field;
> +
> +    return 0;
> +}
> +
> +/*
> + * Mark n fields in fields to be set to the reference date/time in the
> + * specified time zone, or local timezone if not specified. The fields
> + * will be initialized after parsing is complete and timezone is
> + * known.
> + */
> +static int
> +set_fields_to_now (struct state *state, enum field *fields, size_t n)
> +{
> +    size_t i;
> +    int r;
> +
> +    for (i = 0; i < n; i++) {
> +	r = set_field (state, fields[i], 0);
> +	if (r)
> +	    return r;
> +	state->set[fields[i]] = FIELD_NOW;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Modify field by adding value to it. To be used on relative fields,
> + * which can be modified multiple times (to accumulate). */
> +static int
> +mod_field (struct state *state, enum field field, int value)

add_to_field?

> +{
> +    int r;
> +
> +    assert (field < ARRAY_SIZE (state->tm));   /* assert relative??? */
> +
> +    if (field < ARRAY_SIZE (state->set))

Another redundant check?

> +	state->set[field] = FIELD_SET;
> +
> +    /* Parse a previously postponed number, if any. */
> +    r = parse_postponed_number (state, field);

This postponed number stuff is getting really confusing...

> +    if (r)
> +	return r;
> +
> +    unset_delim (state);
> +
> +    state->tm[field] += value;
> +    state->last_field = field;
> +
> +    return 0;
> +}
> +
> +/*
> + * Get field value. Make sure the field is set before query. It's most
> + * likely an error to call this while parsing (for example fields set
> + * as FIELD_NOW will only be set to some value after parsing).
> + */
> +static int
> +get_field (struct state *state, enum field field)
> +{
> +    assert (field < ARRAY_SIZE (state->tm));

Assert that the field is set?

> +
> +    return state->tm[field];
> +}
> +
> +/*
> + * Validity checkers.
> + */
> +static bool is_valid_12hour (int h)
> +{
> +    return h >= 0 && h <= 12;

h >= 1?
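
A 12-hour clock runs from 1 to 12, so "0pm" should probably be rejected.  With that range, converting to a 24-hour value is just h % 12, plus 12 for pm, which gets 12am (midnight) and 12pm (noon) right.  A sketch; hour_12_to_24 is a hypothetical helper, not necessarily how the patch applies TM_AMPM:

```c
#include <assert.h>
#include <stdbool.h>

/* A 12-hour clock has no hour 0: valid values are 1..12. */
static bool
is_valid_12hour (int h)
{
    return h >= 1 && h <= 12;
}

/*
 * Hypothetical helper: map a valid 12-hour value plus am/pm to 0..23.
 * 12am is midnight (0) and 12pm is noon (12).
 */
static int
hour_12_to_24 (int h, bool pm)
{
    assert (is_valid_12hour (h));

    return h % 12 + (pm ? 12 : 0);
}
```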

> +}
> +
> +static bool is_valid_time (int h, int m, int s)
> +{
> +    /* Allow 24:00:00 to denote end of day. */
> +    if (h == 24 && m == 0 && s == 0)
> +	return true;
> +
> +    return h >= 0 && h <= 23 && m >= 0 && m <= 59 && s >= 0 && s <= 59;
> +}
> +
> +static bool is_valid_mday (int mday)
> +{
> +    return mday >= 1 && mday <= 31;
> +}
> +
> +static bool is_valid_mon (int mon)
> +{
> +    return mon >= 1 && mon <= 12;
> +}
> +
> +static bool is_valid_year (int year)
> +{
> +    return year >= 1970;
> +}
> +
> +static bool is_valid_date (int year, int mon, int mday)
> +{
> +    return is_valid_year (year) && is_valid_mon (mon) && is_valid_mday (mday);
> +}
> +
> +/* Unset indicator for time and date set helpers. */
> +#define UNSET -1
> +
> +/* Time set helper. No input checking. Use UNSET (-1) to leave unset. */
> +static int
> +set_abs_time (struct state *state, int hour, int min, int sec)
> +{
> +    int r;
> +
> +    if (hour != UNSET) {
> +	if ((r = set_field (state, TM_ABS_HOUR, hour)))
> +	    return r;
> +    }
> +
> +    if (min != UNSET) {
> +	if ((r = set_field (state, TM_ABS_MIN, min)))
> +	    return r;
> +    }
> +
> +    if (sec != UNSET) {
> +	if ((r = set_field (state, TM_ABS_SEC, sec)))
> +	    return r;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Date set helper. No input checking. Use UNSET (-1) to leave unset. */
> +static int
> +set_abs_date (struct state *state, int year, int mon, int mday)
> +{
> +    int r;
> +
> +    if (year != UNSET) {
> +	if ((r = set_field (state, TM_ABS_YEAR, year)))
> +	    return r;
> +    }
> +
> +    if (mon != UNSET) {
> +	if ((r = set_field (state, TM_ABS_MON, mon)))
> +	    return r;
> +    }
> +
> +    if (mday != UNSET) {
> +	if ((r = set_field (state, TM_ABS_MDAY, mday)))
> +	    return r;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Keyword parsing and handling.
> + */
> +struct keyword;
> +typedef int (*setter_t)(struct state *state, struct keyword *kw);
> +
> +struct keyword {
> +    const char *name;	/* keyword */
> +    enum field field;	/* field to set, or FIELD_NONE if N/A */
> +    int value;		/* value to set, or 0 if N/A */
> +    setter_t set;	/* function to use for setting, if non-NULL */
> +};
> +
> +/*
> + * Setter callback functions for keywords.
> + */
> +static int
> +kw_set_default (struct state *state, struct keyword *kw)

It took me a while to figure out what this function's name had to do
with the action it performs; then I realized that it's never used in
the table and is only called when set is NULL.  Given that, I think it
would make more sense to just put the set_field call in place of the
one current call to kw_set_default.  As it is, this seems like one
indirection too many.

> +{
> +    return set_field (state, kw->field, kw->value);
> +}
> +
> +static int
> +kw_set_rel (struct state *state, struct keyword *kw)
> +{
> +    int multiplier = 1;
> +
> +    /* Get a previously set multiplier, if any. */
> +    get_postponed_number (state, &multiplier, NULL, NULL);
> +
> +    /* Accumulate relative field values. */
> +    return mod_field (state, kw->field, multiplier * kw->value);
> +}
> +
> +static int
> +kw_set_number (struct state *state, struct keyword *kw)
> +{
> +    /* -1 = no length, from keyword. */
> +    return set_postponed_number (state, kw->value, -1);
> +}
> +
> +static int
> +kw_set_month (struct state *state, struct keyword *kw)
> +{
> +    int n = get_postponed_length (state);
> +
> +    /* Consume postponed number if it could be mday. This handles "20
> +     * January". */
> +    if (n == 1 || n == 2) {

Should this be (n && is_valid_mday (state->postponed_value))?  It
seems a little odd that postponed numbers with three or more digits
are treated as independent, but two-digit numbers > 31 are an error.

> +	int r, v;
> +
> +	get_postponed_number (state, &v, NULL, NULL);
> +
> +	if (!is_valid_mday (v))
> +	    return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +	r = set_field (state, TM_ABS_MDAY, v);
> +	if (r)
> +	    return r;
> +    }
> +
> +    return set_field (state, kw->field, kw->value);
> +}
> +
> +static int
> +kw_set_ampm (struct state *state, struct keyword *kw)
> +{
> +    int n = get_postponed_length (state);
> +
> +    /* Consume postponed number if it could be hour. This handles
> +     * "5pm". */
> +    if (n == 1 || n == 2) {

Same comment as for kw_set_month.

> +	int r, v;
> +
> +	get_postponed_number (state, &v, NULL, NULL);
> +
> +	if (!is_valid_12hour (v))
> +	    return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +	r = set_abs_time (state, v, 0, 0);
> +	if (r)
> +	    return r;
> +    }
> +
> +    return set_field (state, kw->field, kw->value);
> +}
> +
> +static int
> +kw_set_timeofday (struct state *state, struct keyword *kw)
> +{
> +    return set_abs_time (state, kw->value, 0, 0);
> +}
> +
> +static int
> +kw_set_today (struct state *state, unused (struct keyword *kw))
> +{
> +    enum field fields[] = { TM_ABS_YEAR, TM_ABS_MON, TM_ABS_MDAY };
> +
> +    return set_fields_to_now (state, fields, ARRAY_SIZE (fields));
> +}
> +
> +static int
> +kw_set_now (struct state *state, unused (struct keyword *kw))
> +{
> +    enum field fields[] = { TM_ABS_HOUR, TM_ABS_MIN, TM_ABS_SEC };
> +
> +    return set_fields_to_now (state, fields, ARRAY_SIZE (fields));
> +}
> +
> +static int
> +kw_set_ordinal (struct state *state, struct keyword *kw)
> +{
> +    int n, v;
> +
> +    /* Require a postponed number. */
> +    if (!get_postponed_number (state, &v, &n, NULL))
> +	return -PARSE_TIME_ERR_DATEFORMAT;
> +
> +    /* Ordinals are mday. */
> +    if (n != 1 && n != 2)

Is this redundant with your is_valid_mday test below?

> +	return -PARSE_TIME_ERR_DATEFORMAT;
> +
> +    /* Be strict about st, nd, rd, and lax about th. */
> +    if (strcasecmp (kw->name, "st") == 0 && v != 1 && v != 21 && v != 31)
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +    else if (strcasecmp (kw->name, "nd") == 0 && v != 2 && v != 22)
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +    else if (strcasecmp (kw->name, "rd") == 0 && v != 3 && v != 23)
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +    else if (strcasecmp (kw->name, "th") == 0 && !is_valid_mday (v))
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +    return set_field (state, TM_ABS_MDAY, v);
> +}
> +
> +/*
> + * Accepted keywords.
> + *
> + * A keyword may optionally contain a '|' to indicate the minimum
> + * match length. Without one, full match is required. It's advisable
> + * to keep the minimum match parts unique across all keywords.
> + *
> + * If keyword begins with upper case letter, then the matching will be
> + * case sensitive. Otherwise the matching is case insensitive.
> + *
> + * If setter is NULL, set_default will be used.
> + *
> + * Note: Order matters. Matching is greedy, longest match is used, but
> + * of equal length matches the first one is used, unless there's an
> + * equal length case sensitive match which trumps case insensitive
> + * matches.

If you do have a tokenizer (or disallow mashing keywords together),
then all of the complexity arising from longest match goes away
because the keyword token either will or won't match a keyword.  If
you also eliminate the rule for case sensitivity and put
case-sensitive things before conflicting case-insensitive things (so
put "M" before "m|inutes"), then you can simply use the first match.

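For example, once keywords only have to match whole tokens, the lookup can be a plain first-match scan.  A sketch, assuming tokens are (pointer, length) slices of the input; struct kw, token_matches, lookup_keyword and the table excerpt are illustrative only, not code from the patch:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative keyword entry; the patch's struct keyword has more fields. */
struct kw {
    const char *name;   /* '|' marks the end of the required prefix */
};

/* Illustrative excerpt, with the case-sensitive "M" before "m|inutes". */
static const struct kw example_keywords[] = {
    { "M" },            /* months */
    { "m|inutes" },
    { "mo|nths" },
    { "sat|urday" },
    { "today" },
};

/*
 * Return true if the whole token tok[0..len) matches keyword: every
 * character before the '|' is required, the tail after it may be cut
 * short, and the token may not run past the end of the keyword.
 */
static bool
token_matches (const char *tok, size_t len, const char *keyword)
{
    /* Upper case first letter means case-sensitive, as in the patch. */
    bool match_case = isupper ((unsigned char) keyword[0]);
    bool in_optional_tail = false;
    size_t i = 0;

    assert (keyword[0] != '\0');

    for (; *keyword; keyword++) {
        if (*keyword == '|') {
            in_optional_tail = true;
            continue;
        }
        if (i == len)
            return in_optional_tail;    /* token ended early */

        if (match_case) {
            if (tok[i] != *keyword)
                return false;
        } else if (tolower ((unsigned char) tok[i]) !=
                   tolower ((unsigned char) *keyword)) {
            return false;
        }
        i++;
    }

    /* Whole keyword consumed: the token must not have extra characters. */
    return i == len;
}

/* Return the first table entry matching the token, or NULL. */
static const struct kw *
lookup_keyword (const struct kw *table, size_t n, const char *tok, size_t len)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (token_matches (tok, len, table[i].name))
            return &table[i];
    }

    return NULL;
}
```

Because "M" is listed before "m|inutes", a token "M" finds the case-sensitive entry and "m" falls through to minutes, with no longest-match or case-precedence rule needed.
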
> + */
> +static struct keyword keywords[] = {
> +    /* Weekdays. */
> +    { N_("sun|day"),	TM_ABS_WDAY,	0,	NULL },
> +    { N_("mon|day"),	TM_ABS_WDAY,	1,	NULL },
> +    { N_("tue|sday"),	TM_ABS_WDAY,	2,	NULL },
> +    { N_("wed|nesday"),	TM_ABS_WDAY,	3,	NULL },
> +    { N_("thu|rsday"),	TM_ABS_WDAY,	4,	NULL },
> +    { N_("fri|day"),	TM_ABS_WDAY,	5,	NULL },
> +    { N_("sat|urday"),	TM_ABS_WDAY,	6,	NULL },
> +
> +    /* Months. */
> +    { N_("jan|uary"),	TM_ABS_MON,	1,	kw_set_month },
> +    { N_("feb|ruary"),	TM_ABS_MON,	2,	kw_set_month },
> +    { N_("mar|ch"),	TM_ABS_MON,	3,	kw_set_month },
> +    { N_("apr|il"),	TM_ABS_MON,	4,	kw_set_month },
> +    { N_("may"),	TM_ABS_MON,	5,	kw_set_month },
> +    { N_("jun|e"),	TM_ABS_MON,	6,	kw_set_month },
> +    { N_("jul|y"),	TM_ABS_MON,	7,	kw_set_month },
> +    { N_("aug|ust"),	TM_ABS_MON,	8,	kw_set_month },
> +    { N_("sep|tember"),	TM_ABS_MON,	9,	kw_set_month },
> +    { N_("oct|ober"),	TM_ABS_MON,	10,	kw_set_month },
> +    { N_("nov|ember"),	TM_ABS_MON,	11,	kw_set_month },
> +    { N_("dec|ember"),	TM_ABS_MON,	12,	kw_set_month },
> +
> +    /* Durations. */
> +    { N_("y|ears"),	TM_REL_YEAR,	1,	kw_set_rel },
> +    { N_("w|eeks"),	TM_REL_WEEK,	1,	kw_set_rel },
> +    { N_("d|ays"),	TM_REL_DAY,	1,	kw_set_rel },
> +    { N_("h|ours"),	TM_REL_HOUR,	1,	kw_set_rel },
> +    { N_("hr|s"),	TM_REL_HOUR,	1,	kw_set_rel },
> +    { N_("m|inutes"),	TM_REL_MIN,	1,	kw_set_rel },
> +    /* M=months, m=minutes */
> +    { N_("M"),		TM_REL_MON,	1,	kw_set_rel },
> +    { N_("mins"),	TM_REL_MIN,	1,	kw_set_rel },
> +    { N_("mo|nths"),	TM_REL_MON,	1,	kw_set_rel },
> +    { N_("s|econds"),	TM_REL_SEC,	1,	kw_set_rel },
> +    { N_("secs"),	TM_REL_SEC,	1,	kw_set_rel },
> +
> +    /* Numbers. */
> +    { N_("one"),	TM_NONE,	1,	kw_set_number },
> +    { N_("two"),	TM_NONE,	2,	kw_set_number },
> +    { N_("three"),	TM_NONE,	3,	kw_set_number },
> +    { N_("four"),	TM_NONE,	4,	kw_set_number },
> +    { N_("five"),	TM_NONE,	5,	kw_set_number },
> +    { N_("six"),	TM_NONE,	6,	kw_set_number },
> +    { N_("seven"),	TM_NONE,	7,	kw_set_number },
> +    { N_("eight"),	TM_NONE,	8,	kw_set_number },
> +    { N_("nine"),	TM_NONE,	9,	kw_set_number },
> +    { N_("ten"),	TM_NONE,	10,	kw_set_number },
> +    { N_("dozen"),	TM_NONE,	12,	kw_set_number },
> +    { N_("hundred"),	TM_NONE,	100,	kw_set_number },
> +
> +    /* Special number forms. */
> +    { N_("this"),	TM_NONE,	0,	kw_set_number },
> +    { N_("last"),	TM_NONE,	1,	kw_set_number },
> +
> +    /* Other special keywords. */
> +    { N_("yesterday"),	TM_REL_DAY,	1,	kw_set_rel },
> +    { N_("today"),	TM_NONE,	0,	kw_set_today },
> +    { N_("now"),	TM_NONE,	0,	kw_set_now },
> +    { N_("noon"),	TM_NONE,	12,	kw_set_timeofday },
> +    { N_("midnight"),	TM_NONE,	0,	kw_set_timeofday },
> +    { N_("am"),		TM_AMPM,	0,	kw_set_ampm },
> +    { N_("a.m."),	TM_AMPM,	0,	kw_set_ampm },
> +    { N_("pm"),		TM_AMPM,	1,	kw_set_ampm },
> +    { N_("p.m."),	TM_AMPM,	1,	kw_set_ampm },
> +    { N_("st"),		TM_NONE,	0,	kw_set_ordinal },
> +    { N_("nd"),		TM_NONE,	0,	kw_set_ordinal },
> +    { N_("rd"),		TM_NONE,	0,	kw_set_ordinal },
> +    { N_("th"),		TM_NONE,	0,	kw_set_ordinal },
> +
> +    /* Timezone codes: offset in minutes. XXX: Add more codes. */
> +    { N_("pst"),	TM_TZ,		-8*60,	NULL },
> +    { N_("mst"),	TM_TZ,		-7*60,	NULL },
> +    { N_("cst"),	TM_TZ,		-6*60,	NULL },
> +    { N_("est"),	TM_TZ,		-5*60,	NULL },
> +    { N_("ast"),	TM_TZ,		-4*60,	NULL },
> +    { N_("nst"),	TM_TZ,		-(3*60+30),	NULL },
> +
> +    { N_("gmt"),	TM_TZ,		0,	NULL },
> +    { N_("utc"),	TM_TZ,		0,	NULL },
> +
> +    { N_("wet"),	TM_TZ,		0,	NULL },
> +    { N_("cet"),	TM_TZ,		1*60,	NULL },
> +    { N_("eet"),	TM_TZ,		2*60,	NULL },
> +    { N_("fet"),	TM_TZ,		3*60,	NULL },
> +
> +    { N_("wat"),	TM_TZ,		1*60,	NULL },
> +    { N_("cat"),	TM_TZ,		2*60,	NULL },
> +    { N_("eat"),	TM_TZ,		3*60,	NULL },
> +};
> +
> +/*
> + * Compare strings s and keyword. Return number of matching chars on
> + * match, 0 for no match. Match must be at least n chars, or all of
> + * keyword if n < 0, otherwise it's not a match. Use match_case for
> + * case sensitive matching.
> + */
> +static size_t
> +match_keyword (const char *s, const char *keyword, ssize_t n, bool match_case)
> +{
> +    ssize_t i;
> +
> +    if (!n)
> +	return 0;
> +
> +    for (i = 0; *s && *keyword; i++, s++, keyword++) {
> +	if (match_case) {
> +	    if (*s != *keyword)

The pointer arithmetic doesn't seem to buy anything here.  What about
just looping over i and using s[i] and keyword[i]?

> +		break;
> +	} else {
> +	    if (tolower ((unsigned char) *s) !=
> +		tolower ((unsigned char) *keyword))

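The cast to unsigned char is actually needed: tolower() is only
defined for EOF and values representable as unsigned char, and plain
char may be signed, so a non-ASCII byte would be undefined behavior
without it.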

> +		break;
> +	}
> +    }
> +
> +    if (n > 0)
> +	return i < n ? 0 : i;
> +    else
> +	return *keyword ? 0 : i;
> +}
> +
> +/*
> + * Parse a keyword. Return < 0 on error, number of parsed chars on
> + * success.
> + */
> +static ssize_t
> +parse_keyword (struct state *state, const char *s)
> +{
> +    unsigned int i;
> +    size_t n, max_n = 0;
> +    struct keyword *kw = NULL;
> +    int r;
> +
> +    /* Match longest keyword */
> +    for (i = 0; i < ARRAY_SIZE (keywords); i++) {
> +	/* Match case if keyword begins with upper case letter. */
> +	bool mcase = isupper ((unsigned char) keywords[i].name[0]);

Same with this cast.

> +	ssize_t minlen = -1;
> +	char keyword[128];
> +	char *p;
> +
> +	strncpy (keyword, _(keywords[i].name), sizeof (keyword));
> +
> +	/* Truncate too long keywords. XXX: Make this dynamic? */
> +	keyword[sizeof (keyword) - 1] = '\0';
> +
> +	/* Minimum match length. */
> +	p = strchr (keyword, '|');
> +	if (p) {
> +	    minlen = p - keyword;
> +
> +	    /* Remove the minimum match length separator. */
> +	    memmove (p, p + 1, strlen (p + 1) + 1);
> +	}

Would it make more sense to make match_keyword aware of the |
character?  Then you wouldn't need this dance with copying the keyword
into a scratch buffer.  I'm thinking something like (untested)

static size_t
match_keyword (const char *s, const char *keyword, bool match_case)
{
    size_t i;
    bool prefix_matched = false;

    for (i = 0; s[i] && *keyword; i++, keyword++) {
        if (*keyword == '|') {
            prefix_matched = true;
            ++keyword;
        }
        if (match_case ? s[i] != *keyword
                       : tolower ((unsigned char) s[i]) !=
                         tolower ((unsigned char) *keyword))
            break;
    }

    /* Full match, or at least the part before the '|'. */
    if (!*keyword || *keyword == '|' || prefix_matched)
        return i;
    return 0;
}

> +
> +	n = match_keyword (s, keyword, minlen, mcase);
> +	if (n > max_n || (n == max_n && mcase)) {
> +	    max_n = n;
> +	    kw = &keywords[i];
> +	}
> +    }
> +
> +    if (!kw)
> +	return -PARSE_TIME_ERR_KEYWORD;
> +
> +    if (kw->set)
> +	r = kw->set (state, kw);
> +    else
> +	r = kw_set_default (state, kw);
> +
> +    if (r < 0)
> +	return r;
> +
> +    return max_n;
> +}
> +
> +/*
> + * Non-keyword parsers and their helpers.
> + */
> +
> +static int
> +set_user_tz (struct state *state, char sign, int hour, int min)
> +{
> +    int tz = hour * 60 + min;
> +
> +    assert (sign == '+' || sign == '-');
> +
> +    if (hour < 0 || hour > 14 || min < 0 || min > 59 || min % 15)

Good to see you're not forgetting our Kiribati notmuch user base.

> +	return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +    if (sign == '-')
> +	tz = -tz;
> +
> +    return set_field (state, TM_TZ, tz);
> +}
> +
> +/*
> + * Parse a previously postponed number if one exists. Independent
> + * parsing of a postponed number when it wasn't consumed during
> + * parsing of the following token.
> + */
> +static int
> +parse_postponed_number (struct state *state, unused (enum field next_field))
> +{
> +    int v, n;
> +    char d;
> +
> +    /* Bail out if there's no postponed number. */
> +    if (!get_postponed_number (state, &v, &n, &d))
> +	return 0;
> +
> +    if (n == 1 || n == 2) {
> +	/* Notable exception: Previous field affects parsing. This
> +	 * handles "January 20". */
> +	if (state->last_field == TM_ABS_MON) {
> +	    /* D[D] */
> +	    if (!is_valid_mday (v))
> +		return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +	    return set_field (state, TM_ABS_MDAY, v);
> +	} else if (n == 2) {
> +	    /* XXX: Only allow if last field is hour, min, or sec? */
> +	    if (d == '+' || d == '-') {
> +		/* +/-HH */
> +		return set_user_tz (state, d, v, 0);
> +	    }
> +	}
> +    } else if (n == 4) {
> +	/* Notable exception: Value affects parsing. Time zones are
> +	 * always at most 1400 and we don't understand years before
> +	 * 1970. */
> +	if (!is_valid_year (v)) {
> +	    if (d == '+' || d == '-') {
> +		/* +/-HHMM */
> +		return set_user_tz (state, d, v / 100, v % 100);
> +	    }
> +	} else {
> +	    /* YYYY */
> +	    return set_field (state, TM_ABS_YEAR, v);
> +	}
> +    } else if (n == 6) {
> +	/* HHMMSS */
> +	int hour = v / 10000;
> +	int min = (v / 100) % 100;
> +	int sec = v % 100;
> +
> +	if (!is_valid_time (hour, min, sec))
> +	    return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +	return set_abs_time (state, hour, min, sec);
> +    } else if (n == 8) {
> +	/* YYYYMMDD */
> +	int year = v / 10000;
> +	int mon = (v / 100) % 100;
> +	int mday = v % 100;
> +
> +	if (!is_valid_date (year, mon, mday))
> +	    return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +	return set_abs_date (state, year, mon, mday);
> +    } else {
> +	return -PARSE_TIME_ERR_FORMAT;

No need for the else block, given the return at the end.

> +    }
> +
> +    return -PARSE_TIME_ERR_FORMAT;
> +}
> +
> +static int tm_get_field (const struct tm *tm, enum field field);
> +
> +static int
> +set_timestamp (struct state *state, time_t t)
> +{
> +    struct tm tm;
> +    enum field f;
> +    int r;
> +
> +    if (gmtime_r (&t, &tm) == NULL)
> +	return -PARSE_TIME_ERR_LIB;
> +
> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
> +	r = set_field (state, f, tm_get_field (&tm, f));
> +	if (r)
> +	    return r;
> +    }
> +
> +    r = set_field (state, TM_TZ, 0);
> +    if (r)
> +	return r;
> +
> +    /* XXX: Prevent TM_AMPM with timestamp, e.g. "@123456 pm" */
> +
> +    return 0;
> +}
> +
> +/* Parse a single number. Typically postpone parsing until later. */
> +static int
> +parse_single_number (struct state *state, unsigned long v,
> +		     unsigned long n)
> +{
> +    assert (n);
> +
> +    if (state->delim == '@')
> +	return set_timestamp (state, (time_t) v);
> +
> +    if (v > INT_MAX)
> +	return -PARSE_TIME_ERR_FORMAT;
> +
> +    return set_postponed_number (state, v, n);
> +}
> +
> +static bool
> +is_time_sep (char c)
> +{
> +    return c == ':';
> +}
> +
> +static bool
> +is_date_sep (char c)
> +{
> +    return c == '/' || c == '-' || c == '.';
> +}
> +
> +static bool
> +is_sep (char c)
> +{
> +    return is_time_sep (c) || is_date_sep (c);
> +}
> +
> +/* Two-digit year: 00...69 is 2000s, 70...99 1900s, if n == 0 keep
> + * unset. */
> +static int
> +expand_year (unsigned long year, size_t n)
> +{
> +    if (n == 2) {
> +	return (year < 70 ? 2000 : 1900) + year;
> +    } else if (n == 4) {
> +	return year;
> +    } else {
> +	return UNSET;
> +    }
> +}
> +
> +/* Parse a date number triplet. */
> +static int
> +parse_date (struct state *state, char sep,
> +	    unsigned long v1, unsigned long v2, unsigned long v3,
> +	    size_t n1, size_t n2, size_t n3)
> +{
> +    int year = UNSET, mon = UNSET, mday = UNSET;
> +
> +    assert (is_date_sep (sep));
> +
> +    switch (sep) {
> +    case '/': /* Date: M[M]/D[D][/YY[YY]] or M[M]/YYYY */
> +	if (n1 != 1 && n1 != 2)
> +	    return -PARSE_TIME_ERR_DATEFORMAT;
> +
> +	if ((n2 == 1 || n2 == 2) && (n3 == 0 || n3 == 2 || n3 == 4)) {
> +	    /* M[M]/D[D][/YY[YY]] */
> +	    year = expand_year (v3, n3);
> +	    mon = v1;
> +	    mday = v2;
> +	} else if (n2 == 4 && n3 == 0) {
> +	    /* M[M]/YYYY */
> +	    year = v2;
> +	    mon = v1;
> +	} else {
> +	    return -PARSE_TIME_ERR_DATEFORMAT;
> +	}
> +	break;
> +
> +    case '-': /* Date: YYYY-MM[-DD] or DD-MM[-YY[YY]] or MM-YYYY */
> +	if (n1 == 4 && n2 == 2 && (n3 == 0 || n3 == 2)) {
> +	    /* YYYY-MM[-DD] */
> +	    year = v1;
> +	    mon = v2;
> +	    if (n3)
> +		mday = v3;
> +	} else if (n1 == 2 && n2 == 2 && (n3 == 0 || n3 == 2 || n3 == 4)) {
> +	    /* DD-MM[-YY[YY]] */
> +	    year = expand_year (v3, n3);
> +	    mon = v2;
> +	    mday = v1;
> +	} else if (n1 == 2 && n2 == 4 && n3 == 0) {
> +	    /* MM-YYYY */
> +	    year = v2;
> +	    mon = v1;
> +	} else {
> +	    return -PARSE_TIME_ERR_DATEFORMAT;
> +	}
> +	break;
> +
> +    case '.': /* Date: D[D].M[M][.[YY[YY]]] */
> +	if ((n1 != 1 && n1 != 2) || (n2 != 1 && n2 != 2) ||
> +	    (n3 != 0 && n3 != 2 && n3 != 4))
> +	    return -PARSE_TIME_ERR_DATEFORMAT;
> +
> +	year = expand_year (v3, n3);
> +	mon = v2;
> +	mday = v1;
> +	break;
> +    }
> +
> +    if (year != UNSET && !is_valid_year (year))
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +    if (mon != UNSET && !is_valid_mon (mon))
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +    if (mday != UNSET && !is_valid_mday (mday))
> +	return -PARSE_TIME_ERR_INVALIDDATE;
> +
> +    return set_abs_date (state, year, mon, mday);
> +}
> +
> +/* Parse a time number triplet. */
> +static int
> +parse_time (struct state *state, char sep,
> +	    unsigned long v1, unsigned long v2, unsigned long v3,
> +	    size_t n1, size_t n2, size_t n3)
> +{
> +    assert (is_time_sep (sep));
> +
> +    if ((n1 != 1 && n1 != 2) || n2 != 2 || (n3 != 0 && n3 != 2))
> +	return -PARSE_TIME_ERR_TIMEFORMAT;
> +
> +    /*
> +     * Notable exception: Previously set fields affect
> +     * parsing. Interpret (+|-)HH:MM as time zone only if hour and
> +     * minute have been set.
> +     *
> +     * XXX: This could be fixed by restricting the delimiters
> +     * preceding time. For '+' it would be justified, but for '-' it
> +     * might be inconvenient. However prefer to allow '-' as an
> +     * insignificant delimiter preceding time for convenience, and
> +     * handle '+' the same way for consistency between positive and
> +     * negative time zones.
> +     */
> +    if (is_field_set (state, TM_ABS_HOUR) &&
> +	is_field_set (state, TM_ABS_MIN) &&
> +	n1 == 2 && n2 == 2 && n3 == 0 &&
> +	(state->delim == '+' || state->delim == '-')) {
> +	return set_user_tz (state, state->delim, v1, v2);
> +    }
> +
> +    if (!is_valid_time (v1, v2, v3))
> +	return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +    return set_abs_time (state, v1, v2, n3 ? v3 : 0);
> +}
> +
> +/* strtoul helper that assigns length. */
> +static unsigned long
> +strtoul_len (const char *s, const char **endp, size_t *len)
> +{
> +    unsigned long val = strtoul (s, (char **) endp, 10);

This could technically get confused by really large numbers (on
overflow, strtoul returns ULONG_MAX and sets errno to ERANGE, which
nothing here checks), but I don't know if that's worth worrying about.

> +
> +    *len = *endp - s;
> +    return val;
> +}
> +
> +/*
> + * Parse a (group of) number(s). Return < 0 on error, number of parsed
> + * chars on success.
> + */
> +static ssize_t
> +parse_number (struct state *state, const char *s)
> +{
> +    int r;
> +    unsigned long v1, v2, v3 = 0;
> +    size_t n1, n2, n3 = 0;
> +    const char *p = s;
> +    char sep;
> +
> +    v1 = strtoul_len (p, &p, &n1);
> +
> +    if (is_sep (*p) && isdigit ((unsigned char) *(p + 1))) {

Unnecessary cast?

> +	sep = *p;
> +	v2 = strtoul_len (p + 1, &p, &n2);
> +    } else {
> +	/* A single number. */
> +	r = parse_single_number (state, v1, n1);
> +	if (r)
> +	    return r;
> +
> +	return p - s;

I found the control flow here confusing.  You might want to invert the
condition so the single-number case returns first and the rest of the
code flows straight through:

if (!is_sep (*p) || !isdigit (*(p + 1))) {
    ...
    return p - s;
}

sep = *p;
...

> +    }
> +
> +    /* A group of two or three numbers? */
> +    if (*p == sep && isdigit ((unsigned char) *(p + 1)))
> +	v3 = strtoul_len (p + 1, &p, &n3);
> +
> +    if (is_time_sep (sep))
> +	r = parse_time (state, sep, v1, v2, v3, n1, n2, n3);
> +    else
> +	r = parse_date (state, sep, v1, v2, v3, n1, n2, n3);
> +
> +    if (r)
> +	return r;
> +
> +    return p - s;
> +}
> +
> +/*
> + * Parse delimiter(s). Throw away all except the last one, which is
> + * stored for parsing the next non-delimiter. Return < 0 on error,
> + * number of parsed chars on success.
> + *
> + * XXX: We might want to be more strict here.
> + */
> +static ssize_t
> +parse_delim (struct state *state, const char *s)
> +{
> +    const char *p = s;
> +
> +    /*
> +     * Skip non-alpha and non-digit, and store the last for further
> +     * processing.
> +     */
> +    while (*p && !isalnum ((unsigned char) *p)) {
> +	set_delim (state, *p);
> +	p++;
> +    }
> +
> +    return p - s;
> +}
> +
> +/*
> + * Parse a date/time string. Return < 0 on error, number of parsed
> + * chars on success.
> + */
> +static ssize_t
> +parse_input (struct state *state, const char *s)
> +{
> +    const char *p = s;
> +    ssize_t n;
> +    int r;
> +
> +    while (*p) {
> +	if (isalpha ((unsigned char) *p)) {
> +	    n = parse_keyword (state, p);
> +	} else if (isdigit ((unsigned char) *p)) {
> +	    n = parse_number (state, p);
> +	} else {
> +	    n = parse_delim (state, p);
> +	}
> +
> +	if (n <= 0) {
> +	    if (n == 0)
> +		n = -PARSE_TIME_ERR;
> +
> +	    return n;
> +	}
> +
> +	p += n;
> +    }
> +
> +    /* Parse a previously postponed number, if any. */
> +    r = parse_postponed_number (state, TM_NONE);
> +    if (r < 0)
> +	return r;
> +
> +    return p - s;
> +}
> +
> +/*
> + * Processing the parsed input.
> + */
> +
> +/*
> + * Initialize reference time to tm. Use time zone in state if
> + * specified, otherwise local time. Use now for reference time if
> + * non-NULL, otherwise current time.
> + */
> +static int
> +initialize_now (struct state *state, struct tm *tm, const time_t *now)

Should tm be the last argument, since it's an out-argument?

Why is now a pointer?  Just so it can be NULL?

> +{
> +    time_t t;
> +
> +    if (now) {
> +	t = *now;
> +    } else {
> +	if (time (&t) == (time_t) -1)
> +	    return -PARSE_TIME_ERR_LIB;
> +    }
> +
> +    if (is_field_set (state, TM_TZ)) {
> +	/* Some other time zone. */
> +
> +	/* Adjust now according to the TZ. */
> +	t += get_field (state, TM_TZ) * 60;
> +
> +	/* It's not gm, but this doesn't mess with the TZ. */
> +	if (gmtime_r (&t, tm) == NULL)
> +	    return -PARSE_TIME_ERR_LIB;
> +    } else {
> +	/* Local time. */
> +	if (localtime_r (&t, tm) == NULL)
> +	    return -PARSE_TIME_ERR_LIB;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Normalize tm according to mktime(3). Both mktime(3) and

This comment could elaborate a bit on what it means to normalize a tm.

> + * localtime_r(3) use local time, but they cancel each other out here,
> + * making this function agnostic to time zone.
> + */
> +static int
> +normalize_tm (struct tm *tm)
> +{
> +    time_t t = mktime (tm);
> +
> +    if (t == (time_t) -1)
> +	return -PARSE_TIME_ERR_LIB;
> +
> +    if (!localtime_r (&t, tm))
> +	return -PARSE_TIME_ERR_LIB;

Do you actually need this call to localtime_r, or can you just return
after mktime modifies tm?  Does this have to do with time zones?
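
For what it's worth, the C standard says a successful mktime() call
already normalizes its argument in place: out-of-range members (say
tm_mday = 32) are carried into the larger units, and tm_wday and
tm_yday are filled in.  Since both calls interpret tm as local time,
the localtime_r() round trip looks redundant to me, unless I'm missing
a time zone subtlety.  A minimal sketch (normalize_tm_sketch is a
made-up name):

```c
#include <time.h>

/*
 * Hypothetical stand-in for normalize_tm(): let mktime() normalize a
 * broken-down local time in place.  On success every member is brought
 * back into its normal range (e.g. tm_mday = 32 in January becomes
 * February 1) and tm_wday/tm_yday are computed.
 */
static int
normalize_tm_sketch (struct tm *tm)
{
    if (mktime (tm) == (time_t) -1)
        return -1;
    return 0;
}
```

For example, January 32, 2012 (tm_mon = 0, tm_mday = 32) comes back as
February 1 with tm_wday = 3, a Wednesday.  That is also roughly what
the comment above normalize_tm() could say.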

> +
> +    return 0;
> +}
> +
> +/* Get field out of a struct tm. */
> +static int
> +tm_get_field (const struct tm *tm, enum field field)
> +{
> +    switch (field) {
> +    case TM_ABS_SEC:	return tm->tm_sec;
> +    case TM_ABS_MIN:	return tm->tm_min;
> +    case TM_ABS_HOUR:	return tm->tm_hour;
> +    case TM_ABS_MDAY:	return tm->tm_mday;
> +    case TM_ABS_MON:	return tm->tm_mon + 1; /* 0- to 1-based */
> +    case TM_ABS_YEAR:	return 1900 + tm->tm_year;
> +    case TM_ABS_WDAY:	return tm->tm_wday;
> +    case TM_ABS_ISDST:	return tm->tm_isdst;
> +    default:
> +	assert (false);
> +	break;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Modify hour according to am/pm setting. */
> +static int
> +fixup_ampm (struct state *state)
> +{
> +    int hour, hdiff = 0;
> +
> +    if (!is_field_set (state, TM_AMPM))
> +	return 0;
> +
> +    if (!is_field_set (state, TM_ABS_HOUR))
> +	return -PARSE_TIME_ERR_TIMEFORMAT;
> +
> +    hour = get_field (state, TM_ABS_HOUR);
> +    if (!is_valid_12hour (hour))
> +	return -PARSE_TIME_ERR_INVALIDTIME;
> +
> +    if (get_field (state, TM_AMPM)) {
> +	/* 12pm is noon. */
> +	if (hour != 12)
> +	    hdiff = 12;
> +    } else {
> +	/* 12am is midnight, beginning of day. */
> +	if (hour == 12)
> +	    hdiff = -12;
> +    }
> +
> +    mod_field (state, TM_REL_HOUR, -hdiff);
> +
> +    return 0;
> +}
> +
> +/* Combine absolute and relative fields, and round. */
> +static int
> +create_output (struct state *state, time_t *t_out, const time_t *ref,
> +	       int round)
> +{
> +    struct tm tm = { .tm_isdst = -1 };
> +    struct tm now;
> +    time_t t;
> +    enum field f;
> +    int r;
> +    int week_round = PARSE_TIME_NO_ROUND;
> +
> +    r = initialize_now (state, &now, ref);
> +    if (r)
> +	return r;
> +
> +    /* Initialize fields flagged as "now" to reference time. */
> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
> +	if (state->set[f] == FIELD_NOW) {
> +	    state->tm[f] = tm_get_field (&now, f);
> +	    state->set[f] = FIELD_SET;
> +	}
> +    }
> +
> +    /*
> +     * If WDAY is set but MDAY is not, we consider WDAY relative
> +     *
> +     * XXX: This fails on stuff like "two months monday" because two
> +     * months ago wasn't the same day as today. Postpone until we know
> +     * date?
> +     */
> +    if (is_field_set (state, TM_ABS_WDAY) &&
> +	!is_field_set (state, TM_ABS_MDAY)) {
> +	int wday = get_field (state, TM_ABS_WDAY);
> +	int today = tm_get_field (&now, TM_ABS_WDAY);
> +	int rel_days;
> +
> +	if (today > wday)
> +	    rel_days = today - wday;
> +	else
> +	    rel_days = today + 7 - wday;
> +
> +	/* This also prevents special week rounding from happening. */
> +	mod_field (state, TM_REL_DAY, rel_days);
> +
> +	unset_field (state, TM_ABS_WDAY);
> +    }
> +
> +    r = fixup_ampm (state);
> +    if (r)
> +	return r;
> +
> +    /*
> +     * Iterate fields from most accurate to least accurate, and set
> +     * unset fields according to requested rounding.
> +     */
> +    for (f = TM_ABS_SEC; f != TM_NONE; f = next_abs_field (f)) {
> +	if (round != PARSE_TIME_NO_ROUND) {
> +	    enum field r = abs_to_rel_field (f);
> +
> +	    if (is_field_set (state, f) || is_field_set (state, r)) {
> +		if (round >= PARSE_TIME_ROUND_UP && f != TM_ABS_SEC) {
> +		    mod_field (state, r, -1);

Crazy.  This could use a comment.  It took me a while to figure out
why this was -1, though maybe that's just because it's late.
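
Here's my reading, in case it helps with that comment:
create_output() computes every tm member as the absolute field minus
the relative one, so decrementing the relative field of the most
accurate specified unit *adds* one of that unit.  With all more
accurate fields already set to their epoch values, that lands on the
start of the next unit, and for ROUND_UP_INCLUSIVE incrementing
TM_REL_SEC subtracts the final second.  A standalone sketch of the
same arithmetic for a month-accurate input such as "2012-10";
round_up_month() is hypothetical and only mirrors the abs - rel
convention, it isn't the patch's code:

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical illustration of rounding up a month-accurate input
 * such as "2012-10" using the convention that each tm member is the
 * absolute field minus the relative field.
 */
static time_t
round_up_month (int year, int mon, bool inclusive, struct tm *out)
{
    int rel_mon = 0, rel_sec = 0;
    struct tm tm = { .tm_isdst = -1 };
    time_t t;

    /* Like mod_field (state, r, -1): since tm = abs - rel, this adds a month. */
    rel_mon -= 1;

    /* Like mod_field (state, TM_REL_SEC, 1): this subtracts one second. */
    if (inclusive)
        rel_sec += 1;

    /* More accurate fields were already set to their epoch values. */
    tm.tm_sec = 0 - rel_sec;
    tm.tm_min = 0;
    tm.tm_hour = 0;
    tm.tm_mday = 1;
    tm.tm_mon = (mon - rel_mon) - 1;   /* 1- to 0-based */
    tm.tm_year = year - 1900;

    /* mktime() normalizes e.g. tm_sec = -1 into 23:59:59 of the previous day. */
    t = mktime (&tm);
    *out = tm;
    return t;
}
```

So "2012-10" rounds up to 2012-11-01 00:00:00, or to
2012-10-31 23:59:59 when inclusive.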

> +		    if (round == PARSE_TIME_ROUND_UP_INCLUSIVE)
> +			mod_field (state, TM_REL_SEC, 1);
> +		}
> +		round = PARSE_TIME_NO_ROUND; /* No more rounding. */
> +	    } else {
> +		if (f == TM_ABS_MDAY &&
> +		    is_field_set (state, TM_REL_WEEK)) {
> +		    /* Week is most accurate. */
> +		    week_round = round;
> +		    round = PARSE_TIME_NO_ROUND;
> +		} else {
> +		    set_field (state, f, field_epoch (f));
> +		}
> +	    }
> +	}
> +
> +	if (!is_field_set (state, f))
> +	    set_field (state, f, tm_get_field (&now, f));
> +    }
> +
> +    /* Special case: rounding with week accuracy. */
> +    if (week_round != PARSE_TIME_NO_ROUND) {
> +	/* Temporarily set more accurate fields to now. */
> +	set_field (state, TM_ABS_SEC, tm_get_field (&now, TM_ABS_SEC));
> +	set_field (state, TM_ABS_MIN, tm_get_field (&now, TM_ABS_MIN));
> +	set_field (state, TM_ABS_HOUR, tm_get_field (&now, TM_ABS_HOUR));
> +	set_field (state, TM_ABS_MDAY, tm_get_field (&now, TM_ABS_MDAY));
> +    }
> +
> +    /*
> +     * Set all fields. They may contain out of range values before
> +     * normalization by mktime(3).
> +     */
> +    tm.tm_sec = get_field (state, TM_ABS_SEC) - get_field (state, TM_REL_SEC);
> +    tm.tm_min = get_field (state, TM_ABS_MIN) - get_field (state, TM_REL_MIN);
> +    tm.tm_hour = get_field (state, TM_ABS_HOUR) - get_field (state, TM_REL_HOUR);
> +    tm.tm_mday = get_field (state, TM_ABS_MDAY) -
> +		 get_field (state, TM_REL_DAY) - 7 * get_field (state, TM_REL_WEEK);
> +    tm.tm_mon = get_field (state, TM_ABS_MON) - get_field (state, TM_REL_MON);
> +    tm.tm_mon--; /* 1- to 0-based */
> +    tm.tm_year = get_field (state, TM_ABS_YEAR) - get_field (state, TM_REL_YEAR) - 1900;
> +
> +    /*
> +     * It's always normal time.
> +     *
> +     * XXX: This is probably not a solution that universally
> +     * works. Just make sure DST is not taken into account. We don't
> +     * want rounding to be affected by DST.
> +     */
> +    tm.tm_isdst = -1;
> +
> +    /* Special case: rounding with week accuracy. */
> +    if (week_round != PARSE_TIME_NO_ROUND) {
> +	/* Normalize to get proper tm.wday. */
> +	r = normalize_tm (&tm);
> +	if (r < 0)
> +	    return r;
> +
> +	/* Set more accurate fields back to zero. */
> +	tm.tm_sec = 0;
> +	tm.tm_min = 0;
> +	tm.tm_hour = 0;
> +	tm.tm_isdst = -1;
> +
> +	/* Monday is the true 1st day of week, but this is easier. */
> +	if (week_round >= PARSE_TIME_ROUND_UP) {
> +	    tm.tm_mday += 7 - tm.tm_wday;
> +	    if (week_round == PARSE_TIME_ROUND_UP_INCLUSIVE)
> +		tm.tm_sec--;
> +	} else {
> +	    tm.tm_mday -= tm.tm_wday;
> +	}
> +    }
> +
> +    if (is_field_set (state, TM_TZ)) {
> +	/* tm is in specified TZ, convert to UTC for timegm(3). */
> +	tm.tm_min -= get_field (state, TM_TZ);
> +	t = timegm (&tm);
> +    } else {
> +	/* tm is in local time. */
> +	t = mktime (&tm);
> +    }
> +
> +    if (t == (time_t) -1)
> +	return -PARSE_TIME_ERR_LIB;
> +
> +    *t_out = t;
> +
> +    return 0;
> +}
> +
> +/* Internally, all errors are < 0. parse_time_string() returns errors > 0. */
> +#define EXTERNAL_ERR(r) (-r)
> +
> +int
> +parse_time_string (const char *s, time_t *t, const time_t *ref, int round)
> +{
> +    struct state state = { .last_field = TM_NONE };
> +    int r;
> +
> +    if (!s || !t)
> +	return EXTERNAL_ERR (-PARSE_TIME_ERR);
> +
> +    r = parse_input (&state, s);
> +    if (r < 0)
> +	return EXTERNAL_ERR (r);
> +
> +    r = create_output (&state, t, ref, round);
> +    if (r < 0)
> +	return EXTERNAL_ERR (r);
> +
> +    return 0;
> +}
> diff --git a/parse-time-string/parse-time-string.h b/parse-time-string/parse-time-string.h
> new file mode 100644
> index 0000000..bfa4ee3
> --- /dev/null
> +++ b/parse-time-string/parse-time-string.h
> @@ -0,0 +1,102 @@
> +/*
> + * parse time string - user friendly date and time parser
> + * Copyright © 2012 Jani Nikula
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Author: Jani Nikula <jani at nikula.org>
> + */
> +
> +#ifndef PARSE_TIME_STRING_H
> +#define PARSE_TIME_STRING_H
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include <time.h>
> +
> +/* return values for parse_time_string() */
> +enum {
> +    PARSE_TIME_OK = 0,
> +    PARSE_TIME_ERR,		/* unspecified error */
> +    PARSE_TIME_ERR_LIB,		/* library call failed */
> +    PARSE_TIME_ERR_ALREADYSET,	/* attempt to set unit twice */
> +    PARSE_TIME_ERR_FORMAT,	/* generic date/time format error */
> +    PARSE_TIME_ERR_DATEFORMAT,	/* date format error */
> +    PARSE_TIME_ERR_TIMEFORMAT,	/* time format error */
> +    PARSE_TIME_ERR_INVALIDDATE,	/* date value error */
> +    PARSE_TIME_ERR_INVALIDTIME,	/* time value error */
> +    PARSE_TIME_ERR_KEYWORD,	/* unknown keyword */
> +};
> +
> +/* round values for parse_time_string() */
> +enum {
> +    PARSE_TIME_ROUND_DOWN = -1,
> +    PARSE_TIME_NO_ROUND = 0,
> +    PARSE_TIME_ROUND_UP = 1,
> +    PARSE_TIME_ROUND_UP_INCLUSIVE = 2,
> +};
> +
> +/**
> + * parse_time_string() - user friendly date and time parser
> + * @s:		string to parse
> + * @t:		pointer to time_t to store parsed time in
> + * @ref:	pointer to time_t containing reference date/time, or NULL
> + * @round:	PARSE_TIME_NO_ROUND, PARSE_TIME_ROUND_DOWN,
> + *		PARSE_TIME_ROUND_UP, or PARSE_TIME_ROUND_UP_INCLUSIVE
> + *
> + * Parse a date/time string 's' and store the parsed date/time result
> + * in 't'.
> + *
> + * A reference date/time is used for determining the "date/time units"
> + * (roughly equivalent to struct tm members) not specified by 's'. If
> + * 'ref' is non-NULL, it must point to a time_t to be used as the
> + * reference date/time. Otherwise, the current time is used.
> + *
> + * If 's' does not specify a full date/time, the 'round' parameter
> + * specifies if and how the result should be rounded as follows:
> + *
> + *   PARSE_TIME_NO_ROUND: All date/time units that are not specified
> + *   by 's' are set to the corresponding unit derived from the
> + *   reference date/time.
> + *
> + *   PARSE_TIME_ROUND_DOWN: All date/time units that are more accurate
> + *   than the most accurate unit specified by 's' are set to the
> + *   smallest valid value for that unit. Rest of the unspecified units
> + *   are set as in PARSE_TIME_NO_ROUND.
> + *
> + *   PARSE_TIME_ROUND_UP: All date/time units that are more accurate
> + *   than the most accurate unit specified by 's' are set to the
> + *   smallest valid value for that unit. The most accurate unit
> + *   specified by 's' is incremented by one (and this is rolled over
> + *   to the less accurate units as necessary), unless the most
> + *   accurate unit is seconds. Rest of the unspecified units are set
> + *   as in PARSE_TIME_NO_ROUND.
> + *
> + *   PARSE_TIME_ROUND_UP_INCLUSIVE: Same as PARSE_TIME_ROUND_UP, minus
> + *   one second, unless the most accurate unit specified by 's' is
> + *   seconds. This is useful for callers that require a value for
> + *   inclusive comparison of the result.
> + *
> + * Return 0 (PARSE_TIME_OK) for successfully parsed date/time, or one
> + * of PARSE_TIME_ERR_* on error. 't' is not modified on error.
> + */
> +int parse_time_string (const char *s, time_t *t, const time_t *ref, int round);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* PARSE_TIME_STRING_H */

Made it!


More information about the notmuch mailing list