[PATCH v3 2/9] parse-time-string: add a date/time parser to notmuch

Michal Nazarewicz mina86 at mina86.com
Mon Sep 17 07:13:35 PDT 2012


> On Thu, 13 Sep 2012, Michal Nazarewicz <mina86 at mina86.com> wrote:
>> Have you consider doing the same in bison?  I consider the code totally
>> unreadable and unmaintainable.

On Thu, Sep 13 2012, Jani Nikula wrote:
> I do not think you could easily do everything that this parser does in
> bison. But then I'm not an expert in bison, and I have zero ambition to
> become one. So I'm biased, and I'm open about it.

Bison can do a lot of weird stuff, including modifying how the lexer
interprets tokens even while parsing a given grammar rule.

> Even so, if you're suggesting doing this in bison would make this
> totally readable and maintainable, I urge you to have a good look at
> [1]. Note that it also does less in more lines of code. (And using it
> as-is in notmuch has pretty much been turned down in the past.)
>
> Finally, I also suggest you actually read and review the code, pointing
> out concrete issues in readability or maintainability that you
> see. Especially since an earlier version has received comment "[I]t
> looks very nice to me. It is pleasantly nice to read." [2]. What you're
> doing is worthless bikeshedding otherwise.

I'm sorry.  I sometimes tend to go to extremes with my statements, so
yes, the “totally unreadable” was an overstatement on my part.

My point was however that parsing is a solved problem, and for
non-trivial parsers one needs to ask herself whether it's worth trying
to implement the logic, or maybe using a parser generator is just
simpler.

And in this particular case, my feeling is that bison is easier to read
and modify.

To add some merit to my statement, I attach a bison parser.

It supports ranges like so:
	<date>		the specific moment with duration dependent
			on specification.  How duration is figured out
			is described in the next paragraph.
	<from>..<to>	dates >= <from> and < <to>, so for instance
			“yesterday..0 days” yields results from yesterday.
	..<to>		dates < <to>
	<from>..	dates >= <from>
	<from>++<dur>	a shorthand for “<from>..<from> + <dur>”.
			This is useful for things like: “2012q1++2
			quarters” which is equivalent to
			“2012/01/01..2012/07/01”, ie. the first two
			quarters of 2012.

It supports specifications as:
	'@' <num>
		Raw timestamp.  Its duration is one second.

	<num> (seconds | minutes | hours | days | weeks | fortnights) [ago]
		Moves the date by the given number of units into the
		future, or into the past if “ago” is given.  <num> can
		be preceded by a minus sign.

		This specification's duration is whatever unit was used,
		ie. one second, one minute, one hour, one day, one week
		or one fortnight.  So “7 days ago” and “1 week ago”
		specify the same moment, but they have different
		durations.

	<num> (months | quarters | years) [ago]
		Like above, but calendar months are used which do not
		always have the same length.  If applying the offset
		ends up with a day of the month out of range, the day
		is capped to the last day of the month.

	yesterday
		Moves one day back. [*]  Note that because of [*] this
		is not equivalent to “-1day”.
	YYYY/MM/DD
	YYYY-MM-DD
	DD-MM-YYYY
	DD Month YYYY
	Month DD YYYY
		Sets date accordingly. [*] “Month” is a human readable
		month name.
	Month [DD] [YYYY]
		If either day or year is missing, given component of the
		date is not changed. [*]  Also, if day is missing, the
		duration is set to one month rather than one day (but
		see caveats described in [*]).
	YYYY q Q
		Sets date to the beginning of quarter Q, ie. “2012q2” is
		roughly the same as “2012/04/01”. [*] Sets duration to
		three months, but see caveats described in [*].

	midnight | noon
		Sets time to 0:00:00 and 12:00:00 respectively.   Has
		duration of 1 hour.
	HH:MM:SS [am | pm]
	HH:MM    [am | pm]
	HH       (am | pm)
		Sets time accordingly with the part that is not
		specified set to zero.  Duration depends on how many
		components are missing, ie. “HH (am|pm)” has a duration of
		one hour, “HH:MM” has a duration of one minute and
		“HH:MM:SS” has a duration of one second.

[*] Formats specifying the date will zero the time to midnight unless
    the time has already been specified (ie. “yesterday” is roughly the
    same as “yesterday midnight”, but “noon yesterday” still keeps time
    as noon).

    Also, if the time has not been specified, those formats will set
    duration to one day (with two exceptions), so “yesterday” has
    a duration of one day, but “yesterday midnight”, even though it
    specifies the same moment's beginning, has a duration of one hour.

Purposely, I have not added support for MM/DD/YYYY or DD/MM/YYYY, nor
for two-digit years.  I feel this would only add confusion.

---
 .gitignore            |    3 +
 Makefile              |   17 ++
 date-parser-grammar.y |  173 ++++++++++++++++++
 date-parser.c         |  476 +++++++++++++++++++++++++++++++++++++++++++++++++
 date-parser.h         |   59 ++++++
 test.c                |   44 +++++
 6 files changed, 772 insertions(+), 0 deletions(-)
 create mode 100644 .gitignore
 create mode 100644 Makefile
 create mode 100644 date-parser-grammar.y
 create mode 100644 date-parser.c
 create mode 100644 date-parser.h
 create mode 100644 test.c

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..b73c782
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,3 @@
+test
+*.o
+date-parser-grammar.tab.*
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..8d95f71
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,17 @@
+CFLAGS += -std=c99 -Wextra -Werror -pedantic
+
+test: test.o date-parser.o date-parser-grammar.tab.o
+test.o: test.c date-parser.h
+date-parser.o: date-parser.c date-parser.h
+date-parser.o: date-parser-grammar.tab.h
+
+date-parser-grammar.tab.c: date-parser-grammar.y
+	bison $<
+
+date-parser-grammar.tab.h: date-parser-grammar.tab.c
+date-parser-grammar.tab.o: date-parser-grammar.tab.c date-parser.h
+date-parser-grammar.tab.o: CPPFLAGS += -Wno-unreachable-code
+
+clean:
+	rm -f date-parser-grammar.output date-parser-grammar.tab.* \
+		*.o test
diff --git a/date-parser-grammar.y b/date-parser-grammar.y
new file mode 100644
index 0000000..38ddfca
--- /dev/null
+++ b/date-parser-grammar.y
@@ -0,0 +1,173 @@
+/* Date parser bison grammar file
+ * Copyright (c) 2012 Google Inc.
+ * Written by Michal Nazarewicz <mina86 at mina86.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ . */
+
+%code requires {
+
+#ifndef YYSTYPE
+#  define YYSTYPE long
+#endif
+#ifndef YYLTYPE
+# define YYLTYPE struct yylocation
+#endif
+
+#ifdef YYLLOC_DEFAULT
+#  undef YYLLOC_DEFAULT
+#endif
+#define YYLLOC_DEFAULT(Cur, Rhs, N) do {		\
+	if (N) {					\
+		(Cur).start = YYRHSLOC(Rhs, 1).start;	\
+		(Cur).end   = YYRHSLOC(Rhs, N).end;	\
+	} else {					\
+		(Cur) = YYRHSLOC(Rhs, 0);		\
+	}						\
+} while (0)
+}
+
+%code{
+#include "date-parser-grammar.tab.h"
+#include "date-parser.h"
+
+#define ASSERT(cond, loc, message) do {			\
+	if (!(cond)) {					\
+		parse_date_print_error(&loc, message);	\
+		YYERROR;				\
+	}						\
+} while (0)
+}
+
+%locations
+%defines
+%error-verbose
+%define		api.pure
+
+%parse-param	{struct date *ret}
+%parse-param	{const char **inputp}
+%lex-param	{const char **inputp}
+
+%token	T_NUMBER	"<num>"		/* Always positive. */
+%token	T_NUMBER_4	"####"		/* Four digit number. */
+%token	T_AGO		"ago"
+/* Also used for minutes, hours, days and weeks. */
+%token	T_SECONDS	"seconds"
+/* Also used for quarters and years */
+%token	T_MONTHS	"months"
+%token	T_YESTERDAY	"yesterday"
+%token	T_AMPM		"am/pm"
+%token	T_HOUR		"<hour>"
+%token	T_MONTH		"<month>"
+
+%expect	3	/* Two shift/reduce conflicts caused by year_maybe, and one
+		 * caused by day_maybe. */
+
+%%
+	/* For backwards compatibility, just a number and nothing else
+	 * is treated as timestamp */
+input	: number { date_set_from_stamp(ret, $1); }
+	| date
+	;
+
+date	: part
+	| date part
+	;
+
+part	: integer "seconds" ago_maybe {
+		ASSERT(date_add_seconds(ret, $1 * $3, $2), @$,
+		       "offset ends up in date out of range");
+	}
+	| integer "months"  ago_maybe {
+		ASSERT(date_add_months(ret, $1 * $3, $2), @$,
+		       "offset ends up in date out of range");
+	}
+	| "yesterday" {
+		ASSERT(date_set_yesterday(ret), @$,
+		       "offset ends up in date out of range");
+	}
+
+	| '@' number			{ date_set_from_stamp(ret, $2); }
+	| "<hour>"			{ date_set_time(ret, $1, -1, -1, -1); }
+
+	/* HH:MM, HH:MM:SS, HH:MM am/pm, HH:MM:SS am/pm */
+	| number ':' number seconds_maybe ampm_maybe {
+		ASSERT(date_set_time(ret, $1, $3, $4, $5), @$,
+		       "invalid time");
+	}
+
+	| number "am/pm" {			/* HH am/pm */
+		ASSERT(date_set_time(ret, $1, -1, -1, $2), @$, "invalid hour");
+	}
+
+	| "####" '/' "<num>" '/' "<num>" {	/* YYYY/MM/DD */
+		ASSERT(date_set_date(ret, $1, $3, $5), @$, "invalid date");
+	}
+	| "####" '-' "<num>" '-' "<num>" {	/* YYYY-MM-DD */
+		ASSERT(date_set_date(ret, $1, $3, $5), @$, "invalid date");
+	}
+	| "<num>" '-' "<num>" '-' "####" {	/* DD-MM-YYYY */
+		ASSERT(date_set_date(ret, $5, $3, $1), @$, "invalid date");
+	}
+	/* No MM/DD/YYYY or DD/MM/YYYY because it's confusing. */
+
+	| "<num>" "<month>" year_maybe {	/* 1 January 2012 */
+		ASSERT(date_set_date(ret, $3, $2, $1), @$, "invalid date");
+	}
+	| "<month>" day_maybe year_maybe {	/* January 1 2012 */
+		ASSERT(date_set_date(ret, $3, $1, $2), @$, "invalid date");
+	}
+
+	| "####" 'q' "<num>" {			/* Quarter, 2012q1 */
+		ASSERT(date_set_quarter(ret, $1, $3), @$, "invalid quarter");
+	}
+	;
+
+number	: "<num>"	{ $$ = $1; }
+	| "####"	{ $$ = $1; }
+	;
+
+integer	:     number	{ $$ =  $1; }
+	| '-' number	{ $$ = -$2; }
+	;
+
+ago_maybe
+	: /* empty */	{ $$ =  1; }
+	| "ago"		{ $$ = -1; }
+	;
+
+seconds_maybe
+	: /* empty */	{ $$ = -1; }
+	| ':' "<num>"	{ $$ = $2; }
+	;
+
+ampm_maybe
+	: /* empty */	{ $$ = -1; }
+	| "am/pm"	{ $$ = $1; }
+	/* For people who like writing "a.m." or "p.m.": since dot
+	 * is ignored by the lexer (ie. it's treated just like white
+	 * space), the dot is lost and the letters arrive one by one. */
+	| 'a' 'm'	{ $$ =  0; }
+	| 'p' 'm'	{ $$ =  1; }
+	;
+
+day_maybe
+	: /* empty */	{ $$ = -1; }
+	| "<num>"	{ $$ = $1; }
+	;
+
+year_maybe
+	: /* empty */	{ $$ = -1; }
+	| "####"	{ $$ = $1; }
+	;
+%%
diff --git a/date-parser.c b/date-parser.c
new file mode 100644
index 0000000..c1701bd
--- /dev/null
+++ b/date-parser.c
@@ -0,0 +1,476 @@
+/* Date parser implementation file.
+ * Copyright (c) 2012 Google Inc.
+ * Written by Michal Nazarewicz <mina86 at mina86.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ . */
+
+#define _POSIX_C_SOURCE 1
+
+#include "date-parser.h"
+#include "date-parser-grammar.tab.h"
+
+#include <ctype.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <strings.h>
+
+
+static bool is_valid_year(int year) {
+	/* TODO: Get the actual time_t range. */
+	return year >= 1970 && year < 2037;
+}
+
+
+/***************************** Basic date helpers ***************************/
+
+static int days_in_months[] = {
+	31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
+};
+
+static inline int is_leap(int year) {
+	return year % 4 == 0 && (year % 100 || year % 400 == 0);
+}
+
+static inline int days_in_month(int year, int month) {
+	return days_in_months[month] + (month == 1 ? is_leap(year) : 0);
+}
+
+static inline int days_in_year(int year) {
+	return 365 + is_leap(year);
+}
+
+static inline int min(int a, int b) {
+	return a < b ? a : b;
+}
+
+
+/****************************** Date manipulation ***************************/
+
+struct date {
+	struct tm tm;
+	int dur_sec, dur_mon, has_time;
+};
+
+
+void date_set_from_stamp(struct date *ret, long stamp) {
+	time_t t = stamp;
+	localtime_r(&t, &ret->tm);
+	ret->dur_sec = 1;
+	ret->dur_mon = 0;
+	ret->has_time = 1;
+}
+
+static void date_set_to_now(struct date *ret) {
+	time_t t = time(NULL);
+	localtime_r(&t, &ret->tm);
+	ret->dur_sec = 1;
+	ret->dur_mon = 0;
+	ret->has_time = 0;
+}
+
+static void date_zero_time(struct date *ret, int dur_sec, int dur_mon) {
+	if (!ret->has_time) {
+		ret->tm.tm_hour = ret->tm.tm_min = ret->tm.tm_sec = 0;
+		ret->dur_sec = dur_sec;
+		ret->dur_mon = dur_mon;
+	}
+}
+
+bool date_add_seconds(struct date *ret, long num, long unit) {
+	time_t t = mktime(&ret->tm) + num * unit;
+	localtime_r(&t, &ret->tm);
+	ret->dur_sec = unit;
+	ret->dur_mon = 0;
+	return true; /* TODO add validation */
+}
+
+bool date_set_yesterday(struct date *ret) {
+	if (ret->tm.tm_mday != 1) {
+		--ret->tm.tm_mday;
+	} else if (ret->tm.tm_mon) {
+		--ret->tm.tm_mon;
+		ret->tm.tm_mday = days_in_month(ret->tm.tm_year + 1900,
+						ret->tm.tm_mon);
+	} else if (is_valid_year(1900 + ret->tm.tm_year - 1)) {
+		--ret->tm.tm_year;
+		ret->tm.tm_mon = 11;
+		ret->tm.tm_mday = 31;
+	} else {
+		return false;
+	}
+	date_zero_time(ret, 24 * 3600, 0);
+	ret->tm.tm_isdst = -1;
+	return true;
+}
+
+bool date_add_months(struct date *ret, long num, long unit) {
+	long y;
+
+	y = ret->tm.tm_year + 1900;
+	num = num * unit + ret->tm.tm_mon;
+	if (num < 0) {
+		y -= (-num + 11) / 12;		/* round towards -infinity */
+		num = (12 - (-num % 12)) % 12;
+	} else {
+		y += num / 12;
+		num %= 12;
+	}
+	if (!is_valid_year(y)) {
+		return false;
+	}
+
+	ret->tm.tm_year = y - 1900;
+	ret->tm.tm_mon = num;
+	ret->tm.tm_mday = min(ret->tm.tm_mday,
+			      days_in_month(ret->tm.tm_year + 1900,
+					    ret->tm.tm_mon));
+	if (!ret->has_time) {
+		ret->dur_sec = 0;
+		ret->dur_mon = unit;
+	}
+	ret->tm.tm_isdst = -1;
+	return true;
+}
+
+bool date_set_time(struct date *ret, long h, long m, long s, int ampm) {
+	if (m > 59 || s > 60 || h > 23) {
+		return false;
+	}
+
+	if (ampm != -1) {
+		if (!h || h > 12) {
+			return false;
+		}
+		if (h != 12) {
+			h += ampm * 12;
+		} else if (ampm) {  /* 12 pm */
+			h = 12;
+		} else {
+			/* 12 am is 0 the next day, so adjust date */
+			date_add_seconds(ret, 1, 24 * 3600);
+			h = 0;
+		}
+	}
+
+	if (m == -1) {
+		ret->dur_sec = 3600;
+		m = s = 0;
+	} else if (s == -1) {
+		ret->dur_sec = 60;
+		s = 0;
+	} else {
+		ret->dur_sec = 1;
+	}
+	ret->dur_mon = 0;
+
+	ret->tm.tm_hour = h;
+	ret->tm.tm_min = m;
+	ret->tm.tm_sec = s;
+	ret->tm.tm_isdst = -1;
+
+	ret->has_time = 1;
+	return true;
+}
+
+bool date_set_date(struct date *ret, long y, long m, long d) {
+	if (y == -1) {
+		y = ret->tm.tm_year + 1900;
+	} else if (!is_valid_year(y)) {
+		return false;
+	}
+	if (m < 1 || m > 12 ||
+	    (d != -1 && (d < 1 || d > days_in_month(y, m - 1)))) {
+		return false;
+	}
+	ret->tm.tm_year = y - 1900;
+	ret->tm.tm_mon = m - 1;
+	ret->tm.tm_mday = d == -1 ? 1 : d;
+	if (d == -1) {
+		date_zero_time(ret, 0, 1);
+	} else {
+		date_zero_time(ret, 24 * 3600, 0);
+	}
+	ret->tm.tm_isdst = -1;
+	return true;
+}
+
+bool date_set_quarter(struct date *ret, long y, long q) {
+	if (!is_valid_year(y) || q < 1 || q > 4) {
+		return false;
+	}
+	ret->tm.tm_year = y - 1900;
+	ret->tm.tm_mon = (q - 1) * 3;
+	ret->tm.tm_mday = 1;
+	date_zero_time(ret, 0, 3);
+	ret->tm.tm_isdst = -1;
+	return true;
+}
+
+
+#define TOKEN(str, token, num) { str, sizeof(str) - 1, token, num }
+#define ABBR(len) { NULL, len, 0, 0 }
+
+static struct token {
+	const char *str;
+	size_t len;
+	int token;
+	long num;
+} tokens_array[] = {
+	TOKEN("ago", T_AGO, 0),
+
+	TOKEN("am", T_AMPM, 0),
+	TOKEN("pm", T_AMPM, 1),
+
+	TOKEN("seconds",    T_SECONDS,              1), ABBR(3), ABBR(6),
+	TOKEN("minutes",    T_SECONDS,             60), ABBR(3), ABBR(6),
+	TOKEN("hours",      T_SECONDS,           3600), ABBR(4), ABBR(1),
+	TOKEN("days",       T_SECONDS,      24 * 3600), ABBR(3), ABBR(1),
+	TOKEN("weeks",      T_SECONDS,  7 * 24 * 3600), ABBR(4),
+	TOKEN("fortnights", T_SECONDS, 14 * 24 * 3600), ABBR(9),
+
+	TOKEN("months",     T_MONTHS,  1), ABBR(5),
+	TOKEN("quarters",   T_MONTHS,  3), ABBR(7),
+	TOKEN("years",      T_MONTHS, 12), ABBR(4),
+
+	TOKEN("yesterday",  T_YESTERDAY, 0),
+
+	TOKEN("midnight",   T_HOUR,  0),
+	TOKEN("noon",       T_HOUR, 12),
+
+	TOKEN("january",    T_MONTH,  1), ABBR(3),
+	TOKEN("february",   T_MONTH,  2), ABBR(3),
+	TOKEN("march",      T_MONTH,  3), ABBR(3),
+	TOKEN("april",      T_MONTH,  4), ABBR(3),
+	TOKEN("may",        T_MONTH,  5),
+	TOKEN("june",       T_MONTH,  6), ABBR(3),
+	TOKEN("july",       T_MONTH,  7), ABBR(3),
+	TOKEN("august",     T_MONTH,  8), ABBR(3),
+	TOKEN("september",  T_MONTH,  9), ABBR(4), ABBR(3),
+	TOKEN("october",    T_MONTH, 10), ABBR(3),
+	TOKEN("november",   T_MONTH, 11), ABBR(3),
+	TOKEN("december",   T_MONTH, 12), ABBR(3),
+
+	{ NULL, 0, 0, 0 },
+};
+
+#undef TOKEN
+#undef ABBR
+
+
+static struct token locale_tokens_array[2*12 + 1];
+static bool locale_tokens_populated = false;
+
+static void populate_locale_tokens(void) {
+	static const char *mon_formats[] = { "%b", "%B" };
+	static char locale_buffer[1024];
+
+	char *buf = locale_buffer, *end = buf + sizeof(locale_buffer);
+	struct token *out = locale_tokens_array;
+	struct tm tm;
+	unsigned i;
+
+	tm.tm_sec = 0;
+	tm.tm_min = 0;
+	tm.tm_hour = 0;
+	tm.tm_mday = 10;
+	tm.tm_year = 100;
+	tm.tm_isdst = 0;
+
+	for (tm.tm_mon = 0; tm.tm_mon < 12; ++tm.tm_mon) {
+		for (i = 0; i < 2; ++i) {
+			out->len = strftime(buf, end - buf,
+					    mon_formats[i], &tm);
+			if (!out->len) {
+				continue;
+			}
+			out->str = buf;
+			buf += out->len;
+			out->token = T_MONTH;
+			out->num = tm.tm_mon + 1;
+			++out;
+		}
+	}
+
+	out->len = 0;
+}
+
+static const struct token *find_token(const struct token *tk,
+				      const char *str, size_t len) {
+	const struct token *ret;
+
+	for (; tk->len; ++tk) {
+		if (tk->str) {
+			ret = tk;
+		}
+		if (tk->len == len && !strncasecmp(str, ret->str, len)) {
+			return ret;
+		}
+	}
+
+	return NULL;
+}
+
+
+/* Treat '_' and '.' as white space so that people don't have to quote
+ * the argument when specifying it on command line. */
+#define SKIP_WHITE_SPACE(ch) do {				\
+	while (isspace(*ch) || *ch == '_' || *ch == '.') {	\
+		++ch;						\
+	}							\
+} while (0)
+
+
+int yylex(YYSTYPE *valp, struct yylocation *loc, const char **inputp) {
+	const char *ch = *inputp, *str;
+	const struct token *tk;
+
+	SKIP_WHITE_SPACE(ch);
+
+	/* End of data */
+	if (*ch == 0) {
+		return EOF;
+	}
+
+	loc->start = ch;
+
+	/* Parse number */
+	if (isdigit(*ch)) {
+		errno = 0;
+		*valp = strtol(ch, (char**)&ch, 10);
+
+		loc->end = ch;
+		*inputp = ch;
+
+		if (errno) {
+			parse_date_print_error(loc, "number out of range");
+			return 256;
+		}
+		return ch - loc->start == 4 ? T_NUMBER_4 : T_NUMBER;
+	}
+
+	if (!isalpha(*ch)) {
+		*inputp = ch + 1;
+		loc->end = ch + 1;
+		return *ch;
+	}
+
+	/* So it's a string token. */
+	str = ch;
+	while (isalpha(*ch)) {
+		++ch;
+	}
+	loc->end = ch;
+	*inputp = ch;
+
+	tk = find_token(tokens_array, str, ch - str);
+	if (tk) {
+		*valp = tk->num;
+		return tk->token;
+	}
+
+	/* Let's try with locale strings. */
+	if (!locale_tokens_populated) {
+		populate_locale_tokens();
+		locale_tokens_populated = true;
+	}
+	tk = find_token(locale_tokens_array, str, ch - str);
+	if (tk) {
+		*valp = tk->num;
+		return tk->token;
+	}
+
+	/* If it's just one letter, return it converted to lower case */
+	if (ch - str == 1) {
+		return tolower(*str);
+	}
+
+	parse_date_print_error(loc, "unrecognised token");
+	return 256;
+}
+
+
+/**************************** Parsing interface *****************************/
+
+static bool parse_date(struct date *ret, const char *from, char *to) {
+	char tmp;
+	int res;
+	if (to) {
+		tmp = *to;
+		*to = '\0';
+	}
+	res = yyparse(ret, &from);
+	if (to) {
+		*to = tmp;
+	}
+	return res == 0;
+}
+
+bool parse_range(char *arg, time_t *from, time_t *to) {
+	bool left = false, right = false;
+	struct date a, b;
+	char *dd, *pp;
+
+	SKIP_WHITE_SPACE(arg);
+	if (!*arg) {
+		fprintf(stderr, "empty range argument\n");
+		return false;
+	}
+
+	dd = strstr(arg, "..");
+	pp = strstr(arg, "++");
+	if ((dd && pp) ||
+	    (dd && strstr(dd + 2, "..")) ||
+	    (pp && strstr(pp + 2, "++"))) {
+		fprintf(stderr,
+			"%s: at most one of '..' or '++' can be used\n", arg);
+		return false;
+	}
+
+	if (dd || pp) {
+		char *ch = (dd ? dd : pp) + 2;	/* first char past the operator */
+		left = ch - 2 != arg;
+		SKIP_WHITE_SPACE(ch);
+		right = *ch;
+	}
+	if (pp && (!right || !left)) {
+		fprintf(stderr,
+			"%s: '++' requires expression on both sides\n", arg);
+		return false;
+	}
+
+	if (left || !right) {  /* date:<date>.. or date:<date> */
+		date_set_to_now(&a);
+		if (!parse_date(&a, arg, dd ? dd : pp)) {
+			return false;
+		}
+	}
+	if (right) {  /* date:<date>..<date> or date:..<date> */
+		if (pp) {
+			b = a;
+		} else {
+			date_set_to_now(&b);
+		}
+		if (!parse_date(&b, (dd ? dd : pp) + 2, NULL)) {
+			return false;
+		}
+	} else if (!left) {  /* date:date */
+		left = right = true;  /* convert to date:<date>..<date> */
+		b = a;
+		if ((b.dur_sec && !date_add_seconds(&b, b.dur_sec, 1)) ||
+		    (b.dur_mon && !date_add_months(&b, b.dur_mon, 1))) {
+			right = false;  /* convert to date:<date>.. */
+		}
+	}
+
+	*from = left ? mktime(&a.tm) : 0;
+	*to = right ? mktime(&b.tm) : (time_t)((unsigned long)~(time_t)0 >> 1);
+	return true;
+}
diff --git a/date-parser.h b/date-parser.h
new file mode 100644
index 0000000..fb0f19b
--- /dev/null
+++ b/date-parser.h
@@ -0,0 +1,59 @@
+/* Date parser header file.
+ * Copyright (c) 2012 Google Inc.
+ * Written by Michal Nazarewicz <mina86 at mina86.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ . */
+
+#ifndef H_DATE_PARSER_H
+#define H_DATE_PARSER_H
+
+#include <stdbool.h>
+#include <stdio.h>
+#include <time.h>
+
+bool parse_range(char *arg, time_t *from, time_t *to);
+
+/* For parser */
+struct date;
+
+struct yylocation {
+	const char *start, *end;
+};
+
+static inline void parse_date_print_error(struct yylocation *loc,
+					  const char *message) {
+	fprintf(stderr, "%.*s: %s\n",
+		(int)(loc->end - loc->start), loc->start, message);
+}
+
+static inline int yyerror(struct yylocation *loc, struct date *ret,
+			  const char **inputp, const char *message) {
+	ret = ret; /* make compiler happy */
+	inputp = inputp;
+	parse_date_print_error(loc, message);
+	return 0;
+}
+
+int yylex(long *valp, struct yylocation *loc, const char **inputp);
+int yyparse(struct date *ret, const char **inputp);
+
+void date_set_from_stamp(struct date *ret, long stamp);
+bool date_add_seconds(struct date *ret, long num, long unit);
+bool date_set_yesterday(struct date *ret);
+bool date_add_months(struct date *ret, long num, long unit);
+bool date_set_time(struct date *ret, long h, long m, long s, int ampm);
+bool date_set_date(struct date *ret, long y, long m, long d);
+bool date_set_quarter(struct date *ret, long y, long q);
+
+#endif
diff --git a/test.c b/test.c
new file mode 100644
index 0000000..c4e2d9c
--- /dev/null
+++ b/test.c
@@ -0,0 +1,44 @@
+/* Date parser testing application.
+ * Copyright (c) 2012 Google Inc.
+ * Written by Michal Nazarewicz <mina86 at mina86.com>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see http://www.gnu.org/licenses/ . */
+
+#define _POSIX_C_SOURCE 1
+
+#include <locale.h>
+#include <stdio.h>
+#include <time.h>
+
+#include "date-parser.h"
+
+int main(void) {
+	char buf[1024], *ch;
+	time_t from, to;
+	struct tm tm;
+
+	setlocale(LC_ALL, "");
+
+	while (fgets(buf, sizeof buf, stdin)) {
+		if (parse_range(buf, &from, &to)) {
+			localtime_r(&from, &tm);
+			ch = buf + strftime(buf, sizeof buf / 2,
+					    "[%Y/%m/%d %H:%M:%S %Z, ", &tm);
+			localtime_r(&to, &tm);
+			strftime(ch, sizeof buf / 2,
+				 "%Y/%m/%d %H:%M:%S %Z)\n", &tm);
+			fputs(buf, stdout);
+		}
+	}
+}
-- 
1.7.7.3

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn at google.com>--------------ooO--(_)--Ooo--