[PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch

Thu Oct 25 11:58:16 PDT 2012

Quoth myself on Oct 22 at  4:14 am:
> Overall this looks pretty good to me, and I must say, this parser is
> amazingly flexible and copes well with a remarkably hostile grammar.
> 
> A lot of little comments below (sorry if any of this ground has
> already been covered in the previous four versions).
> 
> I do have one broad comment.  While I'm all for ad hoc parsers for ad
> hoc grammars like dates, there is one piece of the literature I think
> this parser suffers for by ignoring: tokenizing.  I think it would
> simplify a lot of this code if it did a tokenizing pass before the
> parsing pass.  It doesn't have to be a serious tokenizer with
> streaming and keywords and token types and junk; just something that
> first splits the input into substrings, possibly just non-overlapping
> matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.].  This would simplify the
> handling of postponed numbers because, with trivial lookahead in the
> token stream, you wouldn't have to postpone them.  Likewise, it would
> eliminate last_field.  It would simplify keyword matching because you
> wouldn't have to worry about matching substrings (I spent a long time
> staring at that code before I figured out what it would and wouldn't
> accept).  Most important, I think it would make the parser more
> predictable for users; for example, the parser currently accepts
> things like "saturtoday" because it's aggressively single-pass.

I should add that I am not at all opposed to this patch as it is
currently designed.  We need a date parser.  My comment about
separating tokenization is just a way that this code could probably be
simplified if someone were so inclined or if simplifying the code
would help it pass any hurdles.