[PATCH v5 2/9] parse-time-string: add a date/time parser to notmuch
Austin Clements
amdragon at MIT.EDU
Thu Oct 25 11:58:16 PDT 2012
Quoth myself on Oct 22 at 4:14 am:
> Overall this looks pretty good to me, and I must say, this parser is
> amazingly flexible and copes well with a remarkably hostile grammar.
>
> A lot of little comments below (sorry if any of this ground has
> already been covered in the previous four versions).
>
> I do have one broad comment. While I'm all for ad hoc parsers for ad
> hoc grammars like dates, there is one piece of the literature I think
> this parser suffers for by ignoring: tokenizing. I think it would
> simplify a lot of this code if it did a tokenizing pass before the
> parsing pass. It doesn't have to be a serious tokenizer with
> streaming and keywords and token types and junk; just something that
> first splits the input into substrings, possibly just non-overlapping
> matches of [[:digit:]]+|[[:alpha:]]+|[-+:/.]. This would simplify the
> handling of postponed numbers because, with trivial lookahead in the
> token stream, you wouldn't have to postpone them. Likewise, it would
> eliminate last_field. It would simplify keyword matching because you
> wouldn't have to worry about matching substrings (I spent a long time
> staring at that code before I figured out what it would and wouldn't
> accept). Most important, I think it would make the parser more
> predictable for users; for example, the parser currently accepts
> things like "saturtoday" because it's aggressively single-pass.
I should add that I am not at all opposed to this patch as it is
currently designed. We need a date parser. My comment about
separating tokenization is just a way that this code could probably be
simplified if someone were so inclined or if simplifying the code
would help it pass any hurdles.
More information about the notmuch
mailing list