[PATCH v2 2/3] emacs: Improve the regexp used to match id:'s in messages

Austin Clements amdragon at MIT.EDU
Sat Nov 10 20:46:14 PST 2012


This regexp agrees with Xapian query syntax much more closely, though
we specifically disallow various cases that would be confusing in the
context of an email body (e.g., punctuation at the end of an id: link
is not considered part of the id: link because it's probably part of
the surrounding text).

In particular, this handles id: links that are not surrounded by
quotes much better, which stash is much more likely to generate now
that we don't quote id's that don't need to be quoted.  It also
handles quoted id: links better.

We update the buttonization test to reflect the new pattern.
---
 emacs/notmuch-show.el |   20 +++++++++++++++++++-
 test/emacs-show       |   20 ++++++++++----------
 2 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index d061367..49961fb 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -996,6 +996,24 @@ message at DEPTH in the current thread."
   "Insert the forest of threads FOREST."
   (mapc (lambda (thread) (notmuch-show-insert-thread thread 0)) forest))
 
+(defvar notmuch-id-regexp
+  (concat
+   ;; Match the id: prefix only if it begins a word (to disallow, for
+   ;; example, matching cid:).
+   "\\<id:\\("
+   ;; If the term starts with a ", then parse Xapian's quoted boolean
+   ;; term syntax, which allows for anything as long as embedded
+   ;; double quotes escaped by doubling them.  We also disallow
+   ;; newlines (which Xapian allows) to prevent runaway terms.
+   "\"\\([^\"\n]\\|\"\"\\)*\""
+   ;; Otherwise, parse Xapian's unquoted syntax, which goes up to the
+   ;; next space or ).  We disallow [.,;] as the last character
+   ;; because these are probably part of the surrounding text, and not
+   ;; part of the id.  This doesn't match single character ids; meh.
+   "\\|[^\"[:space:])][^[:space:])]*[^])[:space:].,:;?!]"
+   "\\)")
+  "The regexp used to match id: links in messages.")
+
 (defun notmuch-show-buttonise-links (start end)
   "Buttonise URLs and mail addresses between START and END.
 
@@ -1004,7 +1022,7 @@ a corresponding notmuch search."
   (goto-address-fontify-region start end)
   (save-excursion
     (goto-char start)
-    (while (re-search-forward "id:\\(\"?\\)[^[:space:]\"]+\\1" end t)
+    (while (re-search-forward notmuch-id-regexp end t)
       ;; remove the overlay created by goto-address-mode
       (remove-overlays (match-beginning 0) (match-end 0) 'goto-address t)
       (make-text-button (match-beginning 0) (match-end 0)
diff --git a/test/emacs-show b/test/emacs-show
index 9712633..8944e73 100755
--- a/test/emacs-show
+++ b/test/emacs-show
@@ -136,23 +136,23 @@ To: Notmuch Test Suite <test_suite at notmuchmail.org>
 Date: Fri, 05 Jan 2001 15:43:57 +0000
 
 <<id:abc>>
-<<id:abc.def.>> <<id:abc,def,>> <<id:abc;def;>> <<id:abc:def:>>
-<<id:foo at bar.?baz?>> <<id:foo at bar!.baz!>>
-(<<id:foo at bar.baz)>> [<<id:foo at bar.baz]>>
-<<id:foo at bar.baz...>>
+<<id:abc.def>>. <<id:abc,def>>, <<id:abc;def>>; <<id:abc:def>>:
+<<id:foo at bar.?baz>>? <<id:foo at bar!.baz>>!
+(<<id:foo at bar.baz>>) [<<id:foo at bar.baz>>]
+<<id:foo at bar.baz>>...
 <<id:2+2=5>>
 <<id:=_-:/.[]@$%+>>
-<<id:abc)def>>
-<<id:ab>>"c def
+<<id:abc>>)def
+<<id:ab"c>> def
 <<id:"abc">>def
-<<id:"ab">>"c"def
-id:"ab c"def
+<<id:"ab""c">>def
+<<id:"ab c">>def
 <<id:"abc">>.def
 id:"abc
 "
-<<id:)>>
+id:)
 id:
-c<<id:xxx>>
+cid:xxx
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
-- 
1.7.10.4



More information about the notmuch mailing list