Austin's custom query parser: folder/directory searching, some numbers

Thomas Schwinge thomas at schwinge.name
Thu Oct 27 04:12:46 PDT 2011


Hi!

As I already mentioned on IRC (and which I still have to polish and
publish...), I recently merged Austin's custom query parser into my local
tree, mainly (for now) for its exact folder/directory searching
capabilities.

Austin had published this work several months ago, and Carl had in the
meantime implemented his own folder: searches.  This left a conflict
about which to use: the two have different semantics, and Carl's are
inadequate for my use case (they are not rooted, for example).  On IRC,
Carl recently suggested the most pragmatic way to approach this: if we
can't agree on either his folder: semantics or Austin's strict filename
matching, then just have both of them.  So I have now arranged for both
Carl's folder: (with its ``weak'' mail folder semantics) and Austin's
directory: (with its ``hard'' directory/filename matching semantics),
and on top of the latter I implemented rdirectory:, which extends
directory: with recursive matching.
This works really nicely.


IRC, freenode, #notmuch, 2011-09-30:

    <amdragon> tschwinge: Before you get in too deep I should point out
      that there's a (not unsurmountable) flaw in the folder handling.
      Because it expands to all of the desired dir-entry terms, it can
      chew up a huge amount of memory (~50K per matched file, IIRC).

After importing several GNU mailing lists' archives yesterday, I did
some measurements: it is in the 20s of KiB per matched file, ranging from
26 KiB for a 9000-file hierarchy to 21 KiB for a 23000-file hierarchy (I
assume the non-linearity is mostly due to notmuch's regular resident
size, etc.).

And, of course:

    $ find ~/Mail-schwinge.name-thomas/import/GNU/2011-04-03/ -type f | wc -l
    276010
    $ notmuch search --output=files -- rdirectory:import/GNU/2011-04-03 | grep -F import/GNU/2011-04-03 | wc -l
    0
    $ echo "${PIPESTATUS[@]}"
    137 1 0
    $ dmesg | grep notmuch
    [3797089.224252] notmuch invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0, oom_score_adj=0
    [3797089.224282] notmuch cpuset=/ mems_allowed=0
    [3797089.224290] Pid: 586, comm: notmuch Not tainted 3.0.0-1-686-pae #1
    [3797089.232081] [  586]  1000   586   310693   257874   0       0             0 notmuch
    [3797089.232081] Out of memory: Kill process 586 (notmuch) score 697 or sacrifice child
    [3797089.232081] Killed process 586 (notmuch) total-vm:1242772kB, anon-rss:1031492kB, file-rss:4kB

:-) (But this is no problem for me; I don't need to do such
coarse-grained matching.)
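With the per-file figures above, the OOM kill is no surprise.  A quick
back-of-the-envelope sketch, assuming (only an assumption) that the
21-26 KiB/file measured on the smaller hierarchies carries over linearly
to the 276010-file one; estimate_mib is just a throwaway helper:

```shell
# Back-of-the-envelope estimate: memory needed to expand a query over
# N files at a given per-file cost in KiB, printed in MiB (integer division).
# Assumes the per-file cost measured on smaller hierarchies scales linearly.
estimate_mib() {
    files=$1
    kib_per_file=$2
    echo $(( files * kib_per_file / 1024 ))
}

for kib in 21 26; do
    echo "$kib KiB/file: $(estimate_mib 276010 "$kib") MiB"
done
```

That comes to roughly 5.5 to 6.8 GiB, several times the ~1 GiB of
anonymous RSS (anon-rss:1031492kB) the process had reached when the OOM
killer stepped in.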

    <amdragon> tschwinge: The solution is probably to add folder terms to
      messages (but as one, unsplit term, unlike in cworth's approach)
      and expand on those so that the space is bounded by the number of
      matched folders, rather than files.  That would also make it quite
      easy to do arbitrary glob matching.

(These would now be directory terms.)  This suggestion still stands,
but I'm not working on it at the moment.
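To get a rough feeling for how much the suggested change would help on
a given hierarchy, one can compare the number of files (what the current
per-file dir-entry expansion grows with) against the number of distinct
directories holding them (what an expansion over one unsplit directory
term per message would grow with).  This is only a sketch: count_terms
is a hypothetical helper, not part of notmuch, and it relies on GNU
find's -printf and on file names not containing newlines.

```shell
# Hypothetical helper (not part of notmuch): compare the number of files
# below a directory with the number of distinct directories containing them.
# Assumes GNU find (for -printf) and file names without embedded newlines.
count_terms() {
    root=$1
    # One dir-entry term per matched file: what the current expansion grows with.
    files=$(find "$root" -type f | wc -l | tr -d ' ')
    # One unsplit directory term per folder: what the suggested scheme grows with.
    dirs=$(find "$root" -type f -printf '%h\n' | sort -u | wc -l | tr -d ' ')
    echo "files=$files dirs=$dirs"
}
```

For example, count_terms ~/Mail-schwinge.name-thomas/import/GNU/2011-04-03
would report how many terms each scheme has to expand for that rdirectory:
query.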


Regards,
 Thomas


More information about the notmuch mailing list