Xapian commits unexpectedly slow
Matthew Schauer
matthew.schauer at e10x.net
Thu Jan 2 14:46:04 PST 2020
On 12/28/19 5:21 PM, David Bremner wrote:
> Matthew Schauer <matthew.schauer at e10x.net> writes:
>
>> Greetings,
>>
>> I've been trying to migrate about 25K e-mails to Notmuch, and I'm seeing
>> some frustrating performance characteristics that don't seem to match
>> with the experience others report.
>
> 25,000 messages should really not be a strain, spinning rust or no.
>
>> I'm dumping messages from
>> Thunderbird in batches and then running `notmuch new` to add each batch
>> to the database. The indexing performance remains okay, at more than
>> 200 per second, but after Notmuch has reported it's finished indexing,
>> it hangs for as much as several minutes before exiting. A stack trace
>> confirms that this is Xapian committing the database, with most of the
>> time seemingly spent in `fdatasync`. The time spent grows with the size
>> of the database, not the number of e-mails being imported, which means
>> this will remain a problem during day-to-day usage.
>
> It would be interesting if you could report the results of running the
> notmuch performance test suite (under performance-test/ in the source).
Nifty! Here are the results -- I assume you know how to interpret them
better than I do:
T00-new.sh: Testing notmuch new [0.4 large]
Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B)
Initial notmuch new 1163.05 854.26 45.97 444304 2343120/13645200
notmuch new #2 2.23 0.02 0.03 9384 2144/8
notmuch new #3 0.01 0.01 0.00 9460 0/8
notmuch new #4 0.01 0.01 0.00 9428 0/8
notmuch new #5 0.01 0.00 0.00 9468 0/8
notmuch new #6 0.01 0.01 0.00 9692 0/8
new (52374 mv) 1351.01 537.75 235.45 959524 1027288/8531616
new (52374 mv back) 834.15 489.27 213.97 967040 184/4754016
new (52374 cp) 747.23 284.03 105.51 941992 0/4007120
T01-dump-restore.sh: Testing dump and restore [0.4 large]
Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B)
load nmbug tags 32.64 4.16 3.97 12744 776/38968
dump * 5.02 4.81 0.18 26256 8/27928
restore * 5.94 5.43 0.48 9728 0/0
T02-tag.sh: Testing tagging [0.4 large]
Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B)
tag * +new_tag 611.53 305.67 229.54 111304 0/1840208
tag * +existing_tag 0.05 0.01 0.00 9340 96/0
tag * -existing_tag 513.58 242.77 215.88 36252 0/1937792
tag * -missing_tag 0.02 0.00 0.01 9332 0/0
T03-reindex.sh: Testing reindexing [0.4 large]
Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B)
reindex * 1893.02 590.11 150.22 392180 7572912/4620792
reindex * 853.85 440.58 115.60 337320 3072648/2512376
reindex * 629.36 415.50 107.50 337188 1501448/2507848
T04-thread-subquery.sh: Testing thread subqueries [0.4 large]
Wall(s) Usr(s) Sys(s) Res(K) In/Out(512B)
search thread:{} ... 28.38 8.25 1.49 94304 278064/144
search thread:{} ... 11.25 5.26 0.63 94300 81520/144
search thread:{} ... 3.24 2.94 0.29 94284 0/144
> The other thing I'm curious about is the actual size of the
> database. This varies a lot, but in the past pathological performance
> has sometimes been linked to indexing things that should not be,
> bloating the database.
Here are some relevant lines from my import process, showing how long
Notmuch thinks it's taking, how long it's actually taking, and the size
of the database after each import. The sizes seem reasonable to me.
For comparison, the maildir itself is 4.5 GiB after all of the imports.
Processed 1057 total files in 1s (759 files/sec.).
notmuch new 1.48s user 0.08s system 52% cpu 2.982 total
6.0M .notmuch
Processed 1669 total files in 3s (438 files/sec.).
notmuch new 3.95s user 0.19s system 63% cpu 6.516 total
27M .notmuch
Processed 3338 total files in 9s (359 files/sec.).
notmuch new 9.73s user 0.44s system 71% cpu 14.288 total
71M .notmuch
Processed 7547 total files in 24s (304 files/sec.).
notmuch new 23.82s user 0.97s system 83% cpu 29.521 total
167M .notmuch
Processed 8224 total files in 39s (210 files/sec.).
notmuch new 35.72s user 1.96s system 52% cpu 1:12.08 total
334M .notmuch
Processed 9530 total files in 39s (239 files/sec.).
notmuch new 35.10s user 1.88s system 74% cpu 49.630 total
519M .notmuch
Processed 6029 total files in 46s (129 files/sec.).
notmuch new 26.42s user 1.88s system 24% cpu 1:54.35 total
641M .notmuch
Processed 6387 total files in 38s (167 files/sec.).
notmuch new 24.29s user 1.69s system 13% cpu 3:10.35 total
706M .notmuch
Processed 3113 total files in 10s (308 files/sec.).
notmuch new 10.65s user 0.82s system 6% cpu 2:53.25 total
725M .notmuch
Processed 410 total files in 1s (344 files/sec.).
notmuch new 1.21s user 0.32s system 1% cpu 1:57.68 total
726M .notmuch
Processed 317 total files in almost no time.
notmuch new 1.09s user 0.34s system 1% cpu 2:22.09 total
727M .notmuch
More information about the notmuch
mailing list