compacting the notmuch database through systemd
Antoine Beaupré
anarcat at orangeseeds.org
Wed Dec 4 11:51:12 PST 2019
On 2019-12-04 13:09:03, Daniel Kahn Gillmor wrote:
> Thanks for raising this, Anarcat!
>
> One more advantage that i think you haven't noted yet about regular
> database compaction:
>
> "notmuch compact" tends to get rid of a lot of lingering written data
> that is no longer referenced. While this isn't robust "secure
> deletion", it's a lot better than not compacting. see
> https://trac.xapian.org/ticket/742 for more discussion.
Cool.
> Some questions below…
>
> On Sun 2019-12-01 15:52:19 -0500, Antoine Beaupré wrote:
>
>> Thanks to Bremner, I just realized that notmuch-compact(1) is a thing,
>> and that thing allows me to compress my notmuch databases by about 50%.
>
> do you know why you get the large size/speed gain?
Not sure, but if I'd venture a guess: I never ran notmuch-compact(1) as
far as I can remember.
> do you regularly delete files from your message archive?
Yes, all the time. I have had `d` mapped to `+deleted` basically
forever, and have a pre-new hook that actually deletes those messages
from disk.
Yes, I am a heretic. ;)
>> So I whipped together two systemd units (attached) that will run that
>> command every month on my notmuch database. Just drop them in
>> `~/.config/systemd/user/` and run:
>>
>> systemctl --user daemon-reload
>> systemctl --user enable notmuch-compact.timer
>> systemctl --user start notmuch-compact.timer
>
> ("systemctl --user enable --now notmuch-compact.timer" will suffice for
> the final two commands on any reasonably modern version of systemd)
Whoa. TIL.
> How long does it take for the notmuch-compact.service to complete?
I don't remember... it took less than a minute at the first run, I
think.
> What happens if this is happening when, say, you put your machine to
> sleep, or you power it down?
No idea. I think it's an atomic process as notmuch-compact(1) says:
    The compacted database is built in a temporary directory and is
    later moved into the place of the origin database. The original
    uncompacted database is discarded, unless the
    --backup=<directory> option is used.
> While notmuch-compact.service is running, does "notmuch new" or "notmuch
> insert" work? If not, how do they fail (e.g. blocking indefinitely,
> returning a comprehensible error message)?
No idea. Manpage says:
    Note that the database write lock will be held during the
    compaction process (which may be quite long) to protect data
    integrity.
> Can you read your mail while notmuch-compact.service is running?
I don't see why not, but I haven't tried. Considering I run it once a
month, it would seem like a small tradeoff if that would cause problems
anyways.
>> Maybe those could be shipped with the Debian package somehow? Not sure
>> how that works, but I think that's how gpg-agent gets started now, if
>> you want any inspiration...
>
> gpg-agent is socket-activated, which is different from the
> timer-activation you are proposing here.
I thought about socket activation, but I don't think it would work in
this case.
> We could easily ship these systemd user unit files in the notmuch
> package now that #764678 is resolved. Do you think that the timer
> should be enabled by default?
Sure, I don't see why not, unless we have concerns about
notmuch-compact(1) being unsafe or counter-productive.
> What should happen if the user hasn't set up notmuch? Maybe we need a
> ConditionPathExists= or something like that on either the .timer or the
> .service?
Maybe:
    ConditionPathExists=%h/.notmuch-config
?
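
A minimal, untested sketch of where that condition would sit in the service unit, assuming the configuration lives in the default `~/.notmuch-config` location (a custom NOTMUCH_CONFIG would need a different path). Note that systemd does not expand shell variables such as $HOME in Condition directives; the `%h` specifier stands for the user's home directory instead:

```ini
# notmuch-compact.service (fragment)
[Unit]
Description=Compact the notmuch database
# %h is the home directory of the user running the service manager.
# If the file is missing, the unit is skipped rather than marked failed.
ConditionPathExists=%h/.notmuch-config
```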
> Do we expect this to run even when the user isn't logged in at all (a
> background compaction?)
Maybe not? No idea.
> it always gets more complex when you think about trying to do it at
> scale :)
Yes.
>> It would be great if notmuch-new ran this on its own, when it
>> thought that this was "important", somehow like git-gc sometimes runs on
>> its own.
>
> I'm not convinced i like this idea without more profiling and an
> understanding of what it might cause. I have grown to *really* dislike
> the highly variable latency and warnings caused by GnuPG's
> "auto-check-trustdb", for example (especially as the keyring grows
> larger).
Again, tradeoffs: I prefer to have my trustdb actually checked once in a
while (right?) and not pay that latency cost at some random gpg
invocation (which seems to happen all the time). So I disable the
built-in, inline checks and queue them in a timer instead.
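
A rough sketch of what such a setup could look like, assuming `no-auto-check-trustdb` is set in gpg.conf to turn off the inline checks. The unit names, the weekly schedule, and the binary path are all hypothetical choices for illustration:

```ini
# --- gpg-check-trustdb.service (hypothetical name) ---
[Unit]
Description=Check the GnuPG trust database

[Service]
Type=oneshot
# The binary path is an assumption; adjust for your system.
ExecStart=/usr/bin/gpg --batch --check-trustdb

# --- gpg-check-trustdb.timer (hypothetical name, separate file) ---
[Unit]
Description=Check the GnuPG trust database every week

[Timer]
# The weekly schedule is an arbitrary choice for this sketch.
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
```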
>> [ notmuch-compact.timer: text/plain ]
>> [Unit]
>> Description=compact the notmuch database
>
> systemd timer unit descriptions typically include some mention of the
> frequency. See for example:
>
> /lib/systemd/system/systemd-tmpfiles-clean.timer
> "Daily Cleanup of Temporary Directories"
>
> /lib/systemd/system/certbot.timer
> "Run certbot twice daily"
>
> /lib/systemd/system/phpsessionclean.timer
> "Clean PHP session files every 30 mins"
>
> I recommend:
>
> Description=Compact the notmuch database every month
Cool.
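
For reference, a minimal sketch of a timer unit carrying that description. The `OnCalendar=monthly` and `Persistent=true` settings are assumptions made for illustration, not necessarily what the attached unit contains:

```ini
# notmuch-compact.timer
[Unit]
Description=Compact the notmuch database every month

[Timer]
# Fire once a month.
OnCalendar=monthly
# Remember the last run on disk, so a run missed while the timer was
# inactive (e.g. the machine was off) is triggered when the timer is
# next activated.
Persistent=true

[Install]
# Lets "systemctl --user enable" hook the timer into the user's timers.
WantedBy=timers.target
```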
>> [ notmuch-compact.service: text/plain ]
>> [Unit]
>> Description=compact the notmuch database
>
> The convention is to lead with an upper-case letter:
>
> Description=Compact the notmuch database
Yay!
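
And a matching sketch of the service side, again as an untested illustration. The `/usr/bin/notmuch` path and the `Type=oneshot` choice are assumptions, not a copy of the attached unit:

```ini
# notmuch-compact.service
[Unit]
Description=Compact the notmuch database

[Service]
# The command runs to completion and exits, so oneshot fits.
Type=oneshot
# The binary path is an assumption; adjust for your system.
ExecStart=/usr/bin/notmuch compact
```

No [Install] section is needed here, since the timer of the same name is what starts the service.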
> OK OK enough with the nit-picking!
Thanks for the review!
a.
--
The adversary of true freedom is an excessive desire for security.
- Jean de la Fontaine