compacting the notmuch database through systemd

Antoine Beaupré anarcat at orangeseeds.org
Wed Dec 4 11:51:12 PST 2019


On 2019-12-04 13:09:03, Daniel Kahn Gillmor wrote:
> Thanks for raising this, Anarcat!
>
> One more advantage that i think you haven't noted yet about regular
> database compaction:
>
> "notmuch compact" tends to get rid of a lot of lingering written data
> that is no longer referenced.  While this isn't robust "secure
> deletion", it's a lot better than not compacting.  see
> https://trac.xapian.org/ticket/742 for more discussion.

Cool.

> Some questions below…
>
> On Sun 2019-12-01 15:52:19 -0500, Antoine Beaupré wrote:
>
>> Thanks to Bremner, I just realized that notmuch-compact(1) is a thing,
>> and that thing allows me to compress my notmuch databases by about 50%.
>
> do you know why you get the large size/speed gain?

Not sure, but if I had to venture a guess: I had never run
notmuch-compact(1) before, as far as I can remember.

> do you regularly delete files from your message archive?

Yes, all the time. I have had `d` mapped to `+deleted` basically
forever, and have a pre-new hook that actually deletes those messages
from disk.

Yes, I am a heretic. ;)

>> So I whipped together two systemd units (attached) that will run that
>> command every month on my notmuch database. Just drop them in
>> `~/.config/systemd/user/` and run:
>>
>>     systemctl --user daemon-reload
>>     systemctl --user enable notmuch-compact.timer
>>     systemctl --user start notmuch-compact.timer
>
> ("systemctl --user enable --now notmuch-compact.timer" will suffice for
> the final two commands on any reasonably modern version of systemd)

Whoa. TIL.

> How long does it take for the notmuch-compact.service to complete?

I don't remember... it took less than a minute on the first run, I
think.

> What happens if this is happening when, say, you put your machine to
> sleep, or you power it down?

No idea. I think it's an atomic process as notmuch-compact(1) says:

       The compacted database is built in a temporary directory and is
       later moved into the place of the origin database. The original
       uncompacted database is discarded, unless the
       --backup=<directory> option is used.
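
To make the "build in a temporary directory, then move into place"
pattern concrete, here is a minimal shell sketch. It is an illustration
only, not notmuch's actual implementation: the directory names and file
contents are made up. Each rename is atomic on a single filesystem, but
the pair of renames is not, so there is a short window where the
database path does not exist.

```shell
#!/bin/sh
# Illustration only: mimics "build in a temporary directory, then move
# into place". All paths below are hypothetical stand-ins.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for the existing, uncompacted database directory.
db="$workdir/xapian"
mkdir "$db"
echo "old data" > "$db/records"

# 1. Build the compacted copy next to the original, on the same
#    filesystem, so the final rename does not need to copy data.
tmp=$(mktemp -d "$workdir/xapian.compact.XXXXXX")
echo "compacted data" > "$tmp/records"

# 2. Swap. If the process dies before this point, "$db" is untouched.
mv "$db" "$db.old"
mv "$tmp" "$db"

# 3. Discard the original (a --backup style option would keep it).
rm -rf "$db.old"

cat "$db/records"
```

If the machine goes to sleep or loses power during step 1, only the
temporary copy is affected in this sketch, which is the property the
manpage text suggests.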

> While notmuch-compact.service is running, does "notmuch new" or "notmuch
> insert" work?  If not, how do they fail (e.g. blocking indefinitely,
> returning a comprehensible error message)?

No idea. Manpage says:

       Note that the database write lock will be held during the
       compaction process (which may be quite long) to protect data
       integrity.

> Can you read your mail while notmuch-compact.service is running?

I don't see why not, but I haven't tried. Considering I run it once a
month, it would seem like a small tradeoff if that would cause problems
anyways.

>> Maybe those could be shipped with the Debian package somehow? Not sure
>> how that works, but I think that's how gpg-agent gets started now, if
>> you want any inspiration...
>
> gpg-agent is socket-activated, which is different from the
> timer-activation you are proposing here.

I thought about socket activation, but I don't think it would work in
this case.

> We could easily ship these systemd user unit files in the notmuch
> package now that #764678 is resolved.  Do you think that the timer
> should be enabled by default?

Sure, I don't see why not, unless we have concerns about
notmuch-compact(1) being unsafe or counter-productive.

> What should happen if the user hasn't set up notmuch?  Maybe we need a
> ConditionPathExists= or something like that on either the .timer or the
> .service?

Maybe:

    ConditionPathExists=%h/.notmuch-config

?
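
A minimal sketch of what such a service unit could look like, written
out through a small shell helper. The unit body is an assumption, since
the attached units are not reproduced in this message: `Type=oneshot`,
the `/usr/bin/notmuch` path and the helper name `write_service_unit`
are all guesses. It uses the `%h` specifier, which systemd expands to
the user's home directory; environment variables such as `$HOME` are
not expanded in `Condition*=` settings.

```shell
#!/bin/sh
# write_service_unit DIR: write a hypothetical notmuch-compact.service
# into DIR. The unit is skipped when ~/.notmuch-config does not exist.
write_service_unit() {
    mkdir -p "$1"
    # The quoted 'EOF' keeps the shell from expanding anything below.
    cat > "$1/notmuch-compact.service" <<'EOF'
[Unit]
Description=Compact the notmuch database
# %h is the user's home directory; $HOME would not be expanded here.
ConditionPathExists=%h/.notmuch-config

[Service]
Type=oneshot
# Assumed install path; adjust to wherever notmuch lives.
ExecStart=/usr/bin/notmuch compact
EOF
}
```

Usage would be something like `write_service_unit
"$HOME/.config/systemd/user"` followed by `systemctl --user
daemon-reload`. The condition only covers the default config location;
a setup that points notmuch at another config file would need a
different path.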

> Do we expect this to run even when the user isn't logged in at all (a
> background compaction?)

Maybe not? No idea.

> it always gets more complex when you think about trying to do it at
> scale :)

Yes.

>> It would be great if notmuch-new ran this on its own, when it
>> thought that this was "important", somehow like git-gc sometimes runs on
>> its own.
>
> I'm not convinced i like this idea without more profiling and an
> understanding of what it might cause.  I have grown to *really* dislike
> the highly variable latency and warnings caused by GnuPG's
> "auto-check-trustdb", for example (especially as the keyring grows
> larger).

Again, tradeoffs: I prefer to have my trustdb actually checked once in a
while (right?) and not pay that latency cost at some random gpg
invocation (which seems to happen all the time). So I disable the
built-in, inline checks and queue them in a timer instead.
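
For reference, a rough sketch of that setup: `no-auto-check-trustdb` in
`gpg.conf` turns off the inline checks, and a timer runs
`gpg --check-trustdb` instead. The unit names, the weekly schedule, the
`/usr/bin/gpg` path and the helper name `write_trustdb_units` are
assumptions made for illustration, not a copy of any real
configuration.

```shell
#!/bin/sh
# write_trustdb_units DIR: write a hypothetical service and timer into
# DIR that check the GnuPG trust database on a schedule.
# Assumes gpg.conf already contains the line: no-auto-check-trustdb
write_trustdb_units() {
    mkdir -p "$1"
    cat > "$1/gpg-check-trustdb.service" <<'EOF'
[Unit]
Description=Check the GnuPG trust database

[Service]
Type=oneshot
# Assumed install path for gpg.
ExecStart=/usr/bin/gpg --batch --check-trustdb
EOF
    cat > "$1/gpg-check-trustdb.timer" <<'EOF'
[Unit]
Description=Check the GnuPG trust database every week

[Timer]
# The schedule is a guess; pick whatever "once in a while" means.
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

The point of the design is the same as for compaction: the cost is paid
at a predictable time rather than in the middle of an unrelated command.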

>>  [ notmuch-compact.timer: text/plain ]
>>  [Unit]
>>  Description=compact the notmuch database
>
> systemd timer unit descriptions typically include some mention of the
> duration.  See for example:
>
> /lib/systemd/system/systemd-tmpfiles-clean.timer
> "Daily Cleanup of Temporary Directories"
>
> /lib/systemd/system/certbot.timer
> "Run certbot twice daily"
>
> /lib/systemd/system/phpsessionclean.timer
> "Clean PHP session files every 30 mins"
>
> I recommend:
>
>     Description=Compact the notmuch database every month

Cool.
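
With that description, the timer unit might look like the sketch below.
Only the `Description=` line comes from this thread; `OnCalendar=monthly`,
`Persistent=true` and the helper name `write_timer_unit` are
assumptions, as the attachment itself is not shown here.

```shell
#!/bin/sh
# write_timer_unit DIR: write a hypothetical notmuch-compact.timer into
# DIR. A timer named X.timer activates X.service by default.
write_timer_unit() {
    mkdir -p "$1"
    cat > "$1/notmuch-compact.timer" <<'EOF'
[Unit]
Description=Compact the notmuch database every month

[Timer]
OnCalendar=monthly
# Catch up on a run that was missed while the timer was inactive.
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After `write_timer_unit "$HOME/.config/systemd/user"`, the timer can be
turned on with `systemctl --user enable --now notmuch-compact.timer`.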

>> [ notmuch-compact.service: text/plain ]
>> [Unit]
>> Description=compact the notmuch database
>
> The convention is to lead with an upper-case letter:
>
>     Description=Compact the notmuch database

Yay!

> OK OK enough with the nit-picking!

Thanks for the review!

a.

-- 
L'adversaire d'une vraie liberté est un désir excessif de sécurité.
                        - Jean de la Fontaine

