muchsync files renames

David Mazieres dm-list-email-notmuch at scs.stanford.edu
Sat Aug 22 22:41:59 PDT 2015


Amadeusz Żołnowski <aidecoe at aidecoe.name> writes:

> Hi,
>
> I am testing muchsync-2 and it looks to me that files names across
> machines are different.  Moreover when syncing again after
> initialization it seems muchsync is working on something.  I have
> canceled this and rerun muchsync.  notmuch reported lots of files
> renames on server.  What and why it happens?

What muchsync specifically synchronizes for messages is the mapping:

    (directory, SHA-1-hash, link-count)

So if a directory contains two copies of a file on one machine, it will
end up with two copies on the other machine.  However, the file names
themselves are not the same, but rather are created in accordance with
the maildir spec.  (Note SHA-1 wouldn't be my first choice of hash
function, but notmuch already uses this for messages with long message
IDs, so I figured I'd just be consistent with existing practice.)
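
To make the mapping concrete, here is a rough Python sketch that computes
it for a maildir tree.  This is only an illustration, not muchsync's
actual code: the function name maildir_mapping is made up, and it assumes
"link-count" simply means the number of files in a directory sharing the
same SHA-1 content hash.

```python
import hashlib
import os
from collections import Counter


def maildir_mapping(root):
    """Map (relative directory, SHA-1 hex digest) to the number of copies."""
    mapping = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in filenames:
            # The file name is deliberately ignored; only contents count.
            with open(os.path.join(dirpath, name), "rb") as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            mapping[(rel, digest)] += 1
    return mapping
```

Two replicas whose files have different names but the same contents in
the same directories would produce equal mappings under this sketch.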

In terms of what muchsync is working on, you can run it with "-vvvv" on
both sides to get an idea, as in "muchsync -vvvv server -vvvv".  Better
yet, you can just run it on one side with "muchsync -vvvv".  You'll get
a lot of output, so maybe run it inside the script command to save the
output.  If you have enabled maildir.synchronize_flags, it could be that
notmuch is initially renaming all of your files, in which case muchsync
needs to re-hash them to make sure they haven't changed.

How did you cancel muchsync?  If you send it a single SIGINT or SIGTERM,
it attempts to clean up after itself.  However, upon multiple signals or
other signals, it immediately exits.  Muchsync is conservative about
updating the database, to avoid missing tags or files that have been
changed.  It always updates the notmuch database first, then its own
sqlite database with a version number.  That means if you kill muchsync,
some number of files may get picked up as changed again even though
really they were just copied from a peer.
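
The signal behavior described above (clean up on the first SIGINT or
SIGTERM, exit on a repeat) can be sketched roughly as follows.  This is a
Python illustration of the pattern only, not how muchsync implements it;
StopFlag and install are hypothetical names.

```python
import signal
import sys


class StopFlag:
    """Tracks whether a shutdown has already been requested."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        if self.requested:
            # Second signal: stop waiting for cleanup and exit.  sys.exit
            # raises SystemExit; os._exit would skip cleanup entirely.
            sys.exit(1)
        # First signal: only record the request, so the main loop can
        # finish its current step and clean up.
        self.requested = True


def install(flag):
    """Route SIGINT and SIGTERM to the flag (call from the main thread)."""
    signal.signal(signal.SIGINT, flag.handle)
    signal.signal(signal.SIGTERM, flag.handle)
```

A main loop would check flag.requested between units of work and commit
whatever it has before returning.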

To mitigate this problem, the muchsync client syncs the database every
10 seconds, so that in theory you should only get 10 seconds of extra
work from killing the client.  However, the server does not sync
periodically, on the assumption that it is more likely to read an EOF
than get killed.  Currently, though, it doesn't appear to commit any
pending transactions to the sqlite database upon EOF, which may be an
oversight.
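
As an illustration of the periodic-commit idea, here is a minimal Python
sketch using the standard sqlite3 module.  It is not muchsync's code:
the class name PeriodicCommitter, its parameters, and the injectable
clock are assumptions made for the example; only the 10-second interval
comes from the description above.

```python
import sqlite3
import time


class PeriodicCommitter:
    """Commit a sqlite connection at most once per interval."""

    def __init__(self, conn, interval=10.0, clock=time.monotonic):
        self.conn = conn
        self.interval = interval
        self.clock = clock
        self.last_commit = clock()

    def maybe_commit(self):
        """Commit if the interval has elapsed; return True if committed."""
        now = self.clock()
        if now - self.last_commit >= self.interval:
            self.conn.commit()
            self.last_commit = now
            return True
        return False
```

Calling maybe_commit() after each unit of work bounds the amount of
uncommitted state to roughly one interval, which is the property the
client relies on.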

So to summarize:

  * File names are not the same across machines, only file contents and
    directory structure.

  * Give muchsync lots of "-v" options to see what it is doing.

  * Try to avoid killing muchsync.  Doing so is safe, but likely to
    generate extra work in the form of phantom renames or tag changes
    that get synchronized even though they don't need to be.

  * Possibly the server should handle EOF more gracefully and commit any
    pending transactions, or the client should periodically send a
    commit command to the server.

If you think something is wrong, I can help you figure it out, but I
need to know what maildir.synchronize_flags is set to on each replica,
what you mean by "canceled", and roughly what was happening when you
canceled (uploading or downloading).

David

