[PATCH] new: Don't scan unchanged directories with no sub-directories
Austin Clements
amdragon at MIT.EDU
Thu Oct 24 14:08:37 PDT 2013
There might be a problem with this patch. Directory entries that are
*symlinks* to other directories do not increase the containing
directory's link count, but we do count them as directories in
add_files pass 1 and traverse into them. Hence, if you had a
directory that contained no sub-directories, but did contain symlinks
to other directories, we would fail to notice changes in the symlinked
directories.
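
To make the mismatch concrete, here is a rough, self-contained sketch
(not notmuch code). count_subdirs and make_symlink_only_dir are
hypothetical helpers invented for this illustration: the first counts
directory entries either through stat(), which follows symlinks the
way a symlink-following traversal sees them, or through lstat(), which
sees only real sub-directories, i.e. what the link count reflects.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count the entries of `path` that are
 * directories.  With follow_symlinks set, stat() is used, so a
 * symlink to a directory counts; otherwise lstat() is used and only
 * real sub-directories count.  Returns -1 if `path` can't be opened. */
static int
count_subdirs (const char *path, int follow_symlinks)
{
    DIR *dir;
    struct dirent *entry;
    int count = 0;

    assert (path != NULL);
    dir = opendir (path);
    if (! dir)
        return -1;

    while ((entry = readdir (dir)) != NULL) {
        char child[PATH_MAX];
        struct stat st;
        int ret;

        if (strcmp (entry->d_name, ".") == 0 ||
            strcmp (entry->d_name, "..") == 0)
            continue;

        snprintf (child, sizeof (child), "%s/%s", path, entry->d_name);
        ret = follow_symlinks ? stat (child, &st) : lstat (child, &st);
        if (ret == 0 && S_ISDIR (st.st_mode))
            count++;
    }
    closedir (dir);
    return count;
}

/* Hypothetical helper: build a scratch tree under /tmp in which
 * <root>/mail contains nothing but a symlink to <root>/target.
 * Stores the path of <root>/mail in mail_path; returns 0 on success. */
static int
make_symlink_only_dir (char *mail_path, size_t size)
{
    char root[] = "/tmp/nlink-demo-XXXXXX";
    char target[PATH_MAX], link_path[PATH_MAX];

    if (! mkdtemp (root))
        return -1;
    snprintf (target, sizeof (target), "%s/target", root);
    snprintf (mail_path, size, "%s/mail", root);
    snprintf (link_path, sizeof (link_path), "%s/mail/link", root);
    if (mkdir (target, 0700) || mkdir (mail_path, 0700) ||
        symlink (target, link_path))
        return -1;
    return 0;
}
```

For the directory built by make_symlink_only_dir, count_subdirs with
follow_symlinks set should report one directory, while the lstat()
variant reports none. On file systems that keep traditional link
counts, st_nlink for that directory would likewise stay at 2, which
is exactly the case the early return would skip.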
We could check if the database thinks there are sub-directories and
only bail early if the directory is unchanged and *both* the file
system and the database think there are no sub-directories.
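
As a sketch of what that stricter condition might look like, pulled
out into a standalone predicate: can_skip_scan and its parameters are
names made up for this example, not part of the patch. How
db_has_subdirs would be computed is left open; presumably it could
come from something like notmuch_directory_get_child_directories, but
that wiring is an assumption and is not shown.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical predicate: skip scanning a directory only when the
 * database already knows it, its mtime is unchanged, the file system
 * link count says there are no sub-directories, *and* the database
 * agrees that there are none (which covers symlinked directories
 * recorded by an earlier scan). */
static bool
can_skip_scan (bool in_db, time_t fs_mtime, time_t db_mtime,
               nlink_t fs_nlink, bool db_has_subdirs)
{
    return in_db &&
           fs_mtime == db_mtime &&
           fs_nlink == 2 &&
           ! db_has_subdirs;
}
```

A directory holding only a symlink to another directory would have
fs_nlink == 2 but db_has_subdirs set after its first scan, so this
predicate would keep scanning it.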
Quoth myself on Oct 24 at 4:33 pm:
> This can substantially reduce the cost of notmuch new in some
> situations, such as when the file system cache is cold or when the
> Maildir is on NFS.
> ---
> notmuch-new.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/notmuch-new.c b/notmuch-new.c
> index faa33f1..364c73a 100644
> --- a/notmuch-new.c
> +++ b/notmuch-new.c
> @@ -323,6 +323,26 @@ add_files (notmuch_database_t *notmuch,
> }
> db_mtime = directory ? notmuch_directory_get_mtime (directory) : 0;
>
> + /* If the directory is unchanged from our last scan and has no
> + * sub-directories, then return without scanning it at all. In
> + * some situations, skipping the scan can substantially reduce the
> + * cost of notmuch new, especially since the huge numbers of files
> + * in Maildirs make scans expensive, but all files live in leaf
> + * directories.
> + *
> + * To check for sub-directories, we borrow a trick from find,
> + * kpathsea, and many other UNIX tools: a directory's link
> + * count is the number of sub-directories (specifically, their
> + * '..' entries) plus 2 (the link from the parent and the link for
> + * '.'). This check is safe even on weird file systems, since
> + * file systems that can't compute this will return 0 or 1. This
> + * is safe even on *really* weird file systems like HFS+ that
> + * mistakenly return the total number of directory entries, since
> + * that only inflates the count beyond 2.
> + */
> + if (directory && fs_mtime == db_mtime && st.st_nlink == 2)
> + goto DONE;
> +
> /* If the database knows about this directory, then we sort based
> * on strcmp to match the database sorting. Otherwise, we can do
> * inode-based sorting for faster filesystem operation. */