[notmuch] Quick thoughts on a notmuch daemon

Mike Hommey mh+notmuch at glandium.org
Fri Jan 8 21:51:20 PST 2010


On Fri, Jan 08, 2010 at 11:26:31PM +1300, martin f krafft wrote:
> also sprach Mike Hommey <mh+notmuch at glandium.org> [2010.01.08.2220 +1300]:
> > FYI, I have a good experience writing fuse filesystems, both with
> > high-level and low-level APIs. I'd advise to use the low-level API,
> > which allows for better performance.
> 
> I don't have any experience with FUSE yet, but the examples in
> /usr/share/doc/libfuse-dev/examples/ look trivial. This is where
> I would start, one function at a time. If you have a better
> suggestion, I'd love to hear it; or to clone your repo! ;)

As I said above, there are 2 sets of APIs in FUSE.

The high-level API passes the full path of the file being accessed to
every call. Except for specific cases such as read(), write() or
readdir(), you have nothing else to identify the file you are referring
to, which means you have to parse the path and find the proper file
accordingly.
In notmuch's case, that would mean doing a search for most system calls.
Try to imagine how many syscalls other than read(), write() or
readdir() mutt makes when opening a Maildir.
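
To make that cost concrete, here is a toy model in Python. It is not
real FUSE code: PathFS, resolve() and the searches counter are made-up
names, and the dictionary walk only stands in for whatever notmuch
search a real implementation would run. It just shows that a path-based
handler has to re-resolve the path on every call:

```python
class PathFS:
    """Toy model of a path-based (high-level style) filesystem."""

    def __init__(self, tree):
        self.tree = tree      # nested dicts are directories, bytes are files
        self.searches = 0     # how many times a path had to be resolved

    def resolve(self, path):
        # Stand-in for the search a real implementation would have to run.
        self.searches += 1
        node = self.tree
        for part in path.strip("/").split("/"):
            if part:
                node = node[part]
        return node

    def getattr(self, path):
        node = self.resolve(path)
        if isinstance(node, dict):
            return {"type": "dir", "size": 0}
        return {"type": "file", "size": len(node)}

    def access(self, path):
        self.resolve(path)    # raises KeyError if the path does not exist
        return True


fs = PathFS({"inbox": {"cur": {"msg1": b"Subject: hi\n\nhello\n"}}})
for _ in range(3):
    fs.getattr("/inbox/cur/msg1")
    fs.access("/inbox/cur/msg1")
```

Three rounds of two calls on the same file cost six resolutions here,
since nothing is remembered between calls.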

The low-level API, otoh, uses inode numbers extensively (again, except
for read, write and readdir). The lookup call is responsible for resolving
paths one component at a time, given a parent inode and a name. Its
results are cached by the kernel. So, for example, reading foo/bar from
your fuse mount point will look up foo in inode 1 (FUSE_ROOT_ID) and then
do another lookup for bar in the first result.
One of the problems with this API is that the inode number type is
unsigned long, which is only 32 bits on 32-bit platforms, so you can't
necessarily map real inode numbers, which can be 64 bits. And even if you
could, afaik, there is no quick way to get a file from its inode, sadly.
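
The foo/bar example can be sketched in Python as follows. Again this is
only a model, not the libfuse API: InodeFS and KernelCache are invented
names, and KernelCache is a very rough stand-in for the caching the
kernel does on lookup results. Only the value 1 for FUSE_ROOT_ID comes
from FUSE itself:

```python
FUSE_ROOT_ID = 1  # inode number of the mount point's root


class InodeFS:
    """Toy model of an inode-based (low-level style) filesystem."""

    def __init__(self):
        self.children = {FUSE_ROOT_ID: {}}  # dir inode -> {name: child inode}
        self.next_ino = FUSE_ROOT_ID + 1
        self.lookups = 0                    # lookups that reached the filesystem

    def add(self, parent, name, is_dir=False):
        ino = self.next_ino
        self.next_ino += 1
        self.children[parent][name] = ino
        if is_dir:
            self.children[ino] = {}
        return ino

    def lookup(self, parent, name):
        # Resolve one path component: (parent inode, name) -> child inode.
        self.lookups += 1
        return self.children[parent][name]


class KernelCache:
    """Hypothetical stand-in for the kernel caching lookup results."""

    def __init__(self, fs):
        self.fs = fs
        self.cache = {}

    def walk(self, path):
        ino = FUSE_ROOT_ID
        for name in path.strip("/").split("/"):
            key = (ino, name)
            if key not in self.cache:
                self.cache[key] = self.fs.lookup(ino, name)
            ino = self.cache[key]
        return ino


fs = InodeFS()
foo = fs.add(FUSE_ROOT_ID, "foo", is_dir=True)
bar = fs.add(foo, "bar")
kernel = KernelCache(fs)
first = kernel.walk("foo/bar")   # two lookups: foo in inode 1, bar in foo
second = kernel.walk("foo/bar")  # served from the cache, no new lookups
```

The second walk never reaches the filesystem, which is the benefit the
path-based API does not give you for free.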

All in all, in the high-level API case, that means we would badly need
lookup caching, and in the low-level API case, some fast way to map
virtual directories to inode numbers on one hand, and real files to
inode numbers on the other.
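
For the low-level case, one possible shape for that mapping is a
two-way table that hands out synthetic inode numbers on demand. This is
a minimal in-memory sketch in Python; InodeMap, the "search"/"file"
kinds, the query string and the path are all made up for illustration,
and a real daemon would also have to decide when entries can be
forgotten:

```python
class InodeMap:
    """Two-way map between synthetic inode numbers and what they stand for."""

    def __init__(self, first_ino=2):  # 1 is reserved for the root
        self.by_ino = {}              # inode number -> (kind, value)
        self.by_key = {}              # (kind, value) -> inode number
        self.next_ino = first_ino

    def ino_for(self, kind, value):
        # Return the existing inode number for this object, or allocate one.
        key = (kind, value)
        ino = self.by_key.get(key)
        if ino is None:
            ino = self.next_ino
            self.next_ino += 1
            self.by_key[key] = ino
            self.by_ino[ino] = key
        return ino

    def key_for(self, ino):
        # Reverse direction: what does this inode number refer to?
        return self.by_ino[ino]


imap = InodeMap()
virtual_dir = imap.ino_for("search", "tag:inbox")
real_file = imap.ino_for("file", "/home/user/Maildir/cur/msg1")
same_dir = imap.ino_for("search", "tag:inbox")
```

Because the numbers are allocated by the daemon, they stay small and
sidestep the unsigned long problem, at the price of keeping the table
around.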

Some quick thoughts about the whole thing:
- We will need to be careful about deduplication: if you copy a file
  from one directory to another, you don't want to have the copy in the
  underlying Maildir. But you won't know it is a duplicate until the
  file is completely written and closed...
- We should probably allow extra files to be stored in the virtual
  Maildir (for example, courierimap stores stuff in a Maildir)
- We may not need a client program at all: the "search directories"
  configuration could be handled via extended file attributes.
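
On the deduplication point, one way to deal with only knowing at close
time is to buffer writes and decide when the file is released. A rough
Python sketch, assuming whole-file SHA-1 comparison is good enough;
DedupStore and its method names are invented, and the dictionary only
stands in for the underlying Maildir:

```python
import hashlib


class DedupStore:
    """Buffer writes per open file and deduplicate when the file is closed."""

    def __init__(self):
        self.stored = {}      # digest -> content (stand-in for the Maildir)
        self.open_files = {}  # file handle -> bytes written so far

    def create(self, handle):
        self.open_files[handle] = bytearray()

    def write(self, handle, data):
        self.open_files[handle] += data

    def release(self, handle):
        # Only now is the content complete, so only now can it be compared.
        content = bytes(self.open_files.pop(handle))
        digest = hashlib.sha1(content).hexdigest()
        is_new = digest not in self.stored
        if is_new:
            self.stored[digest] = content
        return digest, is_new


store = DedupStore()
message = b"Subject: hi\n\nhello\n"

store.create(1)
store.write(1, message[:10])
store.write(1, message[10:])
first_digest, first_is_new = store.release(1)

store.create(2)                 # the same message copied to another directory
store.write(2, message)
second_digest, second_is_new = store.release(2)
```

The copy is recognised on release and nothing new is stored for it.
Buffering in memory is obviously a simplification for large messages.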

I also had another, not quite unrelated, idea a while ago that could be
of value here: a generic data store, very much like the git object
database (one idea would be to make the git object store a special
case of this generic data store, for possibly interesting compatibility),
which would allow for better storage of the messages: if the maildir is
exposed via fuse, why would you need a raw maildir at all? It would also
allow easier deduplication of messages that are nearly, but not quite,
identical:
- Mailing list replies you get both directly and from the mailing
  list software: their headers differ, but the files are mostly
  equivalent.
- Mail quotes are found in both the original message and its response.
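
A very rough Python sketch of the first case, assuming messages are
split at the first blank line and the two halves are stored separately.
The object id is computed the way git computes blob ids (SHA-1 over
"blob <size>\0" followed by the content); ObjectStore, store_message()
and load_message() are made-up names, and the sample messages are
invented:

```python
import hashlib


def git_blob_id(data):
    # Same id git gives a blob: SHA-1 of "blob <size>\0" plus the content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """In-memory content-addressed store; identical content is kept once."""

    def __init__(self):
        self.objects = {}

    def put(self, data):
        oid = git_blob_id(data)
        self.objects.setdefault(oid, data)
        return oid

    def get(self, oid):
        return self.objects[oid]


def store_message(store, raw):
    # Headers and body become separate objects, so two copies that differ
    # only in their headers share the body object.
    headers, sep, body = raw.partition(b"\n\n")
    return store.put(headers + sep), store.put(body)


def load_message(store, ids):
    return store.get(ids[0]) + store.get(ids[1])


store = ObjectStore()
direct = b"To: me@example.org\nSubject: Re: daemon\n\nSounds good.\n"
via_list = b"To: list@example.org\nSubject: Re: daemon\n\nSounds good.\n"
direct_ids = store_message(store, direct)
list_ids = store_message(store, via_list)
```

The two copies end up as three objects instead of four. This only helps
when the bodies are byte-identical, so a list that appends a footer
would defeat it, and it does nothing for the quoted-text case, which
would need something finer-grained.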

Mike

