[notmuch] interesting project!

Mon Nov 23 18:57:32 PST 2009

On Mon, 23 Nov 2009 09:08:34 +0200, Dirk-Jan C. Binnema <djcb.bulk at gmail.com> wrote:
> Well, the counterpoint to the OOM-problems is that in many programs,
> the 'malloc returns NULL'-case is often not very well tested (because it's
> rather hard to test), and that at least on Linux, it's unlikely that malloc
> ever does return NULL. Lennart Poettering wrote this up in some more
> detail[1]. Of course, the requirements for notmuch may be a bit different and
> I definitely don't want to suggest any radical change here after only finding
> out about notmuch a few days ago :)

No problem. I'm glad to discuss things. That's how I learn and find out
whether my decisions are sound or not. :-)

I agree that trying to support OOM doesn't make sense without
testing. But that's why I want to test notmuch with memory-fault
injection. We've been doing this with the cairo library with good
success for a while.
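
A minimal sketch of what memory-fault injection can look like in C. This
is not the harness cairo uses; the names xmalloc_injected, copy_pair and
run_fault_injection are hypothetical and exist only for illustration.
The idea is to fail the Nth allocation, rerun the operation with N = 1,
2, 3, ... and check that every failure path cleans up after itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical knob: fail the Nth allocation (1-based); 0 disables it. */
static unsigned long fail_at = 0;
static unsigned long alloc_count = 0;

/* Used in place of malloc by the code under test. */
static void *
xmalloc_injected (size_t size)
{
    alloc_count++;
    if (fail_at != 0 && alloc_count == fail_at)
        return NULL;    /* simulated out-of-memory */
    return malloc (size);
}

/* Example code under test: copies two strings, releasing the first copy
 * if the second allocation fails. Returns 0 on success, -1 on OOM. */
static int
copy_pair (const char *a, const char *b, char **a_out, char **b_out)
{
    char *x = xmalloc_injected (strlen (a) + 1);
    if (x == NULL)
        return -1;

    char *y = xmalloc_injected (strlen (b) + 1);
    if (y == NULL) {
        free (x);
        return -1;
    }

    strcpy (x, a);
    strcpy (y, b);
    *a_out = x;
    *b_out = y;
    return 0;
}

/* Fail allocation 1, then 2, and so on, until the operation completes
 * without reaching the injected fault. Returns the number of simulated
 * failures that were exercised. */
static unsigned long
run_fault_injection (void)
{
    unsigned long failures = 0;

    for (unsigned long n = 1; ; n++) {
        char *a = NULL, *b = NULL;

        alloc_count = 0;
        fail_at = n;
        int ret = copy_pair ("foo", "bar", &a, &b);
        fail_at = 0;

        if (ret == 0) {
            free (a);
            free (b);
            return failures;
        }

        /* On failure the outputs must be left untouched. */
        assert (a == NULL && b == NULL);
        failures++;
    }
}
```

Running a loop like this under a leak checker is what turns the "malloc
returns NULL" paths into code that actually gets exercised.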

As for "unlikely that malloc ever returns NULL", that's simply a
system-configuration change away (just turn off overcommit). And I can imagine
notmuch being used in lots of places, (netbooks, web servers, etc.), so
I do want to make it as robust as possible.

> (BTW, there is a hashtable implementation in libc, (hcreate(3) etc.). Is that
> one not sufficiently 'talloc-friendly'? It's not very user-friendly, but
> that's another matter)

Thanks for mentioning the hash table. The hash table is one of the few
things that I *am* using from glib right now in notmuch. It's got a
couple of bizarre things about it:

	1. The simpler-appearing g_hash_table_new function is useless
	   for common cases like hashing strings. It will just leak
	   memory. So g_hash_table_new_full is the only one worth using.

	2. There are two lookup functions, g_hash_table_lookup, and
	   g_hash_table_lookup_extended.

	   And a program like notmuch really does use the hash table in
	   two ways. In the simpler case, we're using the hash to simply
	   implement a set, (such as avoiding duplicates in a set of
	   tags). In the more complex case, we're associating actual
	   objects with the keys, (such as when linking messages
	   together into a tree for the thread).

	   So, it might make sense if a hash-table interface supported
	   these two modes well. What's bizarre about GHashTable though,
	   is that in the "just a set" case, we only use NULL as the
	   value when inserting. And distinguishing "previously inserted
	   with NULL" from "never inserted" is the one thing that
	   g_hash_table_lookup can't do. So I've only found that I could
	   ever use g_hash_table_lookup_extended, (and pass a pair of
	   NULLs for the return arguments I don't need).
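
The ambiguity described in point 2 can be sketched with a toy table. This
is neither GLib code nor Eric's implementation; struct table,
table_insert, table_lookup and table_contains are hypothetical names, and
the table is a fixed-size, linear-probing sketch with no resizing or
deletion. A value-returning lookup gives NULL both for a key stored with
a NULL value and for a key that was never inserted, so the "just a set"
case needs a lookup that reports presence separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CAPACITY 16     /* fixed size, power of two; no resizing here */

struct entry {
    const char *key;    /* NULL marks an empty slot */
    void *value;
};

struct table {
    struct entry slots[CAPACITY];
};

static size_t
hash_string (const char *s)
{
    size_t h = 5381;    /* djb2-style string hash */
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Linear probing: return the slot holding key, or the empty slot where
 * it would be stored. Returns NULL only if the table is full. */
static struct entry *
find_slot (struct table *t, const char *key)
{
    size_t start = hash_string (key) & (CAPACITY - 1);

    for (size_t i = 0; i < CAPACITY; i++) {
        struct entry *e = &t->slots[(start + i) & (CAPACITY - 1)];
        if (e->key == NULL || strcmp (e->key, key) == 0)
            return e;
    }
    return NULL;
}

/* The table does not copy key; the caller must keep it alive. */
static bool
table_insert (struct table *t, const char *key, void *value)
{
    assert (key != NULL);
    struct entry *e = find_slot (t, key);
    if (e == NULL)
        return false;
    e->key = key;
    e->value = value;
    return true;
}

/* Map-style lookup: returns the value, or NULL if the key is absent.
 * Ambiguous when NULL itself was stored as the value. */
static void *
table_lookup (struct table *t, const char *key)
{
    struct entry *e = find_slot (t, key);
    return (e != NULL && e->key != NULL) ? e->value : NULL;
}

/* Set-style lookup: reports only whether the key is present. */
static bool
table_contains (struct table *t, const char *key)
{
    struct entry *e = find_slot (t, key);
    return e != NULL && e->key != NULL;
}
```

After table_insert (&t, "inbox", NULL) on a zero-initialized table,
table_lookup returns NULL for both "inbox" and a never-inserted "unread",
while table_contains tells them apart.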

Fortunately, Eric Anholt spent *his* flight home coding up a nice
implementation of an open-addressed hash designed specifically to be a
tiny little implementation suitable for copying directly into a
project. He's testing it with Mesa now, and I might pull it into notmuch
later.

> I could imagine the string functions could replace the ones in talloc. There
> are many more string functions, e.g., for handling file names / paths, which
> are quite useful. Then there are wrappers for gcc'isms (G_UNLIKELY etc.) that
> would make the ones in notmuch unneeded, and a lot of compatibility things
> like G_DIR_SEPARATOR. And the datastructures (GSlice/GList/GHashtable) are
> nice. The UTF8 functionality might come in handy.

Yes. The portability stuff I think is actually interesting. I've thought
it really might make sense to have something that gave you *just* that,
(without a main loop, an object system, several memory allocators or
pieces for making your own memory allocators, etc). I haven't had a
chance to look into gnulib yet, but I'd like to.

As for a list, I almost always find it cleaner to be able to just have
my own list data structures, (to avoid casts, etc.).
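
As a rough illustration of the "no casts" point, here is a sketch of a
hand-rolled, typed list of tag strings. The names struct tag_node,
struct tag_list and the tag_list_* functions are hypothetical and are
not taken from notmuch. Because each node holds a char * directly,
callers never convert to or from void *, which a generic list would
require.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct tag_node {
    char *tag;                  /* owned copy of the tag string */
    struct tag_node *next;
};

struct tag_list {
    struct tag_node *head;
    struct tag_node **tail;     /* address of the 'next' field to fill */
};

static void
tag_list_init (struct tag_list *list)
{
    list->head = NULL;
    list->tail = &list->head;
}

/* Appends a copy of tag. Returns 0 on success, -1 on allocation failure
 * (in which case the list is left unchanged). */
static int
tag_list_append (struct tag_list *list, const char *tag)
{
    assert (list->tail != NULL);    /* tag_list_init must be called first */

    struct tag_node *node = malloc (sizeof *node);
    if (node == NULL)
        return -1;

    node->tag = malloc (strlen (tag) + 1);
    if (node->tag == NULL) {
        free (node);
        return -1;
    }
    strcpy (node->tag, tag);
    node->next = NULL;

    *list->tail = node;
    list->tail = &node->next;
    return 0;
}

static size_t
tag_list_length (const struct tag_list *list)
{
    size_t n = 0;
    for (const struct tag_node *node = list->head; node; node = node->next)
        n++;
    return n;
}

/* Frees every node and leaves the list empty and reusable. */
static void
tag_list_free (struct tag_list *list)
{
    struct tag_node *node = list->head;
    while (node != NULL) {
        struct tag_node *next = node->next;
        free (node->tag);
        free (node);
        node = next;
    }
    tag_list_init (list);
}
```

Walking the list is then just node->tag, with the compiler checking the
type at every use.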

And for a hash table, I'm interested in what Eric's doing.

I'm really not prejudiced against using code that's already been
written, (in spite of how it might appear, I don't feel the need to
re-solve every problem that's already been solved). But I have long
thought that we could have better support for a "C programmer's toolkit"
of commonly needed things than we have had before.

I definitely like the idea of having tiny, focused libraries that do one
thing and do it well, (and maybe even some things so tiny that they are
actually designed to be copied into the application---like with gnulib
and with Eric's new hash table).

> Anyway, I was just curious, people have survived without GLib before, and if
> you dislike the OOM-strategy, it's a bit of a no-no of course.

Thanks for understanding. :-)

And I enjoy the conversation,

-Carl