[notmuch] Idea for storing tags

Scott Morrison smorr at indev.ca
Tue Jan 12 21:39:14 PST 2010


On 2010-01-12, at 8:24 PM, martin f krafft wrote:

> also sprach Scott Morrison <smorr at indev.ca> [2010.01.12.1711 +1300]:
>> 1.  synchronization of tag data with emails -- if they are in
>> a subfolder then it presents the issue of maintaining this
>> subfolder when managing emails (moving, deleting, duplicating etc)
>> and any .tag-folder-unaware clients are likely to cause a breakage
>> in tagdata/message association.  One way of doing this is to have
>> a global .tag folder.
> 
> A global .tag folder indexed by e.g. message ID, as you state later,
> would probably allow for this. Or a file-per-tag design. We'd have
> to think carefully about pros and cons for each.
> 
> When thinking about this, I always have to remind myself that we are
> targeting this at a design that has indexed search. If that weren't
> the case, searches would be incredibly expensive.
> 
> Maybe a better approach would be content addressing (see below).


Content hashing -- good idea (and not something that had occurred to me before) -- better than Message-ID, as I believe there are still some MUAs/MTAs that allow messages without a Message-ID.  The one potential issue is that it then becomes critical to preserve the message source against encoding changes, since any change to the bytes changes the hash, though that shouldn't be too hard to avoid.
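
To make the idea concrete, here is a rough sketch of keying tags by content hash rather than by Message-ID.  Everything in it is an assumption for illustration: the choice of SHA-1, the function names (content_key, add_tag, tags_for), and the in-memory dict standing in for a real tag database.  It only guards against line-ending differences (CRLF vs LF); a change in transfer encoding would still produce a different key.

```python
import hashlib

def content_key(raw):
    """Return a hex SHA-1 of the raw message bytes (hypothetical helper)."""
    # Normalise CRLF and lone CR to LF so that copies which differ only in
    # line endings (e.g. IMAP vs. Maildir) hash to the same key.
    normalised = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha1(normalised).hexdigest()

# Stand-in for a global tag store: content key -> set of tags.
tags = {}

def add_tag(raw, tag):
    """Attach a tag to whatever message has this content."""
    tags.setdefault(content_key(raw), set()).add(tag)

def tags_for(raw):
    """Look up tags by content, wherever the message file happens to live."""
    return tags.get(content_key(raw), set())
```

With this scheme a true duplicate shares its tags automatically, and a copy whose content has diverged gets a key (and tag set) of its own.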

> 
>> 2. what happens if that message is archived or moved to an
>> exclusively local cache -- eg. Mail.app on OS X can easily move
>> IMAP messages to a folder resident on the computer?
> 
> Well, if the target can store tags, then ideally the MUA should know
> how to transfer them along.
> 
> Maybe the right thing to do would be to use extended attributes
> (which are stored in the inode!), even if they may not be
> universally supported yet. If our solution scales, then this might
> lead to a significant increase in xattr adoption.

The problem with anything that is not universally supported is that, for a package meant to appeal to a wide userbase, most users don't know and don't care about the particulars of this IMAP server vs. that IMAP server.  All they know is that for some reason it doesn't work with account X -- which leads to support headaches.

> 
>> 3. what happens with duplicates of emails -- I would assume that
>> the message id would be the key to match the tag data to the
>> message.  In this system a duplicate of a message could not have
>> a different set of tags from the original (not that this would
>> necessarily be desirable.)
> 
> Duplicates need folders, and tags and folders are somewhat at odds
> with each other. I mean, you can represent a folder hierarchy with
> tags (and more), and if you have tags and folders, you are
> potentially introducing a level of confusion/ambiguity that we don't
> want in the first place. Maybe the ideal solution doesn't need
> folders anymore (and IMAP-compatible (Maildir) subfolders have
> always been a hack anyway).
> 
> There are also two types of duplicates: copies and links. The former
> can diverge, the latter can't. I don't really see a reason for
> either. It's not like you need to copy a mail before you edit it,
> and I don't see a real reason for linking, assuming that the primary
> means of browsing will be tag-searches anyway.
> 
> Duplicates always make me think of content addressing, like Git's
> object cache. We could store the content hash of a message in its
> filename, and also use the hash to index into the tag database.
> I think that would be much cleaner than message IDs, and would make
> handling true duplicates (links) much easier, while copies (diverged
> ex-duplicates) would also be taken care of automatically.

I agree that conceptually duplicates should be buried, but end users do have "peculiar" organization systems.

> 
> -snip-

>> The performance issue is very real -- because it means that
>> somehow messages have to rewritten to the IMAP server -- IMAP
>> doesn't have a mechanism AFAIK for updates.
> 
> Not even UIDPLUS?
> http://wiki.dovecot.org/FeatUIDPLUS

From my reading, UIDPLUS doesn't allow a delta modification of a message on the server (i.e. writing just the changed portion back) -- you still have to write the whole thing back, and that can mean real bandwidth issues for some messages.
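
For illustration, this is roughly what "changing" a message on an IMAP server comes down to.  It is only a sketch: replace_message is a made-up name, and conn is assumed to behave like Python's imaplib.IMAP4 with the mailbox already selected.  The calls used (append, uid with STORE, expunge) are the standard imaplib ones.

```python
def replace_message(conn, mailbox, old_uid, new_raw):
    """Replace a message by uploading a new copy and deleting the old one.

    Returns the number of bytes uploaded, i.e. the full message size.
    """
    # Upload the complete new message; there is no way to send only the
    # part that changed (e.g. one added header).
    typ, data = conn.append(mailbox, None, None, new_raw)
    if typ != "OK":
        raise RuntimeError("APPEND failed: %r" % (data,))

    # Flag the old copy as deleted, by UID.
    typ, data = conn.uid("STORE", old_uid, "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        raise RuntimeError("STORE failed: %r" % (data,))

    # Plain EXPUNGE removes every \Deleted message in the selected mailbox,
    # not only the one flagged above.
    conn.expunge()
    return len(new_raw)
```

The appended copy gets a new UID, so any client caching by UID has to treat it as a different message.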

> 
>> Additionally, IMAP doesn't have a mechanism for simply replacing
>> one message data with another -- a new message must be written and
>> the old message must be deleted and the message IMAP UID will
>> change, and the client will have to deal with this especially if
>> it is caching the messages.
> 
> Yes, I am experiencing this pain regularly, since I currently use
> a lot of message rewriting as part of my workflow — one of the
> reasons why I'd like to find an alternative.
> 
>> Also GMAIL IMAP is an issue-
> 
> Yeah, I bet. Is there anyone who doesn't think that that's Google's
> problem, not ours, though?
> 
Call it Google's problem if you like -- but when I have a product that doesn't work with Gmail IMAP, there are a lot of potential users who don't care about server peculiarities and would rather just have it work.



