The overloading of show (was Re: [PATCH] Output unmodified Content-Type header value for JSON format.)

Sat Jan 14 16:36:17 PST 2012

(was in reply to id:87ehv2proa.fsf at praet.org, but I wanted to start a
new top-level thread)

Quoth Pieter Praet on Jan 14 at 10:19 am:
> On Thu, 12 Jan 2012 12:28:40 -0500, Austin Clements <amdragon at MIT.EDU> wrote:
> > Quoth Pieter Praet on Jan 12 at  6:07 pm:
> > > On Tue, 22 Nov 2011 22:40:21 -0500, Austin Clements <amdragon at MIT.EDU> wrote:
> > > > Quoth Jameson Graef Rollins on Nov 20 at 12:10 pm:
> > > > > The open question seems to be how we handle the content encoding
> > > > > parameters.  My argument is that those should either be used by notmuch
> > > > > to properly encode the content for the consumer.  If that's not
> > > > > possible, then just those parameters needed by the consumer to decode
> > > > > the content should be output.
> > > > 
> > > > If notmuch is going to include part content in the JSON output (which
> > > > perhaps it shouldn't, as per recent IRC discussions), then it must
> > > > handle content encodings because JSON must be Unicode and therefore
> > > > the content strings in the JSON must be Unicode.
> > > 
> > > Having missed the IRC discussions: what is the rationale for not
> > > including (specific types of?) part content in the JSON output ?
> > > Eg. how about inline attached text/x-patch ?
> > 
> > Technically the IRC discussion was about not including *any* part
> > content in the JSON output, and always using show --format=raw or
> > similar to retrieve desired parts.  Currently, notmuch includes part
> > content in the JSON only for text/*, *except* when it's text/html.  I
> > assume non-text parts are omitted because binary data is hard to
> > represent in JSON and text/html is omitted because some people don't
> > need it.  However, this leads to some peculiar asymmetry in the Emacs
> > code where sometimes it pulls part content out of the JSON and
> > sometimes it retrieves it using show --format=raw.  This in turn leads
> > to asymmetry in content encoding handling, since notmuch handles
> > content encoding for parts included in the JSON (and there's no good
> > way around that since JSON is Unicode), but not for parts retrieved as
> > raw.
> > 
> > The idea discussed on IRC was to remove all part content from the JSON
> > output and to always use show to retrieve it, possibly beefing up
> > show's support for content decoding (and possibly introducing a way to
> > retrieve multiple raw parts at once to avoid re-parsing).  This would
> > get the JSON format out of the business of guessing what consumers
> > need, simplify the Emacs code, and normalize content encoding
> > handling.
> 
> Full ACK.
> 
> One concern though (IIUC): Due to the prevalence of retarded MUA's, not
> outputting 'text/plain' and/or 'text/html' parts is unfortunately all
> too often equivalent to not outputting anything at all, so wouldn't we,
> in essence, be reducing `show --format=json' to an ever-so-slightly
> augmented `search --format=json' ?

I'm not sure I fully understand what you're saying, but there are
several levels of structure here:

1. Threads (query results)
2. Thread structure
3. Message structure (MIME)
4. Part content

Currently, search returns 1; show --format=json returns 2, 3, and
sometimes 4 (only for text/* parts other than text/html); and show
--format=raw returns 4.  Notably, 1 does not require opening message
files (neither does 2), which I consider an important distinction
between search and show.
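
To make the layering concrete, here is a rough sketch in Python.  It
is not notmuch code: the names Level, RETURNS, NEEDS_MESSAGE_FILES and
opens_message_files are made up for illustration, and the table simply
restates the paragraph above.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four levels of structure listed above."""
    THREADS = 1            # query results
    THREAD_STRUCTURE = 2
    MESSAGE_STRUCTURE = 3  # MIME
    PART_CONTENT = 4


# Which levels each command returns, as described in the text.
# (Hypothetical table; show --format=json includes level 4 only
# for some parts.)
RETURNS = {
    "search": {Level.THREADS},
    "show --format=json": {
        Level.THREAD_STRUCTURE,
        Level.MESSAGE_STRUCTURE,
        Level.PART_CONTENT,
    },
    "show --format=raw": {Level.PART_CONTENT},
}

# Levels 1 and 2 can be answered without opening message files;
# levels 3 and 4 cannot.
NEEDS_MESSAGE_FILES = {Level.MESSAGE_STRUCTURE, Level.PART_CONTENT}


def opens_message_files(command):
    """True if the command returns any level that needs message files."""
    return bool(RETURNS[command] & NEEDS_MESSAGE_FILES)
```

Under these assumptions, opens_message_files("search") is false while
both show variants are true, which is the distinction drawn above.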

Some of the discussion has been about putting 4 squarely in the realm
of show --format=raw.  One counterargument (which has grown on me
since this discussion) is that the part content included in
--format=json can be thought of as pre-fetching content that clients
are likely to need in order to avoid re-parsing the message in most
circumstances.  I believe this is not the *intent* of the current
code, though without a specification of the JSON format it's hard to
tell.

Other discussion (more interesting, in my mind) has been about
separating the retrieval of thread structure (2) from the retrieval
of message structure (3).  To me, splitting these feels much more
natural than what we do now, which seems to be inflexibly bound to
the specific way the Emacs show mode currently works.  The thread
structure is readily available from the database, so I think
separating these would open up some new UI opportunities,
particularly easy and fast thread outlining and navigation.  I
believe it would also simplify the code and address some irritating
asymmetries in the way notmuch show handles the --part argument.
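
As an illustration of the thread outlining idea, here is a minimal
Python sketch of a client that gets thread structure on its own.
ThreadNode and outline are hypothetical names, and the fields are an
assumption about what a structure-only reply might carry; this is not
an existing notmuch output format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThreadNode:
    """One message in a thread, with no MIME structure or part content."""
    message_id: str
    summary: str  # e.g. sender and subject, assumed available from the database
    replies: List["ThreadNode"] = field(default_factory=list)


def outline(node, depth=0):
    """Return the thread as a list of lines, indented by reply depth."""
    lines = ["  " * depth + node.summary]
    for reply in node.replies:
        lines.extend(outline(reply, depth + 1))
    return lines
```

A client could draw this outline first and fetch the message structure
(3) only for the messages the user actually opens.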