[PATCH] Output unmodified Content-Type header value for JSON format.

Sun Jan 15 09:58:40 PST 2012

On Sun, 15 Jan 2012 11:52:40 +0000, David Edmondson <dme at dme.org> wrote:
> > Technically the IRC discussion was about not including *any* part
> > content in the JSON output, and always using show --format=raw or
> > similar to retrieve desired parts.  Currently, notmuch includes part
> > content in the JSON only for text/*, *except* when it's text/html.  I
> > assume non-text parts are omitted because binary data is hard to
> > represent in JSON and text/html is omitted because some people don't
> > need it.  However, this leads to some peculiar asymmetry in the Emacs
> > code where sometimes it pulls part content out of the JSON and
> > sometimes it retrieves it using show --format=raw.  This in turn leads
> > to asymmetry in content encoding handling, since notmuch handles
> > content encoding for parts included in the JSON (and there's no good
> > way around that since JSON is Unicode), but not for parts retrieved as
> > raw.
> 
> Including the text output in the JSON results in significantly fewer
> calls to 'notmuch' during the building of a typical `notmuch-show-mode'
> buffer. Someone with one of those older, crankier computers could easily
> test how much effect this has by changing
> `notmuch-show-get-bodypart-content' slightly.

Yes.  I was mostly reiterating the IRC discussion for Pieter.  Since
this discussion, I've stabilized on the pre-fetching notion I described
in id:"20120115003617.GH1801 at mit.edu", though I do think we should make
this clear in the code: that the rule for whether the JSON includes a
"content" key for a leaf part is internal to the CLI and that consumers
should be prepared to use it if it's there and to retrieve the content
separately if it's not.  This is exactly how the Emacs code happens to
work; it just hasn't been codified anywhere.  Looking at it this way
gives us more flexibility than the current code takes advantage of.  For
example, we could omit content from the JSON if it's over some size
threshold, since the cost of sending that to a client that doesn't need
it is high while the cost of having the client retrieve it for itself is
relatively low.
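
To make the rule concrete, here is a rough Python sketch of both sides
of it.  This is not notmuch or Emacs code.  The part layout (a mapping
with "id", "content-type" and an optional "content" key) is my
assumption about the JSON, and MAX_INLINE_BYTES, cli_includes_content,
part_content and the fetch_part callback are all hypothetical names
made up for illustration.

```python
from typing import Any, Callable, Mapping

# Hypothetical size threshold; notmuch has no such setting today.
MAX_INLINE_BYTES = 64 * 1024


def cli_includes_content(content_type: str, size: int) -> bool:
    """CLI-internal rule: inline text/* except text/html, if small enough.

    Consumers must not depend on this rule; it may change at any time.
    """
    ctype = content_type.lower()
    if not ctype.startswith("text/") or ctype == "text/html":
        return False
    return size <= MAX_INLINE_BYTES


def part_content(part: Mapping[str, Any],
                 fetch_part: Callable[[Any], str]) -> str:
    """Consumer rule: use "content" if present, otherwise fetch the part.

    fetch_part is supplied by the caller and stands in for whatever
    separate retrieval the client uses (e.g. show --format=raw).
    """
    if "content" in part:
        return part["content"]
    return fetch_part(part["id"])
```

The point of the split is that only part_content is part of the
contract; the CLI is free to change cli_includes_content without
breaking any consumer written this way.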

> > The idea discussed on IRC was to remove all part content from the JSON
> > output and to always use show to retrieve it, possibly beefing up
> > show's support for content decoding (and possibly introducing a way to
> > retrieve multiple raw parts at once to avoid re-parsing).  This would
> > get the JSON format out of the business of guessing what consumers
> > need, simplify the Emacs code, and normalize content encoding
> > handling.
> 
> Is there a real problem being solved here? Having a clean structure is
> nice, except when it's not.

The "real" problem is the asymmetry in encoding handling that started
this discussion.  Content included in the JSON is re-encoded by the CLI,
while content retrieved via raw needs to be re-encoded by the client.
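
As a rough illustration of the extra work this puts on the client, here
is a Python sketch of decoding a raw part using the charset from its
Content-Type value.  It assumes the raw bytes have already had any
transfer encoding removed and that the client has the unmodified
Content-Type value available; decode_raw_part is a hypothetical helper,
not part of notmuch.

```python
from email.message import Message


def decode_raw_part(raw: bytes, content_type: str,
                    default: str = "utf-8") -> str:
    """Decode raw part bytes to text using the Content-Type charset."""
    # Let the stdlib parse the header parameters for us.
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset(default)
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Charset name Python does not know; fall back to the default.
        return raw.decode(default, errors="replace")
```

Content that arrives inside the JSON needs none of this, which is the
asymmetry in question.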

OTOH, I don't understand the encoding story for HTML, since the encoding
can come from either a header or from the body of the HTML.  Does this
make it strictly necessary for the client to handle the encoding?