From: John Hayes (john.martin.hayes_at_[hidden])
Date: 2007-06-22 17:45:16
In ordinary web software, there are actually a lot more variations on data
encodings than just text and binary:
A binary format may itself be encoded as bytes (of varying endianness), or in
Base64 for email attachments (RFC 2045) or Base32 for URLs or form post data.
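
To make the point concrete, here is a minimal sketch of the same 32-bit value going through two byte orders and then through Base64. The function names (to_big_endian, to_little_endian, base64_encode) are made up for illustration, and the encoder skips the 76-column line wrapping that RFC 2045 asks for in mail bodies.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Serialize a 32-bit value as four bytes, most significant byte first.
std::vector<unsigned char> to_big_endian(std::uint32_t v) {
    return { static_cast<unsigned char>(v >> 24),
             static_cast<unsigned char>(v >> 16),
             static_cast<unsigned char>(v >> 8),
             static_cast<unsigned char>(v) };
}

// Same value, least significant byte first.
std::vector<unsigned char> to_little_endian(std::uint32_t v) {
    return { static_cast<unsigned char>(v),
             static_cast<unsigned char>(v >> 8),
             static_cast<unsigned char>(v >> 16),
             static_cast<unsigned char>(v >> 24) };
}

// Base64 with the RFC 2045 alphabet and '=' padding (no line wrapping).
std::string base64_encode(const std::vector<unsigned char>& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Full groups: 3 input bytes become 4 output characters.
    for (; i + 2 < in.size(); i += 3) {
        std::uint32_t n = (in[i] << 16) | (in[i + 1] << 8) | in[i + 2];
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += tbl[n & 63];
    }
    if (i + 1 == in.size()) {            // one byte left over
        std::uint32_t n = in[i] << 16;
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += "==";
    } else if (i + 2 == in.size()) {     // two bytes left over
        std::uint32_t n = (in[i] << 16) | (in[i + 1] << 8);
        out += tbl[(n >> 18) & 63];
        out += tbl[(n >> 12) & 63];
        out += tbl[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

The same integer gives different text depending on the byte order chosen before the Base64 step, so a reader has to know both layers to get the value back.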
When encoding in a plain-text format (after encoding into a narrow character
set), there might still be escaping depending on the container. C, JS, XML
attributes, elements and CDATAs, SQL (by database) all have different
escaping rules. This fails to mention sillier issues like newline
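
As a rough illustration of how the container changes the escaping, here are two escapers for the same string. Both function names are hypothetical, and each covers only a small subset of its container's rules (no control characters, no non-ASCII handling).

```cpp
#include <string>

// Escape text for use inside a double-quoted XML attribute value.
std::string escape_xml_attribute(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Escape text for use inside a double-quoted C or JS string literal.
std::string escape_c_string(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;
        }
    }
    return out;
}
```

Note that the two functions disagree even on the one character they both care about: a double quote becomes an entity in one container and a backslash sequence in the other.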
Buffering is also an interesting problem because in some formats, buffering
events (like flush, overflow or EOF) produce output in the stream itself to
indicate an explicit end of stream, the minimum remaining distance, or
differences in distance (like how many bytes to the next chunk in a stream).
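
One familiar case of buffering events showing up in the output is HTTP/1.1 chunked transfer coding. The sketch below assumes a hypothetical ChunkedWriter class: each flush writes the chunk length in hex ahead of the data, and end of stream is an explicit zero-length chunk. Chunk extensions and trailer fields are left out.

```cpp
#include <ostream>
#include <sstream>
#include <string>

class ChunkedWriter {
public:
    explicit ChunkedWriter(std::ostream& out) : out_(out) {}

    // Buffer data until the next flush.
    void write(const std::string& data) { pending_ += data; }

    // Flush event: emit the buffered bytes as one chunk,
    // prefixed by the chunk length in hexadecimal.
    void flush() {
        if (pending_.empty()) return;
        std::ostringstream size;
        size << std::hex << pending_.size();
        out_ << size.str() << "\r\n" << pending_ << "\r\n";
        pending_.clear();
    }

    // EOF event: flush what is left, then write the zero-length
    // last chunk followed by an empty trailer.
    void close() {
        flush();
        out_ << "0\r\n\r\n";
    }

private:
    std::ostream& out_;
    std::string pending_;
};
```

Writing "Wiki", flushing, writing "pedia" and closing yields two length-prefixed chunks and a terminator, so where the flushes happen is visible to whoever reads the stream.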
None of these transformations are hard to write, but they are written over
and over because standard streaming operators (be they Java, C++, Perl or
printf) provide no straightforward way to inject the transformations. The
cost tends to be that serializing an object is written several times over,
or worse, gets tied up in a grander object persistence framework.
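
For what it is worth, C++ iostreams do allow this kind of injection, just not in a straightforward way: you have to derive from std::streambuf. A minimal sketch, with a hypothetical XmlEscapeBuf filter and a hypothetical Point type whose operator<< is written once and knows nothing about escaping:

```cpp
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Unbuffered filter: every character passes through overflow(),
// gets escaped for XML text, and is forwarded to the destination.
class XmlEscapeBuf : public std::streambuf {
public:
    explicit XmlEscapeBuf(std::streambuf* dest) : dest_(dest) {}

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char c = traits_type::to_char_type(ch);
        std::string rep(1, c);
        if (c == '&') rep = "&amp;";
        else if (c == '<') rep = "&lt;";
        else if (c == '>') rep = "&gt;";
        const std::streamsize n = static_cast<std::streamsize>(rep.size());
        return dest_->sputn(rep.data(), n) == n ? ch : traits_type::eof();
    }

private:
    std::streambuf* dest_;
};

// Illustrative type; its serializer is written only once.
struct Point { int x, y; };

std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << "<" << p.x << "," << p.y << ">";
}
```

Wrapping a destination as std::ostream xml(&buf), with buf built on the destination's rdbuf(), makes xml << Point{1, 2} arrive as "&lt;1,2&gt;" while the same operator<< on a plain stream still writes "<1,2>".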
From my limited research, the most complete description of a stream
encoding is hidden in the description of HTTP 1.1 entities - this defines a
3-layer model for streaming:
- Buffering events: How to determine how large the stream is (TE,
  Content-Length, Trailer headers)
- Transformations: Preprocessing required before the stream can be
  interpreted (Content-Encoding: gzip, deflate, could include byte encodings)
- Type: What class should further interpret the content, and for text
  entities, the character set encoding (Content-Type).
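
The three layers can be sketched as a small record filled in from message headers. Everything here is an assumption for illustration: the names Framing, StreamDescription and describe are made up, header names are matched with exact case (real HTTP header names are case-insensitive), and the sketch reads Transfer-Encoding, which is the header that carries chunked framing.

```cpp
#include <map>
#include <string>

enum class Framing { ContentLength, Chunked, UntilClose };

struct StreamDescription {
    Framing framing;             // layer 1: buffering events
    std::string transformation;  // layer 2: Content-Encoding, empty if none
    std::string type;            // layer 3: Content-Type, kept verbatim
};

StreamDescription describe(const std::map<std::string, std::string>& headers) {
    StreamDescription d{Framing::UntilClose, "", ""};

    // Layer 1: how the reader finds the end of the stream.
    auto te = headers.find("Transfer-Encoding");
    if (te != headers.end() && te->second == "chunked")
        d.framing = Framing::Chunked;
    else if (headers.count("Content-Length"))
        d.framing = Framing::ContentLength;

    // Layer 2: what must be undone before the bytes mean anything.
    auto ce = headers.find("Content-Encoding");
    if (ce != headers.end()) d.transformation = ce->second;

    // Layer 3: who interprets the result, and in which character set.
    auto ct = headers.find("Content-Type");
    if (ct != headers.end()) d.type = ct->second;

    return d;
}
```

A reader would peel the layers in that order: find the stream boundaries, undo the transformation, then hand the bytes to whatever understands the type.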
This is not a complete model, largely because it ignores the issue of
interpreting the content, but it seems like a good place to start since
it's an intro to the problems of portably streaming data.
On 6/17/07, Jeremy Maitin-Shepard <jbms_at_[hidden]> wrote:
> Sebastian Redl <sebastian.redl_at_[hidden]> writes:
> > A few weeks ago, a discussion that followed the demonstration of the
> > binary_iostream library made me think about the standard C++ I/O and
> > what I would expect from an I/O model.
> > The document can be found here:
> > http://windmuehlgasse.getdesigned.at/newio/
> - Binary transport layer issue:
> Make the "binary transport layer" the "byte transport layer" to make
> it clear that it is for bytes.
> Platforms with unusual features, like 9-bit bytes or inability to
> handle types less than 32-bits in size can possibly still implement
> the interface for a text/character transport layer, possibly on top of
> some other lower-level transport that need not be part of the boost
> library. Clearly, the text encoding and decoding would have to be
> done differently anyway.