From: Jeremy Maitin-Shepard (jbms_at_[hidden])
Date: 2007-06-22 19:53:04


"John Hayes" <john.martin.hayes_at_[hidden]> writes:

> While working on ordinary web software, there are actually a lot more
> variations on data encodings than just text and binary:

It seems fairly logical to me to have the following organization:

 - Streams of arbitrary POD types

   For instance, you might have uint8_t streams, uint16_t streams, etc.

 - A byte stream would be a uint8_t stream.

 - A text stream holding utf-16 encoded text would be a uint16_t stream,
   while a text stream holding utf-8 encoded text would be a uint8_t
   stream. A text stream holding iso-8859-1 encoded text would also be
   a uint8_t stream.
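
As a rough sketch of this organization, a stream could simply be parameterized on its element type. The names below (pod_sink, byte_sink, utf16_unit_sink) are hypothetical and only illustrate the idea; they are not an existing Boost interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical in-memory output stream of an arbitrary POD element type.
template <typename T>
class pod_sink {
    static_assert(std::is_trivial<T>::value,
                  "stream elements are expected to be POD-like");
public:
    typedef T value_type;

    // Append n elements starting at data to the stream.
    void write(const T* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        if (n != 0) buf_.insert(buf_.end(), data, data + n);
    }

    const std::vector<T>& contents() const { return buf_; }

private:
    std::vector<T> buf_;
};

// A byte stream is just a uint8_t stream. UTF-8 and ISO-8859-1 text
// would use the same element type.
typedef pod_sink<std::uint8_t> byte_sink;

// UTF-16 text would be carried as a stream of 16-bit code units.
typedef pod_sink<std::uint16_t> utf16_unit_sink;
```

Note that with this alone, nothing distinguishes a UTF-8 stream from an ISO-8859-1 stream or from a stream of raw bytes; all three are byte_sink.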

There is the issue of whether it is useful to have a special text stream
type that is tagged (either at compile-time or at run-time) with the
encoding that the data going into or out of it are supposed to be in.
How exactly this tagging should be done, and to what extent it
would be useful, remains to be explored.
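
To make the two tagging options concrete, here is one possible shape for each. Everything here (the tag structs, tagged_text_sink, encoding_id, runtime_text_sink) is a made-up illustration under my own assumptions, not a worked-out design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical compile-time encoding tags; each names its code unit type.
struct utf8_tag   { typedef std::uint8_t  unit_type; };
struct latin1_tag { typedef std::uint8_t  unit_type; };
struct utf16_tag  { typedef std::uint16_t unit_type; };

// Compile-time tagging: the encoding is part of the stream's type.
template <typename Encoding>
class tagged_text_sink {
public:
    typedef Encoding encoding;
    typedef typename Encoding::unit_type unit_type;

    void write(const unit_type* p, std::size_t n) {
        if (n != 0) buf_.insert(buf_.end(), p, p + n);
    }
    const std::vector<unit_type>& contents() const { return buf_; }

private:
    std::vector<unit_type> buf_;
};

// UTF-8 and ISO-8859-1 share a unit type but are now distinct stream types.
static_assert(!std::is_same<tagged_text_sink<utf8_tag>,
                            tagged_text_sink<latin1_tag> >::value,
              "different encodings give different types");

// Run-time tagging: one stream type that carries the encoding as a value.
enum class encoding_id { utf8, latin1 };

class runtime_text_sink {
public:
    explicit runtime_text_sink(encoding_id e) : enc_(e) {}
    encoding_id encoding() const { return enc_; }

    void write(const std::uint8_t* p, std::size_t n) {
        if (n != 0) buf_.insert(buf_.end(), p, p + n);
    }
    const std::vector<std::uint8_t>& contents() const { return buf_; }

private:
    encoding_id enc_;
    std::vector<std::uint8_t> buf_;
};
```

The compile-time form lets the compiler reject passing a stream of one encoding where another is expected; the run-time form would be needed when the encoding is only known from, say, a header read at run time.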

It seems that your various examples of filters/encodings, like BASE-64,
URL encoding, CDATA escaping, and C++ string escaping, might well fit
into the framework I described in the previous paragraphs. Many of
these filters can be viewed as encoding a byte stream as text.
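
For example, BASE-64 takes a uint8_t sequence in and produces text out. A minimal sketch follows; the function name base64_encode is my own, it works on whole buffers rather than streams for brevity, and it uses the standard alphabet with '=' padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode a byte sequence as BASE-64 text.
std::string base64_encode(const std::vector<std::uint8_t>& in) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);

    std::size_t i = 0;
    // Each full group of 3 input bytes becomes 4 output characters.
    for (; i + 3 <= in.size(); i += 3) {
        std::uint32_t v = (std::uint32_t(in[i]) << 16) |
                          (std::uint32_t(in[i + 1]) << 8) |
                           std::uint32_t(in[i + 2]);
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += tbl[(v >> 6) & 63];
        out += tbl[v & 63];
    }

    // A trailing group of 1 or 2 bytes is padded with '='.
    std::size_t rem = in.size() - i;
    if (rem == 1) {
        std::uint32_t v = std::uint32_t(in[i]) << 16;
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += "==";
    } else if (rem == 2) {
        std::uint32_t v = (std::uint32_t(in[i]) << 16) |
                          (std::uint32_t(in[i + 1]) << 8);
        out += tbl[(v >> 18) & 63];
        out += tbl[(v >> 12) & 63];
        out += tbl[(v >> 6) & 63];
        out += '=';
    }
    return out;
}
```

The output here is a std::string of ASCII characters; in the framework above it would be a text stream whose encoding happens to be ASCII-compatible.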

Let me know your thoughts, though.

-- 
Jeremy Maitin-Shepard
