From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2007-05-31 05:09:12


Jerry Schwarz wrote:
> For what it's worth, I always envisioned alternative top level IO
> classes using streambufs as the underlying transport. The central
> class of the iostream library is really streambuf and I've long
> regretted not calling the library "streambuf". I always encourage
> people to use streambufs directly when all they want to do is
> transport bytes.
>
> So I like the idea of this library. It is more consistent with my
> ideas of how "iostream" should be used than is using istream or
> ostream directly to read and write binary files. But I would prefer
> it if binary_istream, binary_ostream and binary_iostream had a
> constructor that took a streambuf argument and that (at least
> optionally) ownership of the streambuf would be transferred to the
> stream. That is, the stream would be responsible for freeing it.
>
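A minimal sketch of the constructor pair Jerry describes might look like the following. `binary_istream` here is a hypothetical class, not an existing Boost or standard component, and `read_bytes` is an invented member. The only point illustrated is the ownership choice: passing a raw `std::streambuf*` leaves the caller responsible for the buffer, while passing a `std::unique_ptr<std::streambuf>` hands ownership to the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical binary input stream layered directly on a std::streambuf.
class binary_istream {
public:
    // Non-owning: the caller must keep *buf alive for the stream's lifetime.
    explicit binary_istream(std::streambuf* buf) : buf_(buf) {}

    // Owning: the stream takes over the buffer and deletes it on destruction
    // (std::basic_streambuf has a virtual destructor, so this is safe).
    explicit binary_istream(std::unique_ptr<std::streambuf> buf)
        : owned_(std::move(buf)), buf_(owned_.get()) {}

    // Reads up to n raw bytes; returns how many were actually read.
    std::size_t read_bytes(unsigned char* dst, std::size_t n) {
        std::streamsize got = buf_->sgetn(reinterpret_cast<char*>(dst),
                                          static_cast<std::streamsize>(n));
        return static_cast<std::size_t>(got);
    }

    std::streambuf* rdbuf() const { return buf_; }

private:
    std::unique_ptr<std::streambuf> owned_;  // empty when not owning
    std::streambuf* buf_;                    // always points at the transport
};
```

Note that `owned_` is declared before `buf_`, so in the owning constructor it is initialized first and `owned_.get()` is already valid.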
The C++ streambuf class is still character-oriented, not byte-oriented.
In particular, it needs a char_traits parameter (and the associated
state_type typedef and length operation, both of which are meaningless
for bytes), the importance of which I question anyway. What's worse,
it's the stream buffer that's responsible for converting between the
"internal character representation" and the "external character
representation". This means that, unless it's in binary mode (and
perhaps even then, partially), the stream buffer does character set
conversion (if it's a wide stream, or on some exotic system that
enforces a specific encoding for text files) as well as newline
conversion. Another sign of the buffers' character orientation is that
the standard library has a stringstream, but no vectorstream.
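The char_traits baggage can be checked directly. The two `static_assert`s below state facts guaranteed by the standard library: `std::streambuf` is `basic_streambuf<char, char_traits<char>>`, and its traits carry `std::mbstate_t` as `state_type`. `first_byte` is a hypothetical helper showing that even reading one raw byte goes through `traits_type` (`int_type`, `eof()`, `to_char_type`).

```cpp
#include <cassert>
#include <cwchar>        // std::mbstate_t
#include <sstream>
#include <streambuf>
#include <string>
#include <type_traits>

// std::streambuf is a character buffer, parameterized on char_traits.
static_assert(std::is_same<std::streambuf,
                           std::basic_streambuf<char, std::char_traits<char>>>::value,
              "streambuf is basic_streambuf<char, char_traits<char>>");

// The traits carry a multibyte conversion state, irrelevant for raw bytes.
static_assert(std::is_same<std::char_traits<char>::state_type, std::mbstate_t>::value,
              "char_traits<char>::state_type is mbstate_t");

// Hypothetical helper: peek at the next byte as 0..255, or -1 at end of data.
// Every byte arrives as traits_type::int_type and must be mapped back.
inline int first_byte(std::streambuf& sb) {
    using traits = std::streambuf::traits_type;
    traits::int_type c = sb.sgetc();
    if (traits::eq_int_type(c, traits::eof()))
        return -1;
    return static_cast<unsigned char>(traits::to_char_type(c));
}
```

`std::char_traits<char>::length` is another example: it stops at the first zero byte, so `length("ab\0cd")` is 2, which makes no sense for binary data.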

Having binary streams that can work with classical stream buffers is
good for reusing existing work, but I consider those buffers
unacceptable as the mandatory underlying transport.

I think a new I/O system should be built in layers like this:

lowest) The byte transport. It always seemed weird, and a severe
shortcoming, to me that C++ basically assumes external data to be
text. The basic underlying unit of external data should be the
byte (i.e. unsigned char in the C++ type system). To represent this
level, I think a source/sink/filter stack system like Boost.Iostreams
would be appropriate, but oriented strictly towards binary. Buffering
belongs on this level.
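A byte-transport layer along these lines could start from two small interfaces over `unsigned char`. The names `byte_source`, `byte_sink`, `memory_source` and `vector_sink` are hypothetical; they are loosely inspired by the Source and Sink concepts of Boost.Iostreams but have no character type or traits. A buffering or filtering stage would simply be another class implementing the same interfaces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical byte-level interfaces: no char_traits, no locale, no newlines.
class byte_source {
public:
    virtual ~byte_source() = default;
    // Reads up to n bytes into dst; returns the number read, 0 at end of data.
    virtual std::size_t read(unsigned char* dst, std::size_t n) = 0;
};

class byte_sink {
public:
    virtual ~byte_sink() = default;
    // Writes exactly n bytes from src.
    virtual void write(const unsigned char* src, std::size_t n) = 0;
};

// A sink that appends to an in-memory vector (a "vectorstream" in spirit).
class vector_sink : public byte_sink {
public:
    void write(const unsigned char* src, std::size_t n) override {
        data_.insert(data_.end(), src, src + n);
    }
    const std::vector<unsigned char>& data() const { return data_; }
private:
    std::vector<unsigned char> data_;
};

// A source reading from a memory block it does not own.
class memory_source : public byte_source {
public:
    memory_source(const unsigned char* p, std::size_t n) : p_(p), n_(n) {}
    std::size_t read(unsigned char* dst, std::size_t n) override {
        std::size_t k = std::min(n, n_ - pos_);
        std::copy(p_ + pos_, p_ + pos_ + k, dst);
        pos_ += k;
        return k;
    }
private:
    const unsigned char* p_;
    std::size_t n_;
    std::size_t pos_ = 0;
};
```

Copying from any source to any sink is then a plain loop of `read` followed by `write` until `read` returns 0.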

--) On top of the byte transport, there is the binary I/O layer, akin to
the proposed binary_iostream here. It handles the fun part of binary
I/O, such as assembling bytes into multibyte types (endianness) or
rearranging the internal data within bytes (also endianness, on systems
where char is large), and perhaps even converting to and from a
canonical external float representation.
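For the endianness part of the binary layer, here is a sketch assuming an external format made of 8-bit octets. Values are composed with shifts and masks rather than by copying the object representation, so the result does not depend on the host's byte order, and the `& 0xFF` keeps each `unsigned char` to one octet even where char is wider. The function names are hypothetical, and the float helper additionally assumes the host `float` is IEEE 754 binary32 (checked with a `static_assert`); a canonical-float conversion for non-IEEE hosts would need more work.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Write a 32-bit value most significant octet first.
inline void put_u32_be(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
}

// Write a 32-bit value least significant octet first.
inline void put_u32_le(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
}

inline std::uint32_t get_u32_be(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

inline std::uint32_t get_u32_le(const unsigned char* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Assumption: the host float already is IEEE 754 binary32.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes IEEE 754 binary32 floats");

// Canonical external float: the IEEE bit pattern, big-endian.
inline void put_f32_be(std::vector<unsigned char>& out, float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined bit copy
    put_u32_be(out, bits);
}
```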

--) Aside: Binary serialization of objects could build on the binary I/O
layer.
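Serialization on top of the binary layer might look like the following sketch. `binary_writer`, `binary_reader` and `point` are hypothetical names, and the writer and reader are reduced to a single big-endian `u32` primitive. The design point is that `point::save` and `point::load` only call typed primitives and never deal with byte order themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal binary-layer writer (big-endian, 8-bit octets).
class binary_writer {
public:
    void u32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            bytes_.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
    }
    const std::vector<unsigned char>& bytes() const { return bytes_; }
private:
    std::vector<unsigned char> bytes_;
};

// Matching reader over a byte vector it does not own.
class binary_reader {
public:
    explicit binary_reader(const std::vector<unsigned char>& b) : b_(b) {}
    std::uint32_t u32() {
        assert(pos_ + 4 <= b_.size());  // sketch: no real error handling
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) v = (v << 8) | b_[pos_++];
        return v;
    }
private:
    const std::vector<unsigned char>& b_;
    std::size_t pos_ = 0;
};

// The serialization layer: objects describe themselves via binary primitives.
struct point {
    std::uint32_t x, y;
    void save(binary_writer& w) const { w.u32(x); w.u32(y); }
    static point load(binary_reader& r) {
        point p;
        p.x = r.u32();  // read order matches write order
        p.y = r.u32();
        return p;
    }
};
```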

--) On top of the binary I/O comes the text conversion layer. This
should be a comprehensive character set and encoding library, capable of
converting between the internal character set(s) and external
representations. The external encoding should be chosen by compiler
default, by the locale, or by an explicitly passed name. This builds on
the binary I/O and not directly on the byte transport because the code
unit of an encoding can be a multi-byte entity, e.g. the 16-bit code
unit of UTF-16 on a system with 8-bit bytes.
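To illustrate why text decoding sits above the binary layer, here is a hedged sketch of a UTF-16LE decoder. Assembling two bytes into a 16-bit code unit (`get_u16_le`) is binary-layer work; combining surrogate pairs into code points is text-layer work. Both function names are hypothetical, and error handling is reduced to throwing `std::runtime_error`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Binary layer: two octets, least significant first, form one code unit.
inline std::uint16_t get_u16_le(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Text layer: UTF-16 code units -> Unicode code points.
inline std::vector<char32_t> decode_utf16le(const std::vector<unsigned char>& bytes) {
    if (bytes.size() % 2 != 0) throw std::runtime_error("odd byte count");
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        std::uint16_t u = get_u16_le(&bytes[i]);
        if (u >= 0xD800 && u <= 0xDBFF) {              // high surrogate
            if (i + 3 >= bytes.size()) throw std::runtime_error("truncated pair");
            std::uint16_t lo = get_u16_le(&bytes[i + 2]);
            if (lo < 0xDC00 || lo > 0xDFFF) throw std::runtime_error("bad pair");
            out.push_back(0x10000 + ((char32_t(u) - 0xD800) << 10) +
                          (char32_t(lo) - 0xDC00));
            i += 2;                                    // consumed the low half too
        } else if (u >= 0xDC00 && u <= 0xDFFF) {       // stray low surrogate
            throw std::runtime_error("unpaired low surrogate");
        } else {
            out.push_back(u);                          // BMP character
        }
    }
    return out;
}
```

For example, the bytes `3D D8 00 DE` decode to the single code point U+1F600.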

--) At this point, another filter layer could be inserted, to have
filters that work on the text level. This layer could be made
responsible for handling line ending conversion.
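A text-level line-ending filter could be sketched as follows. It operates on decoded characters (`char32_t`) rather than bytes, so it works regardless of the external encoding, and it keeps state across calls because a CR can end one chunk while its LF starts the next. `newline_filter` is a hypothetical name; this version maps both CRLF and a lone CR to LF.

```cpp
#include <cassert>
#include <string>

// Hypothetical text filter: normalizes CRLF and lone CR to LF.
class newline_filter {
public:
    // Filters one chunk; a trailing CR is held back until the next call.
    std::u32string process(const std::u32string& chunk) {
        std::u32string out;
        for (char32_t c : chunk) {
            if (pending_cr_) {
                pending_cr_ = false;
                out.push_back(U'\n');
                if (c == U'\n') continue;  // CRLF: the LF is already emitted
            }
            if (c == U'\r') pending_cr_ = true;
            else out.push_back(c);
        }
        return out;
    }

    // Call once at end of input to flush a trailing CR.
    std::u32string finish() {
        if (!pending_cr_) return {};
        pending_cr_ = false;
        return U"\n";
    }

private:
    bool pending_cr_ = false;
};
```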

highest) Finally, once we've got text, we add a formatting layer like
the current iostreams.

This gives each layer a clearly defined function, and lets the
programmer choose freely between ease of use and flexibility by picking
the layer to work at.

While we're at it, the interface should consider the possibility of
non-blocking and/or async I/O.
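One way the interface could accommodate non-blocking I/O is to make "no data available yet" a distinct result instead of overloading a short read or end-of-file. The types below (`io_status`, `read_result`, `pipe_source`) are hypothetical; an async interface could reuse the same result type by delivering it to a completion callback.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical outcome of a non-blocking read.
enum class io_status { ok, would_block, eof };

struct read_result {
    std::size_t count;  // bytes transferred (0 unless status == ok)
    io_status status;
};

// Toy non-blocking source fed by a producer; read_some never waits.
class pipe_source {
public:
    void feed(unsigned char b) { q_.push_back(b); }
    void close() { closed_ = true; }

    read_result read_some(unsigned char* dst, std::size_t n) {
        if (q_.empty())
            return {0, closed_ ? io_status::eof : io_status::would_block};
        std::size_t k = 0;
        while (k < n && !q_.empty()) {
            dst[k++] = q_.front();
            q_.pop_front();
        }
        return {k, io_status::ok};
    }

private:
    std::deque<unsigned char> q_;
    bool closed_ = false;
};
```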

Sebastian Redl

