From: Jeremy Maitin-Shepard (jbms_at_[hidden])
Date: 2007-06-29 13:03:29


"Scott Woods" <scott.suzuki_at_[hidden]> writes:

>> ----- Original Message -----
>> From: "Sebastian Redl" <sebastian.redl_at_[hidden]>
>> To: <boost_at_[hidden]>
>> Sent: Monday, June 18, 2007 3:51 AM
>> Subject: [boost] [rfc] I/O Library Design
>>
>> [snip]
>>
>>> The document can be found here:
>>> http://windmuehlgasse.getdesigned.at/newio/
>>>
>>> I'd especially like input on the unresolved issues, but all comments are
>>> welcome, even if you tell me that what I'm doing is completely pointless
>>> and misguided. (At least I'd know not to waste my time with refining and
>>> implementing the design. :-))
>>>

> Hi Sebastian,

> Thanks for the read of your doc. On the basis of that and the quality of the
> related postings I think your efforts have already paid off.

> I made a couple of attempts to write a decent analysis of your design but
> they quickly became too detailed and not suitable for this mailing list. I
> suspect that my point of view also needs more work.

> Some of your open issues:
> * Basic Unit
> Small but ugly issue. My feelings are that non-8-bit-byte architectures need
> to be explicitly chopped out of the scope or a pure abstract "basic unit"
> (basun? like beson only more elusive :-) needs to be defined in a similar
> manner to a codepoint name. It's a strategic decision.
> Personally I would go for the 8-bit-only.
> * Async Requests
> All I/O is more cleanly considered to be async. A sync model of access can
> always be implemented over the top.

In theory, this might be true, but in practice there is a significant
difference in interfaces required for synchronous operations and the
interfaces required for asynchronous operations: synchronous operations
can essentially just block, while asynchronous operations need not only
access to something like the io_service from asio, but also each
operation needs to be supplied with a completion callback.

Using a completion callback is inherently less efficient than merely
blocking, as is explicitly waiting for completion; consequently,
building synchronous operations on top of asynchronous operations would
add significant overhead.

There is also a bigger problem. In many cases asynchronous operations
simply aren't supported at all by the underlying device, and would have
to be emulated by threads, which would add even more overhead. I
suppose one possibility would be for the "asynchronous" operation to
just block if the underlying device doesn't support asynchronous
operations.
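
To make that last possibility concrete, here is a minimal sketch. The
names (blocking_device, read_some, async_read_some) are hypothetical and
are not taken from asio or from the proposed design; the "device" is
just an in-memory string standing in for something that only supports
blocking reads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>

// Hypothetical device that only supports blocking reads.
class blocking_device {
public:
    explicit blocking_device(std::string data)
        : data_(std::move(data)), pos_(0) {}

    // Synchronous interface: copies up to n bytes and returns the count.
    std::size_t read_some(char* buf, std::size_t n) {
        assert(pos_ <= data_.size());
        const std::size_t count = std::min(n, data_.size() - pos_);
        std::memcpy(buf, data_.data() + pos_, count);
        pos_ += count;
        return count;
    }

    // "Asynchronous" interface emulated by blocking: the read completes
    // before this function returns, and the handler is invoked inline.
    void async_read_some(char* buf, std::size_t n,
                         const std::function<void(std::size_t)>& handler) {
        handler(read_some(buf, n));
    }

private:
    std::string data_;
    std::size_t pos_;
};
```

A caller written against async_read_some still works here, but it gets
no concurrency from it, which is the trade-off of this approach.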

As far as I understand, you are suggesting that every interface be
asynchronous, except that a single synchronous layer at the very top
could be added, but nothing else would be built on top of the
synchronous layer. The issue is that it is somewhat more complicated,
or at least more verbose in C++, to program using asynchronous
interfaces, and imposing this inconvenience on users even when they
ultimately intend to use the synchronous interface seems undesirable.

Perhaps much or all of the run-time overhead could be avoided by using
templates in certain ways, while still avoiding source code
duplication. I do agree that asynchronous support would be very useful;
it just seems very hard to support in practice. It is definitely
something that should be considered thoroughly, though.

> * Putback
> Very contentious. Currently I am swinging towards "no". I have a rule for
> all of my encodings that each item has "positive termination" or, in
> language processing terms, "simple accepting states".

I actually think this is a facility that should be provided by the I/O
library, but it need not be a requirement of the basic stream interface.
Rather, it can be implemented as a filter that can be applied to any
stream.
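
A rough sketch of what I mean by a putback filter follows. Both
string_source and putback_filter are hypothetical names, and the
one-function get() interface is only an assumption standing in for
whatever the basic stream interface ends up being.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal source: get() returns false at end of input.
class string_source {
public:
    explicit string_source(std::string s) : s_(std::move(s)), pos_(0) {}

    bool get(char& c) {
        if (pos_ == s_.size()) return false;
        c = s_[pos_++];
        return true;
    }

private:
    std::string s_;
    std::size_t pos_;
};

// Filter that adds putback to any source providing get(char&).
// The underlying source needs no putback support of its own.
template <typename Source>
class putback_filter {
public:
    explicit putback_filter(Source& src) : src_(src) {}

    bool get(char& c) {
        if (!pending_.empty()) {
            // Characters that were put back are returned first, last in first out.
            c = pending_.back();
            pending_.pop_back();
            return true;
        }
        return src_.get(c);
    }

    void putback(char c) { pending_.push_back(c); }

private:
    Source& src_;
    std::vector<char> pending_;
};
```

Streams that never need putback pay nothing for it, since the filter is
only applied where a parser actually wants it.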

> * Representation/Endian
> I think this issue should be bundled with "parsing". Exactly what the ntohl
> functions do is a nice simple model for what should be done here.
> * Inexact bit counts
> Refer to "Basic Unit".
> * Buffer types and encodings
> Buffers typed to encodings? No. The only thing buffered will be blocks of
> basic units.

I think he may have actually meant marking streams with particular
encodings. Specifically, the question is whether there should be a
uint16_t stream marked (either at compile-time or run-time) as
containing UTF-16 text (a "text stream"), or whether there should just
be uint16_t streams with no such marking.
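
For the compile-time variant, the marking could look something like the
sketch below. Everything here (unit_stream, no_encoding, utf16_encoding,
is_text_stream) is a hypothetical name made up for illustration, not
something from the design document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical encoding tags.
struct no_encoding {};     // plain data stream
struct utf16_encoding {};  // units are UTF-16 code units

// The same storage and buffering code serves both kinds of stream;
// the encoding is carried only in the type.
template <typename Unit, typename Encoding = no_encoding>
class unit_stream {
public:
    typedef Unit unit_type;
    typedef Encoding encoding_type;

    void put(Unit u) { units_.push_back(u); }
    std::size_t size() const { return units_.size(); }

private:
    std::vector<Unit> units_;
};

typedef unit_stream<std::uint16_t> data_stream16;
typedef unit_stream<std::uint16_t, utf16_encoding> text_stream16;

// True when the stream type carries an encoding tag.
template <typename Stream>
constexpr bool is_text_stream() {
    return !std::is_same<typename Stream::encoding_type, no_encoding>::value;
}
```

With this, data_stream16 and text_stream16 are distinct types, so code
that expects text can refuse an unmarked stream at compile time.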

I agree that it is useful for the same buffering facility to be
applicable to both text and data (non-text) streams.

> * Interface (I/O streams-like needed?)
> Yes, if only because adding the backward compatibility should be easy given
> the design/impl goes well.

> Some general points;

> 1. Confusion around char, byte, text, binary, encoding and codepoint
> For me this has been a bit frustrating (it's been untidy for a long time)
> and also illuminating (Unicode). For me there are bytes (or basic units)
> and items of application data. Everything in between is encoding.

I see what you mean by this more so than when responding to your
previous post. The fact that the application itself may view certain
stream or I/O facilities as merely relating to encoding application data
does not preclude the usefulness of certain abstractions within the I/O
library, like byte streams or text streams. Furthermore, I think it is
common that an application would want to deal directly with a text
stream, because raw text is the application data.

> 2. Inclusion of "endianness" and "representation" in the binary layers.
> IIUC you are allowing applications to declare that they will only talk to
> (e.g.) Motorola-based machines. I suppose this can be justified but from an
> engineering point of view the strategy implicit in ntohl is more appealing.
> The subtle drawback of allowing the declaration of endianness and the
> fact that underlying operations (e.g. network nagling) shaft it anyhow
> makes it a "no go" for me.

I don't think this is what was intended by supporting endian conversion.
I believe it is intended to support a very wide variety of operations
relating to endianness conversion and other representation conversion.
In particular, it would certainly support converting integer types
between big endian and native endian, which is what the
ntohl/htonl/ntohs/htons functions in the BSD socket interface do. This
facility would be useful, for instance, for decoding UTF-16-BE.
Converting between little endian and native endian, or between little
endian and big endian, would also be supported, however.
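
As a rough illustration of the kinds of conversion involved, the helpers
below decode integers from a byte sequence of known endianness into the
native representation, and swap between the two byte orders. The
function names are hypothetical; only shifts are used, so the code does
not depend on the endianness of the host.

```cpp
#include <cassert>
#include <cstdint>

// Big endian bytes -> native value (what ntohl does for 32 bits).
inline std::uint32_t load_big_u32(const unsigned char* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Little endian bytes -> native value.
inline std::uint32_t load_little_u32(const unsigned char* p) {
    return (std::uint32_t(p[3]) << 24) | (std::uint32_t(p[2]) << 16) |
           (std::uint32_t(p[1]) << 8)  |  std::uint32_t(p[0]);
}

// Big endian bytes -> native 16-bit value, e.g. one UTF-16-BE code unit.
inline std::uint16_t load_big_u16(const unsigned char* p) {
    return static_cast<std::uint16_t>((unsigned(p[0]) << 8) | unsigned(p[1]));
}

// Little endian <-> big endian: reverse the byte order of a value.
inline std::uint32_t byteswap_u32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}
```

For example, the two bytes 0x00 0x41 decode through load_big_u16 to
U+0041, the UTF-16-BE encoding of 'A'.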

> 3. Lack of extensibility
> While your design doesn't actually preclude this, it also isn't explicit
> about it being possible, i.e. how would you redo your diagram for an
> application that is using different encodings over different network
> connections and to data files?

I agree that it is important to define the core concepts/interfaces
(like "device" or "data stream" or "text stream") that will be points
where the I/O library interfaces with external facilities.

-- 
Jeremy Maitin-Shepard
