From: Giovanni P. Deretta (lordshoo_at_[hidden])
Date: 2005-04-25 00:55:15


Don G wrote:
>
>>- User is not required to reference streams by
>>pointer, streams are stack allocated or are simply
>>members of another object. Internally they have a
>>smart pointer to an implementation handle. Consider
>>them stack-based proxies. The acceptor and the
>>connector return the handle that is assigned to the
>>stream.
>
>
> So is this choice just for user simplification? Internally, the user
> is still holding a pointer, right? What are the copy semantics of the
> objects held by the user? This is where things can be tricky any way
> you go. Either the object is copyable and confusion can come via
> aliasing, or they aren't which is probably better in this case, but
> could possibly cause some idioms to not work (like "stream s =
> my_clever_stream_creator()"). My preference was to use shared_ptr<>
> as the semantics are well understood and objects can layer easily in
> obvious ways, but the cost is "->" vs "." syntax.
>

Yes, it holds a pointer (a shared_ptr, actually). This makes it possible
for some part of the library to temporarily hold a (potentially weak)
reference to the handle without fear that it might be destroyed/closed.
I think this will come in handy with asynchronous I/O. [1]

I think that stack semantics are much more intuitive for non-polymorphic
objects (i.e. iostreams versus streambuffers).
Your example could be rewritten as 'my_clever_stream_creator(s)' without
really losing expressivity in the non-polymorphic case. This is exactly
how connectors and acceptors work in my library.
Currently the wrapper is copyable, but I will probably change this
unless I find very good reasons not to (the only one I can find
currently is two threads wanting to do parallel I/O on the same file:
the stream classes are not thread safe, so each thread might want to
have a copy).

1: Note that the internal file descriptor is closed when and only when
the owner handle is closed. There is no close() call, although shutdown()
is available. Thus there is no risk that the operating system might
reuse the same file descriptor number while there are stale FDs around.
This is useful if you need a 'FD->handle' map.
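
As a rough illustration of the proxy idea described above (not the
library's actual code), here is a minimal sketch. The names handle,
stream, assign(), observe() and the descriptor value 42 are all
assumptions made up for the example, and std::shared_ptr stands in for
boost::shared_ptr.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical implementation handle; a real one would own a file
// descriptor and release it in its destructor.
struct handle {
    explicit handle(int fd_) : fd(fd_) {}
    int fd;
};

// Stack-allocated proxy: cheap to create, holds a shared_ptr to the handle.
class stream {
public:
    void assign(std::shared_ptr<handle> h) { impl_ = std::move(h); }
    bool is_open() const { return static_cast<bool>(impl_); }
    int native() const {
        assert(impl_ && "stream has no handle assigned");
        return impl_->fd;
    }
    // Lets other code watch the handle without keeping it alive.
    std::weak_ptr<handle> observe() const { return impl_; }
private:
    std::shared_ptr<handle> impl_;
};

// Out-parameter form of 'stream s = my_clever_stream_creator()'.
void my_clever_stream_creator(stream& s) {
    s.assign(std::make_shared<handle>(42));  // 42 is a made-up descriptor
}
```

A caller would write 'stream s; my_clever_stream_creator(s);'. Code that
needs to keep the handle alive for a while can lock() the weak_ptr
returned by observe() for the duration of the operation.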

>
>>- The preferred way to do input output is to use
>>standard-like algorithms (i.e. copy) with buffered
>>stream adaptors and specialized input/output
>>iterators. I believe that an efficient library can
>>be written this way and be very C++-user-friendly.
>>Classic read/write are still available, but their
>>semantics might be surprising.
>
>
> I agree that this is the right approach for many users and protocols,
> but most network programmers (including myself<g>) need access to the
> primitive behaviors. They won't find them surprising unless the
> wrapping violates expectations coming from sockets-like programming.
>

They are available (in fact they are necessary to implement the rest of
the library ;-), but they do not try to be user-friendly: they have many
parameters, complex return values, and non-trivial preconditions and
postconditions. For example, there is no guarantee that a write always
writes the whole buffer in the absence of errors; it might do a partial
write for no reason at all (obviously, minimizing the number of calls is
a quality-of-implementation issue).
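
To make the partial-write point concrete, here is a small sketch of the
loop a caller needs around such a primitive. The chunky_sink type and
the write_some()/write_all() names are hypothetical, chosen only for
this example; the sketch also assumes every call makes some progress.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical primitive that may accept fewer bytes than offered.
struct chunky_sink {
    std::string data;
    std::size_t max_chunk;
    std::size_t calls = 0;
    explicit chunky_sink(std::size_t m) : max_chunk(m) {}
    std::size_t write_some(const char* buf, std::size_t len) {
        ++calls;
        std::size_t n = std::min(len, max_chunk);
        data.append(buf, n);
        return n;  // possibly less than len, even without an error
    }
};

// Keeps calling the primitive until the whole buffer has been written.
template <class Stream>
std::size_t write_all(Stream& s, const char* buf, std::size_t len) {
    std::size_t done = 0;
    while (done < len) {
        std::size_t n = s.write_some(buf + done, len - done);
        assert(n > 0 && "sketch assumes progress on every call");
        done += n;
    }
    return done;
}
```

Buffered adaptors and output iterators can hide this loop from the user,
which is one reason to prefer them over the raw calls.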

>
>>- All classes are concrete, no polymorphism is used
>>(i.e. no virtuals). Polymorphic behaviour must
>>currently be achieved with some external means (i.e.
>>using the external polymorphism pattern. I think
>>that the boost::IDL library would be great).
>
>
> Here is where we are at different ends of the spectrum :). I didn't
> see any SSL code, so I can only imagine how the http code would
> handle SSL vs. non-SSL stream underneath. Ideally, this should not
> require two template instantiations like http<stream> and
> http<ssl_stream>, for example.
>
> In the end, I thought templates had little to offer at this level.
> Parameterizing protocols by stream type seems (IMHO) to buy nothing
> in particular except the removal of virtual at the expense of the
> user having to specify <kind_of_stream> and _lots_ of extra code
> generation. The app should be able to layer objects as it sees fit
> and run-time polymorphism is (again, IMHO) the right solution to that
> problem.
>

I did start with a design based on virtual interfaces (part of it is
still visible: for example, the address object is far too complex for my
current needs, and the domain object is a relic of a factory-based
structure).
It took me a long time to settle on a general stream interface that was
at least partially satisfying.

When I started implementing the concrete objects, I found that many
methods looked almost the same, so I refactored the code and put the
common code in an implementation class. Then I thought that the library
user might find it useful to deal with the actual stream type, and
promoted the implementation class to a public object, with a virtual
interface adapter optionally applicable (i.e. the external polymorphism
pattern, or type erasure).

Even the virtual adapter could be generated with the use of templates.

I was happy with this design until I realized that I was just
duplicating what could be better done with a dynamic_any or with
boost::IDL, and I scrapped it. Only the concrete objects were left, and
I have yet to find the need to put the interfaces back.
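
For readers unfamiliar with the external polymorphism pattern mentioned
above, a minimal sketch follows. It is only an illustration under
assumed names (stream_interface, stream_adapter, erase_type,
memory_stream, null_stream); none of them come from the library being
discussed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Virtual interface that lives outside the concrete stream classes.
struct stream_interface {
    virtual ~stream_interface() {}
    virtual std::size_t write_some(const char* buf, std::size_t len) = 0;
};

// Adapter generated by a template: one instantiation per concrete type.
template <class Stream>
class stream_adapter : public stream_interface {
public:
    explicit stream_adapter(Stream s) : stream_(std::move(s)) {}
    std::size_t write_some(const char* buf, std::size_t len) override {
        return stream_.write_some(buf, len);  // plain, non-virtual call
    }
private:
    Stream stream_;
};

// Wraps any type that has a matching write_some() member.
template <class Stream>
std::unique_ptr<stream_interface> erase_type(Stream s) {
    return std::unique_ptr<stream_interface>(
        new stream_adapter<Stream>(std::move(s)));
}

// Two unrelated concrete streams with no virtual functions.
struct memory_stream {
    std::shared_ptr<std::string> sink;
    std::size_t write_some(const char* buf, std::size_t len) {
        assert(sink && "memory_stream needs a sink");
        sink->append(buf, len);
        return len;
    }
};

struct null_stream {
    std::size_t write_some(const char*, std::size_t len) { return len; }
};
```

The concrete classes stay free of virtuals; polymorphism is added only
where the user asks for it, by calling erase_type().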

Dynamic polymorphism can certainly increase flexibility without template
bloat, but would there really be that much code generated? An
http<tcp_stream> and an http<ssl_stream> can certainly share 99% of the
code; what you really need is a parametrized function that fetches the
data from the stream and puts it in a buffer. Keep most of the code in a
base class or, better, put the parametrized function (or functor) in a
boost::function and store it in the non-templated http protocol object.
Easy.
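
A sketch of that last suggestion might look as follows. It uses
std::function as a stand-in for boost::function, and the tcp_stream,
ssl_stream, http_source and make_http_source names (as well as the toy
XOR "decryption") are assumptions invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Two made-up concrete streams with the same read_some() shape.
struct tcp_stream {
    std::string pending;
    std::size_t read_some(char* buf, std::size_t len) {
        std::size_t n = std::min(len, pending.size());
        std::copy(pending.begin(), pending.begin() + n, buf);
        pending.erase(0, n);
        return n;
    }
};

struct ssl_stream {
    std::string cipher;  // toy "encryption": every byte XORed with 0x2A
    std::size_t read_some(char* buf, std::size_t len) {
        std::size_t n = std::min(len, cipher.size());
        for (std::size_t i = 0; i < n; ++i)
            buf[i] = static_cast<char>(cipher[i] ^ 0x2A);
        cipher.erase(0, n);
        return n;
    }
};

// Non-templated protocol object: the shared code is compiled only once.
class http_source {
public:
    typedef std::function<std::size_t(char*, std::size_t)> fetch_fn;
    explicit http_source(fetch_fn f) : fetch_(std::move(f)) {
        assert(fetch_ && "a fetch function is required");
    }
    // Drains the stream through the stored fetch function.
    std::string read_all() {
        std::string out;
        char buf[8];
        for (;;) {
            std::size_t n = fetch_(buf, sizeof buf);
            if (n == 0) break;
            out.append(buf, n);
        }
        return out;
    }
private:
    fetch_fn fetch_;
};

// The only templated piece: binds a concrete stream to the fetch function.
// The stream must outlive the returned object (captured by reference).
template <class Stream>
http_source make_http_source(Stream& s) {
    return http_source(
        [&s](char* buf, std::size_t len) { return s.read_some(buf, len); });
}
```

Only the small lambda is instantiated per stream type; the cost is one
indirect call per fetch, much like a virtual call.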

I didn't remove virtuals just for the sake of it; it is simply an
accident of design. I might consider putting them back in the internal
handle, at least to give reads/writes polymorphic behaviour (this would
mimic the iostream and streambuffer pair).

BTW, I do not understand exactly what you mean by 'being free to layer
objects'.

>>- Errors can be reported both with exceptions and
>>with error codes. Exceptions are used by default
>>unless error callbacks are passed. This seems to
>>work quite well. Internally only error codes are
>>used and exceptions are thrown only at the most
>>external abstraction layer.
>
>
> This is a good idea, and very similar to what I have done as well. At
> least for async. What is the behavior of blocking read in the face of
> error? Is the user callback made inside read? If so, what does read()
> return?

An exception is thrown unless a callback is provided; in that case the
error code is passed to the callback. A throwing read returns the amount
of data read; a 'callback augmented' read returns the callback itself
(callbacks are passed by value, as with standard algorithms). The amount
of data read is passed to the callback along with the error code.
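
The two forms could be sketched roughly like this. Everything here is
hypothetical (io_error, fake_stream, raw_read, stream_read,
record_result), and the sketch assumes the callback is also invoked on
success with an error code of 0, which the description above does not
state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>

struct io_error : std::runtime_error {
    explicit io_error(int c) : std::runtime_error("i/o error"), code(c) {}
    int code;
};

// Hypothetical stream whose primitive reports failures via an error code.
struct fake_stream {
    std::size_t available;  // bytes the next reads can deliver
    int fail_code;          // non-zero makes every read fail
    std::size_t raw_read(char* buf, std::size_t len, int& ec) {
        if (fail_code != 0) { ec = fail_code; return 0; }
        ec = 0;
        std::size_t n = std::min(len, available);
        std::fill(buf, buf + n, 'x');
        available -= n;
        return n;
    }
};

// Throwing form: returns the amount of data read.
template <class Stream>
std::size_t stream_read(Stream& s, char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    int ec = 0;
    std::size_t n = s.raw_read(buf, len, ec);
    if (ec != 0) throw io_error(ec);  // exception only at the outer layer
    return n;
}

// Callback form: no exception on I/O errors, returns the callback by value.
template <class Stream, class Callback>
Callback stream_read(Stream& s, char* buf, std::size_t len, Callback cb) {
    assert(buf != nullptr || len == 0);
    int ec = 0;
    std::size_t n = s.raw_read(buf, len, ec);
    cb(n, ec);  // assumption: invoked on success too, with ec == 0
    return cb;
}

// Example callback that just records what it was told.
struct record_result {
    std::size_t bytes = 0;
    int code = 0;
    void operator()(std::size_t n, int ec) { bytes = n; code = ec; }
};
```

Returning the callback by value mirrors std::for_each: the caller
inspects the returned functor to learn what happened.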

>
>>I will probably add status bits a-la iostreams.
>
>
> What kind of bits? I can see eof and fail and those cannot be
> cleared. Others?
>

Currently the only bits that I plan to have are 'input buffer grown'
and 'output buffer flushed', useful for asynchronous I/O and buffered
streams. I do not actually have eof and fail (yet) because initially
the stream was supposed to be thread safe and it had to be stateless.
I will certainly add state to keep track of closed connections, and
obviously it will only be reset if the internal handle is
reinitialized.

>>- File streams. The library actually try to be
>>a generalized i/o framework, and file streams are
>>provided for completeness.
>
>
> At an abstract level, they are very similar and should behave in a
> similar way. I haven't tried to tackle that part because it is an
> area where there is already something in place, albeit not async, and
> I didn't want to try to integrate into iostream (not my cup of tea).
>

I think that file I/O is as important in network programming as network
I/O itself, so it is useful to have a unified framework.

BTW, polling for I/O readiness (i.e. the select model) does not make
sense with files; I believe that the asynchronous I/O model is the only
non-blocking I/O model that fits all stream types.

>>- The library can be extended simply by creating
>>new handles. In addition to TCP streams there are
>>Unix streams (come almost for free :-) and file
>>streams. SSL/TLS was present but did get broken
>>some time ago and didn't have the time to fix it.
>
>
> I would be most curious to know how SSL fit in your library and how
> other layers interact with or are shielded from it.
>

Nothing very complex, really. I wrote a thin wrapper over OpenSSL.
I only took advantage of the ability to initialize a context with an
already connected file descriptor, then wrapped the context in a
handle. The read/write methods simply forwarded the call to
SSL_read/SSL_write. I didn't really take advantage of the BIO
infrastructure, which will probably be necessary to make an ssl_stream
an adapter over any kind of stream.

>
>>- Input/Output buffer.
>
> [some good stuff was here<g>]
>
> From the little I've read through the code, it looks like this is a
> layer above the raw stream impl. I think that is exactly the right
> way to go. :)
>

Yes, the buffered stream is just a layer above the standard stream.
Also, it should be very easy to implement a streambuffer on top of the
buffered stream adaptor.
I think that the buffered adaptor will greatly simplify asynchronous
buffer management. Asynchronous reads put data in the internal input
buffer, which can be grown efficiently as much as needed (it is a
deque); user code copies data from this buffer to its own buffer, or
takes ownership of it. Asynchronous writes take data from the output
buffer; user code either copies data from its own buffers to this
buffer, or relinquishes ownership of its buffer, or, if it wants to keep
ownership of the buffer and still avoid the extra copy, uses a special
buffer that is guaranteed to be immutable (i.e. once created it can
never be changed, and copies share the internal data using shared_ptrs.
I do not have it yet; I will add it when I attack the asynchronous I/O
problem). If the user does not want automatic buffer management, they
can still use the unbuffered functions, but then it is their job to
guarantee that buffers stay valid and unchanged until the operation is
completed (i.e. a mess!!).
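
Since the immutable buffer does not exist yet, the following is purely a
guess at what it could look like: a sketch in which copies share one
constant block of bytes. The immutable_buffer and output_queue names and
their members are assumptions, and std::shared_ptr stands in for
boost::shared_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Bytes are fixed at construction; copies share them instead of cloning.
class immutable_buffer {
public:
    explicit immutable_buffer(std::string bytes)
        : data_(std::make_shared<const std::string>(std::move(bytes))) {}
    const char* data() const { return data_->data(); }
    std::size_t size() const { return data_->size(); }
    long owners() const { return data_.use_count(); }
private:
    std::shared_ptr<const std::string> data_;
};

// Pending writes: queuing copies the shared_ptr, never the payload.
class output_queue {
public:
    void push(const immutable_buffer& b) { pending_.push_back(b); }
    bool empty() const { return pending_.empty(); }
    const immutable_buffer& front() const {
        assert(!pending_.empty());
        return pending_.front();
    }
    void pop() {
        assert(!pending_.empty());
        pending_.pop_front();
    }
private:
    std::deque<immutable_buffer> pending_;
};
```

The user keeps their own immutable_buffer while the write is pending;
the bytes stay valid until both the user and the queue have let go.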

>
>>Missing (definitely not a complete list):
>>
>>The library is fully synchronous for now. I'm
>>still considering how to add async support. I
>>think I will implement it in the buffered adaptor:
>>I/O is done asynchronously to the internal
>>buffers, which can grow as much as necessary.
>>Timeouts are definitely a must-have.
>
>
> Agreed on async and timeout. Can sync calls be manually/explicitly
> canceled? In my experience (and opinion<g>), a reader/writer MT
> design needs cancel semantics. Without it, such an app cannot be
> responsive to outside stimuli.
>

No, not yet, still on my todo list.
Well, you can obviously cancel a pending operation by shutting down the
stream, but a much more gentle solution is needed :-).

>
>>Final notes:
>>
>>I have seen that the current consensus is to
>>encode the stream type in the address, so as
>>to allow dynamic behaviour: the actual
>>transport is selected only at runtime, based
>>on the address string. I think this is a bad
>>decision (I considered doing it while
>>implementing my library) and this is why:
>
>
> I am more and more convinced that this is not the right approach for
> the library core, but for different reasons (see other posts). It
> could be offered as a stand-alone library for an app that has the
> need for this, but I think it is most likely a trivial map problem
> (plus a little text manipulation).
>

Yes, as an add-on would be fine, but the transport-encoded address
should not be a central concept.

>>- C++ is a static language, let's leave these
>>niceties to more dynamic languages.
>
>
> I think C++ is quite dynamic (not in the java script way<g>) and
> should exercise that power where appropriate :) It pains me to see
> Java servers everywhere. C++ can and should have all the HTTP, SSL
> and server stuff and be as easy to develop servlet-like things. One
> does not need reflection, dynamic loading whatnot to play well in
> that space. One does need standard (or at least defacto) libraries.
> Without them, effort is fragmented and disjoint.
>
> Which is why I joined boost. :)
>

Well, by static I meant "do as much work at compile time as possible",
which translates to "catch as many errors as early as possible" ;-).
C++ is certainly dynamic, but I kind of like the way everything is NOT
always an object.

[Really off-topic] BTW, I would *love* complete, standard, compile-time
reflection facilities.

>
> [...]
>
>
>>- It is extremely insecure. In a network library
>>security must be paramount. If the transport type
>>were encoded in the address, it would be much
>>harder to validate externally received addresses.
>
>
> Good point. Validation is one thing, but meeting expectations of the
> software is another. In some cases, just any transport may not be
> appropriate and hence should be validated. This can be done from the
> string form, of course, but it presents a wider interface.
>

Just to give an example: Stevens in Unix Network Programming Vol 1 shows
an example of getaddrinfo that, as an extension, could return unix
domain sockets in addition to ipv4 and ipv6 sockets. Glibc did actually
implement the extension. It was removed later because of security
concerns: see this post for details.

  http://sources.redhat.com/ml/libc-hacker/2001-05/msg00044.html

You might want to treat streams polymorphically once created, but at
creation the type should be statically known by the user code, because
it needs to be aware that not all streams have the same semantics. You
might say that not all streams are 'created' equal :-).

>
> [...]
>

>
>>Sorry for the long post, just tryin' to be useful :-).
>
>
> Don't be sorry. I am sure I've written longer posts and it was
> helpful.
>

Well, this *certainly* was a long post. I hope I've cleared up some
details of my library.
Now, let's get back to code.

--
Giovanni P. Deretta

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk