
From: Giovanni P. Deretta (gpderetta_at_[hidden])
Date: 2006-01-01 14:55:51


Christopher Kohlhoff wrote:
> I believe there might be more similarity there already than you
> think, so I'm going to do a bit of experimentation and see what
> comes out of it. However the dual support for IPv4 and IPv6
> raised by Jeff might be a bit of a problem -- is it something
> you address in your network lib?
>

(looks at old code......)
Yes, more or less. The internal implementation has support for IPv6, but
it is not exported in the public interface; it is just a matter of
instantiating a stream_template<inet::ipv6>. Unfortunately I didn't test
it (I have no IPv6 experience), but reading the Stevens book and SUSv3,
it seems that IPv6 sockets are backward compatible with IPv4 (i.e. IPv6
address resolvers can take IPv4 addresses, and IPv6 sockets can connect
to and accept IPv4 streams).
I think the cleanest interface would be an ip::stream that is
instantiated as an ipv6::stream if there is system support, or as an
ipv4::stream if there is none. ipv4::stream and ipv6::stream should
still be available if the user explicitly needs them (i.e. no
compatibility), but the default should be ip::stream.
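For illustration only (the config macro and namespace are made up, not
taken from either library), the selection could be as simple as:

   // Hypothetical sketch: default to the widest protocol the system supports.
   #if defined(MY_NET_HAS_IPV6)
     namespace ip { typedef stream_template<inet::ipv6> stream; } // also accepts IPv4 peers
   #else
     namespace ip { typedef stream_template<inet::ipv4> stream; }
   #endif
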
>
> I see what you mean now, however I don't think they can be
> portably decoupled. Some platforms will require access to shared
> state in order to perform the operation. The acceptor socket
> caching you mentioned is also a case for having access to this.
>

Sometimes you need to make a hard decision. Are there any real-life
protocols/platforms that need shared state? If yes, then member
functions are fine. If not, or if only some theoretical, obscure
protocol needs it, it should not impact the generic interface. For those
obscure/rarely used protocols/platforms a hidden singleton could be used.
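By "hidden singleton" I mean nothing fancier than this (purely
illustrative; protocol_state is a made-up name):

   // Hypothetical: shared state kept entirely out of the public interface.
   struct protocol_state { /* lookup tables, caches, ... */ };

   protocol_state& shared_state()
   {
       static protocol_state instance; // created lazily on first use
       return instance;
   }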

Anyway, even if you keep the shared-state solution, I think the proactor
should be decoupled from the protocol implementation. In your current
model, the protocol_service and the proactor_service should be two
different objects. In fact, multiple protocols must be able to reuse the
same proactor.
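In other words, something along these lines (a rough sketch with
invented names, not your actual classes):

   // Hypothetical: one proactor shared by several protocol services.
   proactor_service proactor;            // owns the demuxing/completion machinery
   tcp_protocol_service tcp(proactor);   // protocol-specific state only
   shm_protocol_service shm(proactor);   // a second protocol reusing the same proactor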

> I suspect that, if I adopt a type-per-protocol model, the
> service will also be associated with the protocol in some way
> (i.e. the protocol might be a template parameter of the service
> class).
>
Seems good... it is probably the same thing I proposed in the previous
paragraph.
> <snip>
>
>>Of course non-blocking operations are useful, but as there is
>>no public readiness notification interface (i.e. a reactor),
>>its use is somewhat complicated.
>
>
> What I mean is that the readiness notification isn't required,
> since one way of interpreting an asynchronous operation is
> "perform this operation when the socket is ready". That is, it
> corresponds to the non-blocking operation that you would have
> made when notified that a socket was ready. A non-blocking
> operation can then be used for an immediate follow-up operation,
> if desired.
>
> <snip>
>
>>What about parametrizing the buffered_stream with a container
>>type, and providing an accessor to this container? The
>>container buffer can then be swap()ed, splice()ed, reset()ed,
>>fed to algorithms, and much more without any copying, while
>>still preserving the stream interface. Instead of a buffered
>>stream you can think of it as a stream adaptor to for
>>containers. I happily used one in my library, and it really
>>simplifies code, along with a deque that provides segmented
>>iterators to the contiguous buffers.
>
>
> I think a separate stream adapter for containers sounds like a
> good plan.

I have to add that my buffered_adapter is actually more than a stream
adapter for containers, because it has an associated stream and can
bypass the buffer if the read or write request is big enough. Also, the
adapter has underflow() (you call it fill in your buffered_stream) and
flush().
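The bypass logic is roughly this (a simplified sketch of the idea, with
invented names, not code from my library):

   #include <cstddef>
   #include <string>

   template <class Stream>
   struct buffered_writer {
       Stream& next;            // the underlying stream
       std::string buf;         // the container used for buffering
       std::size_t threshold;   // requests at least this big bypass the buffer

       std::size_t write(const char* data, std::size_t size) {
           if (size >= threshold) {
               flush();                        // drain anything already buffered
               return next.write(data, size);  // write straight through, no copy
           }
           buf.append(data, size);             // small request: just buffer it
           return size;
       }

       void flush() {
           if (!buf.empty()) { next.write(buf.data(), buf.size()); buf.clear(); }
       }
   };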

>BTW, is there a safe way to read data directly into a
> deque? Or do you mean that the deque contains multiple buffer
> objects?
>

No, there is no portable way to read data directly into a
std::deque<char>. But I did not use std::deque; I made my own deque with
segmented iterators and no default construction of PODs. Actually it
only works with PODs right now, but it is fairly complete; I have even
added versions of some standard algorithms with support for segmented
iterators. It should not be too hard to add non-POD support (not that a
net lib really needs it...).
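The point of the segmented iterators is that the container can hand out
its contiguous chunks, so a scatter read can fill them directly. Roughly
(the container interface here is invented for the example):

   #include <sys/uio.h>
   #include <vector>

   // Hypothetical: fill each contiguous segment of the container with readv().
   template <class SegmentedBuffer>
   ssize_t read_into(int fd, SegmentedBuffer& buf)
   {
       std::vector<iovec> iov;
       for (typename SegmentedBuffer::segment_iterator s = buf.segment_begin();
            s != buf.segment_end(); ++s) {
           iovec v = { s->data(), s->size() };  // each segment is contiguous
           iov.push_back(v);
       }
       return readv(fd, &iov[0], iov.size());   // one syscall, no extra copy
   }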

>
>>i'm already thinking of possible extensions... shared_buffers,
>>gift_buffers (i need a better name for the last one) and more.
>
>
> I take it that by "shared_buffers" you mean reference-counted
> buffers?
>

Yes, exactly.

> If so, one change I'm considering is to add a guarantee that a
> copy of the Mutable_Buffers or Const_Buffers object will be made
> and kept until an asynchronous operation completes. At the
> moment a copy is only kept until it is used (which for Win32 is
> when the overlapped I/O operation is started, not when it ends).
>

Hmm, nice, but if you want to forward a buffer to another thread, for
example, you want to forward the complete type (to preserve the counter
and the acquire()/release() infrastructure). I think it should be
possible to implement streams that, in addition to the generic buffer
objects, accept specialized per-stream buffer types and guarantee
special treatment for them. For example, write() on an in-process
shared memory stream would copy the buffer if a generic buffer is
passed, but would give special treatment if a dedicated shared buffer
is passed.
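Interface-wise this is just an extra overload, something like (a sketch
with invented types):

   #include <cstddef>

   class shm_buffer;  // hypothetical reference-counted buffer in the shared segment

   class shm_stream {
   public:
       // Generic buffer: the data must be copied into the shared segment.
       std::size_t write(const void* data, std::size_t size);

       // Privileged buffer: already in the shared segment, so only a handle
       // (and a reference count bump) crosses to the peer, no copy.
       std::size_t write(const shm_buffer& buf);
   };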

I have in mind only shared memory streams for now, but think about a
network transport implemented completely in userspace (with direct
access to the network card): it is theoretically possible to DMA
directly from user buffers to the card buffers, but it might only be
possible from specially aligned memory. Case in point: the Linux AIO
implementation requires that a file is opened in O_DIRECT mode. In turn,
O_DIRECT requires that the supplied buffer is aligned to 512-byte
boundaries (or the filesystem block size on 2.4). This means that an
asio-based asynchronous disk I/O subsystem would require its buffers to
be specially allocated (or fall back to an extra copy). This requirement
can easily be met if a hypothetical asio::fs_stream has a
direct_buffer_allocator typedef. The allocator would return objects of
type direct_buffer, and fs_stream.async_{read|write}_some would be
overloaded to explicitly support these buffers. If a direct_buffer is
used, fs_stream will use the native Linux AIO. If a generic buffer is
used, fs_stream should *not* use Linux AIO, not even with an internal,
properly allocated bounce buffer, because O_DIRECT bypasses the system
caches, so it should be used only if the user explicitly requests it by
passing direct_buffers. The fallback should probably use worker threads.
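The allocation side is simple; the point is only that the alignment
guarantee is captured in the buffer's type, so the overload can rely on
it (a sketch, all names hypothetical):

   #include <cstddef>
   #include <cstdlib>
   #include <new>

   // Hypothetical: a buffer whose storage satisfies the O_DIRECT constraints.
   class direct_buffer {
   public:
       explicit direct_buffer(std::size_t size) : size_(size) {
           // 512-byte alignment as required by O_DIRECT.
           if (posix_memalign(&data_, 512, size) != 0)
               throw std::bad_alloc();
       }
       ~direct_buffer() { std::free(data_); }
       void* data() const { return data_; }
       std::size_t size() const { return size_; }
   private:
       direct_buffer(const direct_buffer&);            // non-copyable
       direct_buffer& operator=(const direct_buffer&);
       void* data_;
       std::size_t size_;
   };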

Btw, future versions of Linux AIO will almost certainly support
non-direct asynchronous I/O. Still, the O_DIRECT mode will probably be
fast-pathed.

In the end, this boils down to passing the exact buffer type down to the
lower levels of the asio implementation.

> However, this may make using a std::vector<> or std::list<> of
> buffers too inefficient, since a copy must be made of the entire
> vector or list object. I will have to do some measurements
> before making a decision, but it may be that supporting
> reference-counted buffers is a compelling enough reason.
>

Usually the vector is small, and boost::array is probably a better fit.
In the latter case the buffer is cached (as it lives on the stack), is
very small (fewer than 20 elements), and takes very little time to copy.
In the case of vectors, if move semantics are available (because they
are emulated by the standard library, as the next libstdc++ does, or
because of future language developments), no copy is needed.
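Concretely, the common case is a tiny gather list on the stack, so
copying it costs a couple of pointer/length pairs (the buffer type here
is a stand-in, not your class):

   #include <boost/array.hpp>
   #include <cstddef>
   #include <utility>

   typedef std::pair<const void*, std::size_t> raw_buffer;  // stand-in buffer type

   // Two-element gather list: header + payload, referenced, not copied.
   boost::array<raw_buffer, 2> make_buffers(const char* hdr, std::size_t hlen,
                                            const char* body, std::size_t blen)
   {
       boost::array<raw_buffer, 2> bufs = {{ raw_buffer(hdr, hlen),
                                             raw_buffer(body, blen) }};
       return bufs;  // copying this is just copying two pointer/length pairs
   }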

>
>>Btw, did you consider my proposal for multishot calls?
>
>
> I have now :)
>
> I think that what amounts to the same thing can be implemented
> as an adapter on top of the existing classes. It would work in
> conjunction with the new custom memory allocation interface to
> reuse the same memory. In a way it would be like a simplified
> interface to custom memory allocation, specifically for
> recurring operations. I'll add it to my list of things to
> investigate.
>

I don't think it is worth doing at the higher levels. Multishot calls
are inconvenient because you lose the one call -> one callback
guarantee. I proposed adding them because they can open up many
optimization opportunities at the lower levels (fewer system calls and
allocations to set up the callback, possibly better cache locality of
callback data, and fewer syscalls to register readiness-notification
interest).
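To be clear about what I mean by a multishot call (purely hypothetical
interface, not a proposal for the exact names):

   // Hypothetical multishot operation: registered once, completes many times.
   // The implementation keeps the readiness interest registered with the
   // demuxer and reuses the same callback storage, so each completion needs
   // no fresh allocation and no re-registration syscall.
   template <class Mutable_Buffers, class Handler>
   void async_read_some_multi(Mutable_Buffers buffers, Handler handler);

   // A matching cancel() would end the stream of callbacks.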

Ah, btw, happy new year :)

---
Giovanni P. Deretta
---
