
From: Darryl Green (darryl.green_at_[hidden])
Date: 2005-12-26 01:29:48

Christopher Kohlhoff wrote:
>>documentation mentions the memory usage overhead of the
>>proactor emulation on systems using select/epoll. However it
>>is clear that there is considerable allocation overhead and
>>construction overhead involved (and not only on those
>>systems). While much discussion has centered on allocator
>>performance, I'm concerned that the user/library constructor
>>code invoked for each i/o operation is likely to be
>>non-trivial as well. For bulk transfers, it is likely that
>>these issues won't be too severe, however for systems where
>>low-latency handling of small messages is critical, it is
>>quite possible that these overheads will be comparable to or
>>larger than any other part of the processing for a
> I don't deny that there may be room for performance improvement
> in the implementation. But as I recently tested the custom
> allocation with Win32 I/O completion ports, and found that it
> performs almost identically to synchronous I/O, I don't think
> the public interface imposes any unnecessary inefficiencies.

Caleb's results on Linux aren't too encouraging though.

>>I find the way the design supports, and the documentation
>>encourages, breaking up handling of structured messages into a
>>small header read followed by a larger read (with appropriate
>>handler for expected body/chunk) while the implementation
>>appears to do some quite heavyweight operations compared to
>>just reading a small header to be almost an encouragement of
>>an antipattern for efficient I/O.
> I don't agree that asio favours this approach over others. But I
> do believe that this approach is perfectly valid and extremely

Ok - my concern is that the examples should show best practice and the
lib should make it easy to do things the right way. Using the lib as
currently implemented, in the way shown somewhere (I think this was
just in a post - I can't find it in the examples, which is where I
imagined I saw it - sorry for the noise), would give poor performance.
This could be addressed by:

i) Improving lib efficiency
ii) Providing examples of the right way to do it
iii) Providing an interface that is more efficient

I believe you intend to do (i), and below you offer some alternatives
for (ii), so this would seem to be well enough covered.

> useful, since it permits asynchronous code to be written in
> terms of "contracts". I.e. perform this operation (or
> operations) and don't come back to me until it's all done or an
> error has occurred.

I tend to implement that at a layer above the raw I/O, but I take your
point.

> stream_socket my_socket;
> with:
> buffered_read_stream<stream_socket> my_socket;

Ah - that would be a closer approximation to how I would expect this to
be done - this belongs in the docs!

>>I would expect an optimistic reactive approach (read the
>>header when notified, determine size, handler etc and
>>opportunistically attempt a read of the body) to be
>>significantly faster.
> Let's take your example of a fixed sized header followed by a
> body, and compare and contrast a reactive approach with the
> alternatives offered by asio.
> ---------------------
> 1. Reactive.
> 2. Asio with multiple reads.
> This is the approach where we issue asynchronous reads for
> exactly the length of the header, and then the body. It may be
> less efficient, but only because of the additional pass through
> the demultiplexer, not because it makes more read() system calls

Yes - precisely the problem. The select-based demuxer needs to be made
more efficient by optimization or a modified interface.

> than the above. However feel free to compare in terms of
> readability :) The main reason it's clearer is that you no
> longer have to deal with short reads -- async_read's contract
> takes care of that for you.

I think you will find that reasonable reactive libs, async buffering
providers, etc. do much the same thing for readability.

> 3. Asio with multiple reads + opportunistic body read.
> This extends number 2 by adding in the opportunistic read of the
> body. In effect this makes it equivalent to your reactive read

Yes. It seemed like an odd mix of sync and async IO so I didn't consider
it as an idiom to recommend in general.

> 4. Asio with bulk reads.
> This approach is exemplified by the HTTP server example.
> Essentially it reads as much as is available (up to some limit
> of course) in one go, and then processes it using some state
> machine similar to that in number 1. It is generally the most
> efficient since it requires the fewest number of read system
> calls or passes through the demuxer.

Yes. This is fine for streaming/bulk transfers. My concern was more
about an idiom for lots of short request/response type (low latency
needed) messages.

> 5. Asio with transfer_at_least(...).
> In certain applications you may be able to combine the
> opportunistic read into the initial asynchronous read operation,

The circumstances where this applies for TCP are hard to imagine, but if
you had char device IO I would have a use for this (mind you, the device
in question always returns the full message or nothing, so it is not
quite a match either).

Thanks for pointing out the alternatives. Maybe at least some of these
should get a mention in the docs, in the context of a more substantial
tutorial, but it would be better to just ensure that the simplest
approach/interface is also efficient, and to illustrate that.

>>It is implied that an appropriate demuxer/demuxer service can
>>be used to implement various multi-threaded processing
>>schemes. This may well be true. However, an alternative
>>approach is to simply consider the affinity of an Async_Object
>>and its demuxer(s) (one for read, one for write) to be dynamic
>>and run a number of separate (per thread) demuxers, moving
>>Async_Objects between them as required. In this way the
>>handlers for a given Async_Object will only run in the
>>expected context.
> This is not portable. With Win32 I/O completion ports for
> example, once a socket is associated with an I/O completion port
> it cannot be disassociated. The demuxer design reflects this.

Ah - that makes it a bit messy. Do you consider that the locking
dispatcher effectively provides the same result? I suspect it does, so
long as all access to what amounts to an active object guarded by the
dispatcher does in fact go through the dispatcher. I assume this is the
intent, and that other interfaces (other than I/O-based ones, I mean)
should be wrapped, with dispatch or post used to run them? An example
would help a lot here.

>>case. I haven't reviewed the code to a level of detail where I
>>can be sure I am understanding the dispatching done by the
>>epoll and select reactors, but it looks to me as though
>>multiple threads may all wait on the same fdset (or
>>equivalent)? Is this right? Intended? I had expected some form
>>of modified leader/followers pattern here?
> No, only one thread in the pool waits on select/epoll/kqueue.
> Any other threads in the pool wait on a condition.

I believe you - I just can't see how this works when multiple threads
call run() on the same demuxer. I must be missing something in how the
service actually gets invoked.

>>In particular, it would seem
>>that some form of re-use of a single handler and buffer for an
>>Async_Object instance without any
>>allocation/(copy)construction of anything heavier than a
>>pointer should be quite practical.
> Asio's interface already supports "lightweight", reusable
> handlers, since ultimately a handler is just a function object.
> If you wish, you can implement your handlers so that they are no
> more expensive than a pointer:
> class my_handler
> {
> public:
>   explicit my_handler(my_class* p) : p_(p) {}
>   void operator()(const asio::error& e) { p_->handle_event(e); }
> private:
>   my_class* p_;
> };

Yes I can - but the lib should make simple things simple. I don't
anticipate that I would make much use of the dynamic (per-request)
handler facility, for architectural reasons, and I imagine a lot of use
would be similar. Providing an easy way to associate a persistent
handler with a stream/async object, and making this efficient, would be
relatively easy, wouldn't it? Simple use would then bind and allocate a
boost::function or similar once, not per request, with correspondingly
better performance.

Darryl Green.


Boost list run by bdawes at, gregod at, cpdaniel at, john at