From: Christopher Kohlhoff (chris_at_[hidden])
Date: 2005-12-22 23:29:53


Hi Darryl,

--- Darryl Green <darryl.green_at_[hidden]> wrote: <snip>
> I think the design as described by the concepts section is
> very good, but it would be better if the implementation
> offered finer-grained selection of the desired concepts.
> Perhaps this is just an aesthetic concern, and it would be
> easy to modify the existing services, but if these really are
> separate concepts, why are they bundled together? I'm
> concerned that there may be some lack of flexibility in the
> implementation/interface actually offered that isn't apparent
> from the design. The issue of separating out the sync vs async
> interface has already been raised;

This has been extensively discussed elsewhere, but in summary I
can see no compelling reason why there should be a separate
low-level sync-only interface, and many advantages to having
them combined. However I still intend to rename the demuxer to
make its purpose clearer for all sorts of applications.

> I would also like to see the ability to expose a reactive
> rather than proactive interface by using a
> reactive_stream_socket_service in place of an
> async_stream_socket_service. The concepts would seem to allow
> for this.

This is not consistent with the goal of portability. As I
mentioned in the "reactor versus proactor" thread, the proactor
is a more portable pattern than the reactor. In the longer term
I'm not averse to allowing a reactor abstraction to become part
of some secondary public interface, where the restricted
portability is explicitly noted.

> One item that I have difficulty in discerning a clear
> design/rationale for is the demuxer. The demuxer as
> implemented is a repository of services (another woolly concept
> afaics) and something that provides the run, interrupt and
> related functions to control the scheduling/dispatching of
> handlers in reaction to changes in Async_Object state. Is
> there anything preventing a more policy-based approach to
> assembling services?

The demuxer is composed of services in a similar way to how
std::locale is composed of facets. The demuxer is not
necessarily aware of, or coupled to, the service types that it
contains. For example, the SSL implementation uses a service
that is only added in to the demuxer if SSL is used.
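
For readers less familiar with that analogy, the std::locale
half of it looks like this (standard C++ only, nothing
asio-specific):

#include <locale>

void locale_analogy()
{
  // A locale is a repository of facets, as a demuxer is of
  // services.
  std::locale loc;

  // A facet is looked up on demand; the locale need not know
  // the concrete facet types it holds.
  const std::numpunct<char>& np =
      std::use_facet<std::numpunct<char> >(loc);
  char sep = np.thousands_sep();
  (void)sep;
}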

The service used by a particular public interface (e.g.
basic_stream_socket) is already a policy template parameter. One
example program shows how basic_stream_socket can be
instantiated with a custom service to perform logging, where
this service is in turn implemented using the standard
stream_socket_service.
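
In outline the services example does something like this (the
name logging_stream_socket_service is illustrative; see the
example itself for the real code):

typedef asio::basic_stream_socket<
    logging_stream_socket_service> logging_stream_socket;

where logging_stream_socket_service writes a log record for each
call before forwarding it to a wrapped stream_socket_service.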

There's no reason why, in principle, policy-driven services
could not be added to allow further customisation. But these
would be inherently non-portable, and so would only be part of
some secondary public interface.

> I would like to see the aspects of a socket that have nothing
> to do with reading/writing better separated, as this would
> make for a more consistent interface for other forms of I/O
> such as pipes, message queues etc as well as files.

For stream-oriented I/O, this separation is presented in the
form of the Stream concept. This concept is implemented by
stream_socket, ssl::stream, buffered_read_stream and so on. It
would be implemented by a hypothetical pipe class.

> A very simple active object template that supports read and
> write, but that relies on free functions or helper classes for
> less universal operations such as bind, setting options, etc,
> used as a base class, allows appropriate (and necessary in my
> opinion) use of runtime polymorphism.

I have to disagree on runtime polymorphism: compile-time
polymorphism should always be the default. Runtime polymorphism
can impose additional costs (such as virtual function calls on a
fine-grained interface) and can eliminate potential
optimisations. These costs add up considerably if there are many
layers of runtime polymorphism involved.

Runtime polymorphism can be layered on top of compile-time
polymorphism using some sort of wrapper. E.g. for the Stream
concept:

class abstract_stream { ... };

template <typename Stream>
class stream : public abstract_stream { ... };

stream<ssl::stream<stream_socket> > s(...);
runtime_polymorphic_function(s);
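
Filled out a little, such a wrapper might look like the sketch
below. This erases only the synchronous operations, and
simplifies construction and error handling; an asynchronous
equivalent would need to erase the handler type as well.

class abstract_stream
{
public:
  virtual ~abstract_stream() {}
  virtual size_t read_some(void* data, size_t max_length) = 0;
  virtual size_t write_some(const void* data, size_t length) = 0;
};

template <typename Stream>
class stream : public abstract_stream
{
public:
  template <typename Arg>
  explicit stream(Arg& arg) : stream_(arg) {}

  size_t read_some(void* data, size_t max_length)
  {
    // Forward to the compile-time polymorphic stream.
    return stream_.read_some(buffer(data, max_length));
  }

  size_t write_some(const void* data, size_t length)
  {
    return stream_.write_some(buffer(data, length));
  }

private:
  Stream stream_;
};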

I am not certain if this is something that should be included in
asio or not, since the appropriate place to introduce runtime
polymorphism can vary according to the design of an application.
What do you think? I'd be happy to add it if you think it
generally useful.

<snip>
> The intention was for this general idea to support other types
> of "file descriptor" that don't offer conventional read and
> write interfaces, and to allow other interfaces (such as
> proactive) to use the same demuxing etc infrastructure.
>
> I had hoped to have time to investigate introducing asio as a
> replacement/alternative to this library, but the timing hasn't
> been good for that. In any case, I don't think asio would be a
> good fit for a system like the following (where the other lib
> is being used):
>
> Components support tcp and ssl for WAN connections and unix
> domain sockets locally. Throw in some weird stuff like
> communicating over special purpose point-to-point links
> implemented as character devices. There are large chunks of
> code that have no need to care what exact type of "socket"
> they are using. Some of this code is actually in dynamically
> linked libraries itself. There are multiple processes using
> these libraries.
>
> Short of placing a wrapper around asio to make it more
> amenable to separate compilation and to introduce runtime
> polymorphism, I can't see how I could use asio effectively in
> this system, or anything similar.

Would the polymorphic wrapper I proposed above address these
issues?

> As far as I can see asio doesn't even allow interchangeable
> use of an ssl and a tcp stream.

As I mentioned above, asio addresses this using compile time
polymorphism, with the Stream concept being implemented by
stream_socket and ssl::stream. For example, both work equally
well with free functions like asio::async_read.
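
For example, a function template written to the Stream concept
compiles unchanged for any of these types (a sketch, using the
same member-variable conventions as the examples later in this
message):

template <typename Stream>
void start_read(Stream& stream)
{
  // Stream may be stream_socket, ssl::stream<stream_socket>,
  // buffered_read_stream<stream_socket> and so on.
  async_read(stream,
      buffer(header_buf_, header_length),
      handle_read_header);
}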

> I think this is unacceptable in a general purpose library. I'm
> also concerned about the potential code bloat from unnecessary
> duplication of code (not from different template parameters,
> but from outright duplication in different templates).

Can you please point these out? I'm always in favour of reducing
code bloat.

> Implementation
>
> The code is clean, however I'm very concerned about the
> performance (or lack of) and weight that a combination of the
> implementation and some design decisions have introduced. The
> documentation mentions the memory usage overhead of the
> proactor emulation on systems using select/epoll. However it
> is clear that there is considerable allocation overhead and
> construction overhead involved (and not only on those
> systems). While much discussion has centered on allocator
> performance, I'm concerned that the user/library constructor
> code invoked for each i/o operation is likely to be
> non-trivial as well. For bulk transfers, it is likely that
> these issues won't be too severe, however for systems where
> low-latency handling of small messages is critical, it is
> quite possible that these overheads will be comparable to or
> larger than any other part of the processing for a
> message/transaction.

I don't deny that there may be room for performance improvement
in the implementation. But I recently tested the custom
allocation with Win32 I/O completion ports and found that it
performs almost identically to synchronous I/O, so I don't think
the public interface imposes any unnecessary inefficiencies.

> I find the way the design supports, and the documentation
> encourages, breaking up handling of structured messages into a
> small header read followed by a larger read (with appropriate
> handler for expected body/chunk) while the implementation
> appears to do some quite heavyweight operations compared to
> just reading a small header to be almost an encouragement of
> an antipattern for efficient I/O.

I don't agree that asio favours this approach over others. But I
do believe that this approach is perfectly valid and extremely
useful, since it permits asynchronous code to be written in
terms of "contracts". I.e. perform this operation (or
operations) and don't come back to me until it's all done or an
error has occurred.

For code where absolute efficiency is not required, this
technique allows a clearer and more elegant expression of
intent. Furthermore, the cost of multiple small read system
calls can be mitigated by leveraging the compile-time
polymorphism of the Stream concept and replacing:

  stream_socket my_socket;

with:

  buffered_read_stream<stream_socket> my_socket;
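
Since buffered_read_stream itself models the Stream concept, the
surrounding code continues to compile unchanged; small header
reads are then satisfied from the stream's internal buffer where
possible, rather than each costing a separate system call.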

> I would expect an optimistic reactive approach (read the
> header when notified, determine size, handler etc and
> opportunistically attempt a read of the body) to be
> significantly faster.

Let's take your example of a fixed-size header followed by a
body, and compare and contrast a reactive approach with the
alternatives offered by asio.

---------------------

1. Reactive.

Here you might have a single handler that is informed when a
socket is ready for reading. In this handler you perform
non-blocking reads.

void handle_read(...)
{
  size_t bytes_read; // declared before the switch so the jump to
                     // reading_body doesn't cross an initialisation

  switch (state_)
  {
  case reading_header:

    // Attempt non-blocking read of header.
    bytes_read = read(sock_,
        header_buf_ + header_bytes_so_far_,
        header_length - header_bytes_so_far_);
    header_bytes_so_far_ += bytes_read;

    // Check if we have the complete header.
    if (header_bytes_so_far_ < header_length)
      return; // Keep waiting for more.

    // Got the whole header now.
    body_length_ = decode_body_length(header_buf_);
    body_buf_ = ...; // size body buffer appropriately
    body_bytes_so_far_ = 0;
    state_ = reading_body;

    // Fall through to make opportunistic read of body.

  case reading_body:

    // Attempt non-blocking read of body.
    bytes_read = read(sock_,
        body_buf_ + body_bytes_so_far_,
        body_length_ - body_bytes_so_far_);
    body_bytes_so_far_ += bytes_read;

    // Check if we have the complete body.
    if (body_bytes_so_far_ < body_length_)
      return; // Keep waiting for more.

    // Got the whole body now.
    notify_app(...);

    // Start over.
    state_ = reading_header;
  }
}

2. Asio with multiple reads.

This is the approach where we issue asynchronous reads for
exactly the length of the header, and then the body. It may be
less efficient, but only because of the additional pass through
the demultiplexer, not because it makes more read() system calls
than the above. However feel free to compare in terms of
readability :) The main reason it's clearer is that you no
longer have to deal with short reads -- async_read's contract
takes care of that for you.

void start_read()
{
  // Issue read for entire header.
  async_read(sock_,
      buffer(header_buf_, header_length),
      handle_read_header);
}

void handle_read_header(const error& e)
{
  if (!e)
  {
    // Got header, determine body length.
    body_length_ = decode_body_length(header_buf_);
    body_buf_ = ...; // size body buffer appropriately

    // Issue read for entire body.
    async_read(sock_,
        buffer(body_buf_, body_length_),
        handle_read_body);
  }
}

void handle_read_body(const error& e)
{
  if (!e)
  {
    // Got whole body now.
    notify_app(...);

    // Start over.
    start_read();
  }
}

3. Asio with multiple reads + opportunistic body read.

This extends number 2 by adding in the opportunistic read of the
body. In effect this makes it equivalent to your reactive read
approach, but I think you'll still find it's easier to follow --
again because most of the code to handle short reads is
eliminated.

void start_read()
{
  // Issue read for entire header.
  async_read(sock_,
      buffer(header_buf_, header_length),
      handle_read_header);
}

void handle_read_header(const error& e)
{
  if (!e)
  {
    // Got header, determine body length.
    body_length_ = decode_body_length(header_buf_);
    body_buf_ = ...; // size body buffer appropriately

    // Make opportunistic non-blocking read of body.
    error read_error;
    size_t bytes_read = sock_.read_some(
        buffer(body_buf_, body_length_),
        assign_error(read_error));

    // Did we get whole body?
    if (bytes_read == body_length_ || read_error)
    {
      handle_read_body(read_error);
    }
    else
    {
      // Issue read for rest of body.
      async_read(sock_,
          buffer(body_buf_ + bytes_read,
            body_length_ - bytes_read),
          handle_read_body);
    }
  }
}

void handle_read_body(const error& e)
{
  if (!e)
  {
    // Got whole body now.
    notify_app(...);

    // Start over.
    start_read();
  }
}

4. Asio with bulk reads.

This approach is exemplified by the HTTP server example.
Essentially it reads as much as is available (up to some limit
of course) in one go, and then processes it using some state
machine similar to that in number 1. It is generally the most
efficient since it requires the fewest read system calls or
passes through the demuxer.

void handle_read(const error& e, size_t bytes_read)
{
  if (!e)
  {
    process_data(buffer_, bytes_read);
    sock_.async_read_some(
        buffer(buffer_, buffer_length),
        handle_read);
  }
}

5. Asio with transfer_at_least(...).

In certain applications you may be able to combine the
opportunistic read into the initial asynchronous read operation,
by using the transfer_at_least() completion condition. This
executes the operation with a contract that it transfer a
minimum number of bytes. In this way you can avoid short read
handling for the header, while still maximising the number of
bytes transferred in a single operation.

void start_read()
{
  // Issue read for entire header, and maybe we'll get some
  // body as a bonus.
  boost::array<mutable_buffer, 2> bufs = {
    buffer(header_buf_, header_length),
    buffer(body_buf_, max_body_length) };
  async_read(sock_, bufs,
      transfer_at_least(header_length),
      handle_read_header);
}

...

---------------------

> It is implied that an appropriate demuxer/demuxer service can
> be used to implement various multi-threaded processing
> schemes. This may well be true. However, an alternative
> approach is to simply consider the affinity of an Async_Object
> and its demuxer(s) (one for read, one for write) to be dynamic
> and run a number of separate (per thread) demuxers, moving
> Async_Objects between them as required. In this way the
> handlers for a given Async_Object will only run in the
> expected context.

This is not portable. With Win32 I/O completion ports for
example, once a socket is associated with an I/O completion port
it cannot be disassociated. The demuxer design reflects this.

> I imagine other approaches are possible to achieve similar
> results, but it is not clear (to me) from the design
> documentation or a quick look at the code that this is the
> case. I haven't reviewed the code to a level of detail where I
> can be sure I am understanding the dispatching done by the
> epoll and select reactors, but it looks to me as though
> multiple threads may all wait on the same fdset (or
> equivalent)? Is this right? Intended? I had expected some form
> of modified leader/followers pattern here?

No, only one thread in the pool waits on select/epoll/kqueue.
Any other threads in the pool wait on a condition.
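
In outline the scheme is something like the sketch below
(illustrative only -- not the actual reactor code;
run_reactor_once is a hypothetical helper that waits for and
dispatches one batch of events):

boost::mutex mutex_;
boost::condition condition_;
bool reactor_thread_active_; // a thread is blocked in the reactor

void task_thread()
{
  boost::mutex::scoped_lock lock(mutex_);
  for (;;)
  {
    if (!reactor_thread_active_)
    {
      // Become the single thread that blocks in
      // select/epoll/kqueue.
      reactor_thread_active_ = true;
      lock.unlock();
      run_reactor_once();
      lock.lock();
      reactor_thread_active_ = false;

      // Give another thread a turn on the reactor.
      condition_.notify_one();
    }
    else
    {
      // All other threads in the pool wait here.
      condition_.wait(lock);
    }
  }
}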

> Conclusion
>
> As I mentioned above, had the review been at a different time
> I would have at least attempted to port some code (even if
> only test/benchmark code) from an in-house reactor library to
> asio. As it is, my comments are based on a fair bit of
> experience with network and async (in its most general sense)
> I/O from the bit-level (literally) up to the application level
> on a number of OSs/kernels/platforms and libraries (including
> ACE) and I've spent a few hours reading the code and docs. I
> have a healthy distrust of my (and everyone else's) guesses
> regarding performance, and the author and review manager
> should consider my points above regarding performance as areas
> for investigation only.
>
> A fair amount of the above appears to be a request for reactor
> support (which it is, partly) but I would consider mechanisms
> that support very light-weight proactor use highly desirable
> irrespective of reactor support. In particular, it would seem
> that some form of re-use of a single handler and buffer for an
> Async_Object instance without any
> allocation/(copy)construction of anything heavier than a
> pointer should be quite practical.

Asio's interface already supports "lightweight", reusable
handlers, since ultimately a handler is just a function object.
If you wish, you can implement your handlers so that they are no
more expensive than a pointer:

class my_handler
{
public:
  explicit my_handler(my_class* p) : p_(p) {}
  void operator()(const asio::error& e) { p_->handle_event(e); }
private:
  my_class* p_;
};
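
Given a my_class* p, the handler is then passed by value in the
usual way, matching the single-argument handler convention used
in the sketches above:

async_read(sock_, buffer(header_buf_, header_length),
    my_handler(p));

Copying it costs no more than copying a pointer.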

<snip>

Thanks for taking the time to write such an extensive review.

Cheers,
Chris

