Boost logo

Boost :

From: Christopher Kohlhoff (chris_at_[hidden])
Date: 2005-12-21 01:10:01


Hi Rene,

In my first reply to your review I outlined a proposal for
custom memory allocation. I have now implemented enough of it to
re-run your test program (updated program and results attached).

--- Rene Rivera <grafik.list_at_[hidden]> wrote:
> I ran the same 100,000*1K*6*2*3 tests with both debug and
> release compiled code. As can be seen from the attached output
> in the best case, of release code, there is a 5.6% "overhead"
> from the async to sync cases.

Machine specs: 1.7GHz Pentium M, 512MB RAM, Windows XP SP2.
Compiler: VC8 Express.

First, without custom memory allocation a typical run of the
program made approximately 340000 calls to new. With my custom
memory allocation there were just 80.

In the release build async calls were marginally slower, but the
difference was less than 1%.

> For the debug code the difference is a more dramatic 25.2%.

Even though a debug build is not really relevant for performance
tests, I suspect this difference is mostly due to the number of
layers introduced by boost::bind. In my updated test, which uses
a hand-crafted function object, the difference was more like 10%
when using dynamic allocation. It's just 5% with custom
allocation.

> In my continue reading and use of the code I concluded that
> the basic design flaw is that the underlying *socket_service
> classes are not available as a public interface.

I don't understand how not exposing the implementation details
is a design flaw.

> Another restriction that this causes is that the various OS
> handling models are not available as individual handling
> types. Hence, for example on Linux, it is not possible to
> choose at runtime whether one should use epoll, kqueue, or
> select.

The stated intention is to use the most scalable event
demultiplexer offered by a platform. Furthermore, runtime
polymorphism is likely to limit the opportunity for
optimisation. E.g. the custom allocation strategy I have
implemented works because the user handler types have not been
erased.

I'd be interested to know whether you, or others, find this
custom memory allocation interface satisfactory. After having
used it I think that I quite like this approach because it
allows the developer to use application-specific knowledge about
the number of concurrent asynchronous "chains" when customising
memory allocation.

This custom memory allocation implementation required no changes
to the existing asio public interface or overall design.

Cheers,
Chris

#include <cstdio>
#include <iostream>
#include <vector>
#include <valarray>
#include <boost/shared_ptr.hpp>
#include <boost/thread.hpp>
#include <boost/bind.hpp>
#include <asio.hpp>

long new_count = 0;

void* operator new(std::size_t size)
{
  InterlockedIncrement(&new_count);
  void* p = malloc(size);
  if (!p)
    throw std::bad_alloc();
  return p;
}

void operator delete(void* p)
{
  free(p);
}

const std::size_t message_size = /**/ 1*1024 /*/ 64*1024 /**/;
const int port = 9999;
const std::size_t message_iterations = /** 100 /*/ 100000 /**/;

//namespace asio = boost::asio;

namespace detail_test
{
        struct timed_scope
        {
                boost::xtime t0;
                boost::xtime t1;
                std::size_t n;
                
                std::vector<double> & results;
                
                inline timed_scope(std::vector<double> & r, std::size_t iterations = 1)
                        : results(r), n(iterations)
                {
                        boost::xtime_get(&t0,boost::TIME_UTC);
                }
                
                inline ~timed_scope()
                {
                        boost::xtime_get(&t1,boost::TIME_UTC);
                        double t = double(t1.sec)+double(t1.nsec)/double(1000000000);
                        t -= double(t0.sec)+double(t0.nsec)/double(1000000000);
                        std::cerr << "### TIME"
                                << ": total = " << t
                                << "; iterations = " << n
                                << "; iteration = " << t/double(n)
                                << "; iterations/second = " << double(n)/t
                                << '\n';
                        results.push_back(double(n)/t);
                }
        };
        
        template <typename out>
        out & result_summary(out & o, const std::vector<double> & results)
        {
                std::valarray<double> r(&results[1],results.size()-2);
                o << r.sum()/r.size();
                return o;
        }
        
        void sleep_for_secs(int n)
        {
                boost::xtime t;
                boost::xtime_get(&t,boost::TIME_UTC);
                t.sec += n;
                boost::thread::sleep(t);
        }
}

struct async_server;

struct async_server_receive_handler
{
  async_server_receive_handler(async_server* this_p) : this_p_(this_p) {}
  void operator()(const asio::error& error, std::size_t);
  async_server* this_p_;
};

template <>
class asio::handler_alloc_hook<async_server_receive_handler>
{
public:
  template <typename Allocator>
  static typename Allocator::pointer allocate(
      async_server_receive_handler& h,
      Allocator& allocator,
      typename Allocator::size_type count)
  {
    return reinterpret_cast<typename Allocator::pointer>(
        h.this_p_->operation_buffer);
  }

  template <typename Allocator>
  static void deallocate(
      async_server_receive_handler& h,
      Allocator& allocator,
      typename Allocator::pointer pointer,
      typename Allocator::size_type count)
  {
  }
};

struct async_server
{
        char io_buffer[message_size];
        asio::demuxer demuxer;
        asio::datagram_socket socket;
        std::auto_ptr<detail_test::timed_scope> timer;
        std::size_t message_count;
        std::size_t message_recount;
        std::auto_ptr<boost::thread> runner;
        std::vector<double> results;
  char operation_buffer[1024];
        
        async_server()
                : socket(this->demuxer,asio::ipv4::udp::endpoint(port))
                , message_count(0)
                , message_recount(0)
        {
                socket.async_receive(
                        asio::buffer(io_buffer, message_size),
                        0,
      async_server_receive_handler(this));
        }

        void start()
        {
                this->runner.reset(new boost::thread(
                        boost::bind(&asio::demuxer::run,&this->demuxer) ));
        }
        
        void stop()
        {
                this->demuxer.interrupt();
                this->runner->join();
                this->clear_timer();
        }
        
        void reset_timer(std::size_t i = 1)
        {
                this->message_recount = i;
                this->timer.reset(new detail_test::timed_scope(this->results,i));
        }
        
        void clear_timer()
        {
                this->timer.reset();
        }
};

void async_server_receive_handler::operator()(
    const asio::error& error, std::size_t)
{
  if (!error || error == asio::error::message_size)
  {
    if (++this_p_->message_count == this_p_->timer->n)
    {
      this_p_->clear_timer();
      this_p_->message_count = 0;
      this_p_->reset_timer(this_p_->message_recount);
    }
    this_p_->socket.async_receive(
      asio::buffer(this_p_->io_buffer, message_size),
      0, *this);
  }
}

struct sync_server
{
        char io_buffer[message_size];
        asio::demuxer demuxer;
        asio::datagram_socket socket;
        std::auto_ptr<detail_test::timed_scope> timer;
        std::size_t message_count;
        std::size_t message_recount;
        std::auto_ptr<boost::thread> runner;
        std::vector<double> results;
        volatile bool running;
        
        sync_server()
                : socket(this->demuxer,asio::ipv4::udp::endpoint(port))
                , message_count(0)
                , message_recount(0)
                , running(false)
        {
        }
        
        void handle_receive_from(
                const asio::error& error,
                size_t /*bytes_transferred*/)
        {
                if (!error || error == asio::error::message_size)
                {
                        if (++message_count == this->timer->n)
                        {
                                this->clear_timer();
                                message_count = 0;
                                this->reset_timer(message_recount);
                        }
                }
        }
        
        void start()
        {
                this->runner.reset(new boost::thread(
                        boost::bind(&sync_server::run,this) ));
        }
        
        void run()
        {
                this->running = true;
                while (this->running)
                {
                        asio::error error;
                        std::size_t bytes_transferred =
                                socket.receive(
                                        asio::buffer(io_buffer, message_size),
                                        0,
                                        asio::assign_error(error));
                        this->handle_receive_from(error,bytes_transferred);
                        if (error && error != asio::error::message_size) break;
                }
                this->running = false;
        }
        
        void stop()
        {
                this->running = false;
                this->socket.close();
                this->runner->join();
                this->clear_timer();
        }
        
        void reset_timer(std::size_t i = 1)
        {
                this->message_recount = i;
                this->timer.reset(new detail_test::timed_scope(this->results,i));
        }
        
        void clear_timer()
        {
                this->timer.reset();
        }
};

struct sync_client
{
        char io_buffer[message_size];
        asio::demuxer demuxer;
        asio::ipv4::host_resolver host_resolver;
        asio::ipv4::host host;
        asio::ipv4::udp::endpoint receiver_endpoint;
        asio::datagram_socket socket;
        
        sync_client()
                : host_resolver(this->demuxer)
                , receiver_endpoint(port)
                , socket(this->demuxer,asio::ipv4::udp::endpoint(0))
        {
                host_resolver.get_host_by_name(host, "127.0.0.1");
                receiver_endpoint.address(host.address(0));
        }
        
        void send()
        {
                socket.send_to(
                        asio::buffer(io_buffer, message_size),
                        0, receiver_endpoint);
        }
};

int main()
{
        sync_client c0;
        {
                async_server s0;
                s0.start();
                detail_test::sleep_for_secs(2);
                std::cerr << "--- ASYNC...\n";
                s0.reset_timer(message_iterations);
                for (std::size_t m = 0; m < message_iterations*(1+4+1); ++m)
                {
                        c0.send();
                }
                s0.stop();
                detail_test::result_summary(
                        std::cerr << "-- ...ASYNC: average iterations/second = ",
                        s0.results) << "\n";
        }
        detail_test::sleep_for_secs(5);
        {
                sync_server s0;
                s0.start();
                detail_test::sleep_for_secs(2);
                std::cerr << "--- SYNC...\n";
                s0.reset_timer(message_iterations);
                for (std::size_t m = 0; m < message_iterations*(1+4+1); ++m)
                {
                        c0.send();
                }
                s0.stop();
                detail_test::result_summary(
                        std::cerr << "-- ...SYNC: average iterations/second = ",
                        s0.results) << "\n";
        }
  std::cerr << new_count << " calls to new\n";
        return 1;
}


--- ASYNC...
### TIME: total = 1.75252; iterations = 100000; iteration = 1.75252e-005; iterations/second = 57060.7
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.7
### TIME: total = 1.6023; iterations = 100000; iteration = 1.6023e-005; iterations/second = 62410.1
-- ...ASYNC: average iterations/second = 63100.8
--- SYNC...
### TIME: total = 1.81261; iterations = 100000; iteration = 1.81261e-005; iterations/second = 55169.2
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.6
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.56225; iterations = 100000; iteration = 1.56225e-005; iterations/second = 64010.4
-- ...SYNC: average iterations/second = 63302
--- ASYNC...
### TIME: total = 1.81261; iterations = 100000; iteration = 1.81261e-005; iterations/second = 55169.2
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.6
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.6
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.7
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.6
### TIME: total = 1.55223; iterations = 100000; iteration = 1.55223e-005; iterations/second = 64423.4
-- ...ASYNC: average iterations/second = 62802.6
--- SYNC...
### TIME: total = 1.78256; iterations = 100000; iteration = 1.78256e-005; iterations/second = 56099
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.7
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
-- ...SYNC: average iterations/second = 63201.4
--- ASYNC...
### TIME: total = 1.74251; iterations = 100000; iteration = 1.74251e-005; iterations/second = 57388.6
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.7
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.63235; iterations = 100000; iteration = 1.63235e-005; iterations/second = 61261.5
-- ...ASYNC: average iterations/second = 63201.4
--- SYNC...
### TIME: total = 1.64236; iterations = 100000; iteration = 1.64236e-005; iterations/second = 60887.9
### TIME: total = 1.55223; iterations = 100000; iteration = 1.55223e-005; iterations/second = 64423.4
### TIME: total = 1.54222; iterations = 100000; iteration = 1.54222e-005; iterations/second = 64841.7
### TIME: total = 1.54222; iterations = 100000; iteration = 1.54222e-005; iterations/second = 64841.7
### TIME: total = 1.54222; iterations = 100000; iteration = 1.54222e-005; iterations/second = 64841.7
### TIME: total = 1.54222; iterations = 100000; iteration = 1.54222e-005; iterations/second = 64841.7
-- ...SYNC: average iterations/second = 64737.1
--- ASYNC...
### TIME: total = 1.68242; iterations = 100000; iteration = 1.68242e-005; iterations/second = 59438.2
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.68242; iterations = 100000; iteration = 1.68242e-005; iterations/second = 59438.2
-- ...ASYNC: average iterations/second = 63200.1
--- SYNC...
### TIME: total = 1.75252; iterations = 100000; iteration = 1.75252e-005; iterations/second = 57060.7
### TIME: total = 1.55223; iterations = 100000; iteration = 1.55223e-005; iterations/second = 64423.4
### TIME: total = 1.54222; iterations = 100000; iteration = 1.54222e-005; iterations/second = 64841.7
### TIME: total = 1.55223; iterations = 100000; iteration = 1.55223e-005; iterations/second = 64423.4
### TIME: total = 1.54222; iterations = 100000; iteration = 1.54222e-005; iterations/second = 64841.7
### TIME: total = 1.52219; iterations = 100000; iteration = 1.52219e-005; iterations/second = 65694.9
-- ...SYNC: average iterations/second = 64632.5
--- ASYNC...
### TIME: total = 1.73249; iterations = 100000; iteration = 1.73249e-005; iterations/second = 57720.3
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.63235; iterations = 100000; iteration = 1.63235e-005; iterations/second = 61261.5
-- ...ASYNC: average iterations/second = 63401.4
--- SYNC...
### TIME: total = 1.74251; iterations = 100000; iteration = 1.74251e-005; iterations/second = 57388.6
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.47212; iterations = 100000; iteration = 1.47212e-005; iterations/second = 67929.4
-- ...SYNC: average iterations/second = 63401.4
--- ASYNC...
### TIME: total = 1.76253; iterations = 100000; iteration = 1.76253e-005; iterations/second = 56736.5
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.6
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.6
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.64236; iterations = 100000; iteration = 1.64236e-005; iterations/second = 60887.9
-- ...ASYNC: average iterations/second = 63001.4
--- SYNC...
### TIME: total = 1.80259; iterations = 100000; iteration = 1.80259e-005; iterations/second = 55475.7
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
-- ...SYNC: average iterations/second = 63200.1
--- ASYNC...
### TIME: total = 1.70245; iterations = 100000; iteration = 1.70245e-005; iterations/second = 58738.9
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.6
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.67241; iterations = 100000; iteration = 1.67241e-005; iterations/second = 59794.1
-- ...ASYNC: average iterations/second = 63100.8
--- SYNC...
### TIME: total = 1.81261; iterations = 100000; iteration = 1.81261e-005; iterations/second = 55169.2
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.52219; iterations = 100000; iteration = 1.52219e-005; iterations/second = 65694.9
-- ...SYNC: average iterations/second = 63200.1
--- ASYNC...
### TIME: total = 1.79258; iterations = 100000; iteration = 1.79258e-005; iterations/second = 55785.6
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
-- ...ASYNC: average iterations/second = 63300.8
--- SYNC...
### TIME: total = 1.79258; iterations = 100000; iteration = 1.79258e-005; iterations/second = 55785.6
### TIME: total = 1.55223; iterations = 100000; iteration = 1.55223e-005; iterations/second = 64423.4
### TIME: total = 1.55223; iterations = 100000; iteration = 1.55223e-005; iterations/second = 64423.4
### TIME: total = 1.54222; iterations = 100000; iteration = 1.54222e-005; iterations/second = 64841.7
### TIME: total = 1.54222; iterations = 100000; iteration = 1.54222e-005; iterations/second = 64841.7
### TIME: total = 1.49215; iterations = 100000; iteration = 1.49215e-005; iterations/second = 67017.6
-- ...SYNC: average iterations/second = 64632.5
--- ASYNC...
### TIME: total = 1.77255; iterations = 100000; iteration = 1.77255e-005; iterations/second = 56415.9
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.6023; iterations = 100000; iteration = 1.6023e-005; iterations/second = 62410.1
### TIME: total = 1.6724; iterations = 100000; iteration = 1.6724e-005; iterations/second = 59794.1
-- ...ASYNC: average iterations/second = 63002.6
--- SYNC...
### TIME: total = 1.78256; iterations = 100000; iteration = 1.78256e-005; iterations/second = 56099
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.56225; iterations = 100000; iteration = 1.56225e-005; iterations/second = 64010.4
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
-- ...SYNC: average iterations/second = 63604
--- ASYNC...
### TIME: total = 1.77255; iterations = 100000; iteration = 1.77255e-005; iterations/second = 56415.9
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
### TIME: total = 1.64236; iterations = 100000; iteration = 1.64236e-005; iterations/second = 60887.9
-- ...ASYNC: average iterations/second = 63300.8
--- SYNC...
### TIME: total = 1.69243; iterations = 100000; iteration = 1.69243e-005; iterations/second = 59086.5
### TIME: total = 1.59229; iterations = 100000; iteration = 1.59229e-005; iterations/second = 62802.6
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.58228; iterations = 100000; iteration = 1.58228e-005; iterations/second = 63200.1
### TIME: total = 1.57226; iterations = 100000; iteration = 1.57226e-005; iterations/second = 63602.7
-- ...SYNC: average iterations/second = 63100.8


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk