Boost mailing list archive

From: Christopher Kohlhoff (chris_at_[hidden])
Date: 2005-12-27 19:23:24


--- Caleb Epstein <caleb.epstein_at_[hidden]> wrote:
> On Linux, the differences between async and sync results are
> much more striking.

I have spent some time investigating this today, in the course
of which I implemented optimisations to eliminate memory
allocations and reduce the number of system calls in the
asynchronous case. Even with the optimisations, on Linux the
test still showed approximately the same results as those
reported by Caleb.

However, changing the test to have a network between the sender
and receiver shows a marked improvement in async's relative and
absolute performance. In this case, the performance of sync and
async is virtually the same.

My conclusion is that the single-host test exhibits pathological
behaviour on Linux (and possibly other OSes). The problem arises
due to UDP being an unreliable protocol.

Let's consider the behaviour of the async test. We have:

- One thread performing synchronous sends in a tight loop.

- One thread performing asynchronous receives via the demuxer.

Typically a UDP send will not block, so the synchronous loop
performs sends until its timeslice finishes. This will rapidly
fill the buffer on the receiving socket, and once that buffer is
full the additional datagrams are discarded.

The receiver will continue to receive whatever datagrams are
available without giving up its timeslice, but once those are
gone it will block on select/epoll/etc. The net result is that
it takes the receiver more timeslices, and therefore more time,
to receive its quota of packets.

The synchronous test, on the other hand, appears to be getting
flow control for free from Linux. That is, a thread blocking on
a synchronous receive seems to be woken up as soon as data is
available, so the socket's buffer never fills.

This is borne out by introducing simple flow control to the
async test. I added a short sleep to the synchronous send loop
like so:

    // Pause for roughly 1ms every 128 sends; select() with a
    // timeout and no fd sets doubles as a portable sub-second sleep.
    if (m % 128 == 0)
    {
      timeval tv;
      tv.tv_sec = 0;
      tv.tv_usec = 1000;
      select(0, 0, 0, 0, &tv);
    }

and the performance of the async test improved to approximately
2/3 of the sync test's throughput.

A more realistic test involves putting the sender and receiver
on different hosts. I did this with the following setup:

- Dedicated 100Mbps ethernet connection
- Sender: Windows XP SP2, 1.7GHz Pentium M, 512MB RAM
- Receiver: Linux 2.6.8 kernel, 900MHz Pentium 3, 256MB RAM

Running the test with packets of 256, 512 and 1024 bytes showed
identical performance for the async and sync cases.

I'm not saying that async operations will always perform as well
as the equivalent sync operations. A one-socket test like this
naturally favours synchronous operations, because an
asynchronous implementation involves additional demultiplexing
costs. However in a use case involving multiple sockets, these
costs are amortised.

Cheers,
Chris

--- Caleb Epstein <caleb.epstein_at_[hidden]> wrote:

> On 12/20/05, Rene Rivera <grafik.list_at_[hidden]> wrote:
> >
> > I ran the same 100,000*1K*6*2*3 tests with both debug and release
> > compiled code. As can be seen from the attached output in the best
> > case, of release code, there is a 5.6% "overhead" from the async to
> > sync cases. For the debug code the difference is a more dramatic 25.2%.
>
> On Linux, the differences between async and sync results are much more
> striking. Here are the results from Rene's program compiled with gcc
> 4.0.2 -O2 on Linux 2.6 (epoll). I had to make a number of small changes
> to get it to compile, and the SYNC test hangs at the end.
>
> --- ASYNC...
> ### TIME: total = 4.62879; iterations = 100000; iteration = 4.62879e-05; iterations/second = 21603.9
> ### TIME: total = 5.37136; iterations = 100000; iteration = 5.37136e-05; iterations/second = 18617.3
> ### TIME: total = 5.03588; iterations = 100000; iteration = 5.03588e-05; iterations/second = 19857.5
> ### TIME: total = 5.09588; iterations = 100000; iteration = 5.09588e-05; iterations/second = 19623.7
> ### TIME: total = 4.60645; iterations = 100000; iteration = 4.60645e-05; iterations/second = 21708.7
> ### TIME: total = 4.55167; iterations = 100000; iteration = 4.55167e-05; iterations/second = 21970
> -- ...ASYNC: average iterations/second = 19951.8
> --- SYNC...
> ### TIME: total = 1.38579; iterations = 100000; iteration = 1.38579e-05; iterations/second = 72161.2
> ### TIME: total = 1.3561; iterations = 100000; iteration = 1.3561e-05; iterations/second = 73741
> ### TIME: total = 1.34804; iterations = 100000; iteration = 1.34804e-05; iterations/second = 74181.9
> ### TIME: total = 1.35522; iterations = 100000; iteration = 1.35522e-05; iterations/second = 73788.5
> ### TIME: total = 1.36956; iterations = 100000; iteration = 1.36956e-05; iterations/second = 73016.4
> ### TIME: total = 22.2436; iterations = 100000; iteration = 0.000222436; iterations/second = 4495.68
> -- ...SYNC: average iterations/second = 73682
>
> I had to interrupt the program by attaching a debugger to get it to run
> to completion (explaining the low result for the last SYNC loop). One
> thread seems to get stuck in a "recv" call (sync_server::run) that does
> not get interrupted by main's call to s0.stop(). Not sure if this is a
> bug in the test program or in asio.
>
> --
> Caleb Epstein
> caleb dot epstein at gmail dot com
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk