[Boost-bugs] [Boost C++ Libraries] #11895: Strand service scheduling is hurting ASIO scalability

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2016-01-08 04:38:27

#11895: Strand service scheduling is hurting ASIO scalability
 Reporter: Chris White <chriswhitemsu@…> | Owner: chris_kohlhoff
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: asio
  Version: Boost 1.60.0 | Severity: Optimization
 Keywords: strand scheduling priority

 This problem is best explained by walking through a scenario:

 I have an io_service being run with one thread per CPU core. The CPU has 4
 cores. I also have a strand. I then post 10 one-second operations to the
 strand, and 30 one-second operations to the io_service. That's 40
 cumulative seconds of work, and since there are 4 cores, that work should
 optimally take 10 seconds when performed in parallel.

 However, that's not what happens. The work in the io_service is given
 precedence over the work in the strand. So first, the 30 seconds of
 cumulative work in the io_service is performed in parallel, which takes
 roughly 7.5 seconds (30 seconds / 4 cores). Only after that work is
 complete does the 10 seconds of work on the strand get performed. So it
 ends up taking a total of roughly 17.5 seconds to do all the work, because
 CPU time wasn't distributed optimally.
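
 The arithmetic above can be checked with a few lines of C++. This is only
 a sketch of the idealized timing described in this ticket: the names
 Workload, ideal_seconds and outer_first_seconds are made up for
 illustration and are not part of Boost.Asio, and real scheduling will not
 hit these figures exactly.

```cpp
#include <algorithm>

// Hypothetical helper type for this ticket's scenario; not a Boost.Asio API.
struct Workload {
    double strand_seconds;  // serialized work posted to the strand
    double pool_seconds;    // independent work posted to the io_service
    int cores;              // threads running the io_service
};

// Best case: all work spread evenly over the cores, but never shorter
// than the strand's own serial work.
double ideal_seconds(const Workload& w) {
    return std::max((w.strand_seconds + w.pool_seconds) / w.cores,
                    w.strand_seconds);
}

// Timing as described in the ticket: the io_service work runs in
// parallel first, and only then does the strand run, one handler at a time.
double outer_first_seconds(const Workload& w) {
    return w.pool_seconds / w.cores + w.strand_seconds;
}
```

 With Workload{10.0, 30.0, 4}, ideal_seconds gives 10 and
 outer_first_seconds gives 17.5, matching the numbers in the scenario.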

 If we think of the strand and the io_service as queues, what we have here
 is a queue within a queue. The outer queue's work can be performed in
 parallel, the inner queue's work is performed serially. It seems to make a
 whole lot of sense that when control reaches the inner queue, that queue
 should be serviced until it's empty. After all, there are other threads
 that can still service the outer queue while that's happening. The
 opposite is not true. By giving the outer queue priority, ASIO is
 crippling the concurrency potential of the work in the inner queue, i.e.
 the strand.
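
 The "service the inner queue until it's empty" idea can be sketched in
 standard C++ as follows. This is a hypothetical illustration, not the
 Boost.Asio strand and not the implementation from the attachment. The
 class name DrainingStrand and the Submit callback (assumed to post a task
 to the outer queue, e.g. a thread pool) are invented for this example, and
 the strand object is assumed to outlive any drain pass it has scheduled.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical serial queue that, once scheduled on the outer queue,
// keeps running its own handlers until none are left.
class DrainingStrand {
public:
    using Task = std::function<void()>;
    using Submit = std::function<void(Task)>;  // posts a task to the outer queue

    explicit DrainingStrand(Submit submit) : submit_(std::move(submit)) {}

    // Queue a handler. Schedule one drain pass on the outer queue only if
    // no pass is already pending or running, so handlers never overlap.
    void post(Task task) {
        bool schedule = false;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(task));
            if (!active_) {
                active_ = true;
                schedule = true;
            }
        }
        if (schedule) submit_([this] { drain(); });
    }

private:
    // Run handlers one at a time until the inner queue is empty, without
    // going back through the outer queue in between.
    void drain() {
        for (;;) {
            Task task;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (queue_.empty()) {
                    active_ = false;
                    return;
                }
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();  // run outside the lock so a handler may call post()
        }
    }

    Submit submit_;
    std::mutex mutex_;
    std::deque<Task> queue_;
    bool active_ = false;
};
```

 The trade-off in a design like this is that a strand which is never empty
 holds on to one thread indefinitely, so it suits cases where other threads
 remain available to service the outer queue.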

 This simplified scenario is a very real performance problem for my
 company's server application. As far as I can tell, the only options for
 me to address the problem are to implement my own strand that doesn't give
 control back to the io_service until its work queue is empty, or to put
 all of my strands onto one io_service and post all non-strand work to a
 second, separate io_service.

 I attached a sample application that demonstrates the scenario above,
 complete with timings and a visual printout of the order of operations.
 It also includes a stripped-down strand implementation that demonstrates
 how the problem could be addressed. The application was built in Visual
 Studio 2013 with Boost version 1.60.0.

Ticket URL: <https://svn.boost.org/trac/boost/ticket/11895>