Boost logo

Boost :

Subject: Re: [boost] Boost.Fiber review January 6-15
From: Nat Goodspeed (nat_at_[hidden])
Date: 2014-01-23 10:05:04


SUMMARY

The review of the proposed Boost.Fiber library ended on January 15,
2014. The verdict is: not in its present form.

The lively discussions during the course of the review indicate
considerable interest in this library, however, and every submitted
review can be read as: perhaps, if certain changes are made. This
produced a long list of suggestions, which constitutes the bulk of
this report.

On behalf of the Boost community, I would like to commend Oliver
Kowalke for the work he has put into the Fiber library so far. I
encourage him to continue to evolve this library and to bring it back
for a mini-review. Rather than considering the subject library as a
whole, a Boost mini-review zooms in on particular issues. That seems
an appropriate tool to use for further evaluation of the Fiber library
-- always subject, of course, to the approval of the Review Wizards.

I received seven formal votes, abbreviated here:

Niall Douglas: YES, IF
Eugene Yakubovich: YES, AND
Antony Polukhin: YES, IF
Vicente J. Botet Escriba: NO, UNTIL
Agustín K-ballo Bergé: NO, IN CURRENT STATE
Hartmut Kaiser: NO, IN CURRENT FORM
Thomas Heller: NO, IN CURRENT FORM

I thank each of these reviewers, and indeed everyone who investigated
the Fiber library and participated in the discussions.

If you feel that I have misrepresented your position, or have omitted
or garbled an important point, please respond: the archived mail
thread should accurately reflect the will of the community, even if
this message does not.

>From the long list of suggestions, a few key themes emerged.

PERFORMANCE

Many respondents requested performance tests (specifics below). There
were a number of suggestions about possible performance pitfalls in
the current implementation, such as use of STL containers (with
consequent heap allocations) and locks for thread safety.

Several people suggested implementing performance tests *before*
starting any such optimizations, which seems like sensible advice.

Niall Douglas pointed out that picking some fixed number of fibers is
less interesting than showing how resource consumption rises with the
number of fibers:

"I think it isn't unreasonable for a library to enter Boost if it has
good performance *scaling* to load (e.g. O(log N)), even if
performance in the absolute or comparative-to-near-alternatives sense
is not great.

"Absolute performance can always be incrementally improved later,
whereas poor performance scaling to load usually means the design is
wrong and you're going to need a whole new library with new API.

"This is why I really wanted to see performance scaling graphs. If
they show O(N log N) or worse, then the design is deeply flawed and
the library must not enter Boost. Until we have such a graph, we can't
know as there is no substitute for empirical testing."

Given the truth of that last observation, I have recast certain
suggested optimizations as requests for measured performance cost.

Performance requests include:

- empty function (create, schedule, execute, delete one fiber)

- same with one yield

- overhead of using a future object

- tests as described in [1]; such tests should allow comparing with
TBB, qthreads, openmp, HPX

- construct and join a single fiber vs. construct and join a single
thread (empty function)

- construct and join several fibers vs. construct and join several
threads (empty function)

- construct and detach a single fiber vs. construct and detach a
single thread (empty function)

- cost of STL containers (std::vector, std::map, std::deque)

- cost of fiber interruption support (same as next bullet?)

- cost of thread safety: intra-thread fibers vs. inter-thread fibers

- cost of round_robin_ws vs. round_robin scheduler

- cost of building on Coroutine vs. building on Context

Once Fiber performance tests are available, I trust the community will
assist Oliver in running them on systems otherwise unavailable to him.

There was some disagreement on whether it is essential for the Fiber
library to attain a certain level of performance before it should be
accepted. Respondents fall into two broad camps.

Some see fibers as lightweight threads (without the overhead of kernel
context switching). They assert that since the API is that of
std::thread, the only reason to accept a separate library would be
astounding performance. They note that on some hardware it is already
plausible to run hundreds of thousands of concurrent std::threads.

Others see fibers as a tool for organizing code based around
asynchronous operations (rather than chains of callbacks). They assert
that for such purposes, eliminating the kernel from context switching
is sufficient performance guarantee. I myself maintain that
cooperative context switching provides important functionality you
cannot reasonably get from std::thread.

The Fiber library attempts to address both use cases. (It was
suggested that if its primary target were code organization, perhaps
it should be renamed something other than "Fiber.")

In any case, the community clearly wants to see Fiber performance
data. Some requested CPU cycles to eliminate clock speed differences,
also memory consumption. ARMv7 data was requested for "extra bonus
points."

DOCUMENTATION

Many respondents requested additional documentation, or
clarifications. Requests include:

- Rationale page explaining what's there, what's not and why. Explain
distinction between Coroutine library and Fiber library. If certain
Fiber functionality is intended to support yet another library (rather
than being complete in itself), call out what would need to be added.

- A section on how to install and run the tests and examples. The need
to embed in a Boost tree is implied but not stated. Mention the need
to build the library and link with it.

- Explain synchronization between fibers on different threads. Must
the code take more care with this than with synchronizing fibers on
the same thread?

- Clarify that an exception raised by a fiber function calls
std::terminate(), as with std::thread, rather than being consumed.

- More clearly explain migrating a fiber from one scheduler to another.

- Document async() in a way compatible with the C++ standard.

- Clarify thread-local effect of set_scheduling_algorithm(). There was
a request to put this function in a this_thread nested namespace to
further clarify.

- Move algorithm class documentation to "Extension" or "Customization"
section. Clarify that it's not part of the baseline library
functionality, but a customization point.

- Document fiber::id.

- Better document promise/future for void and R& (per C++ standard).

- Document thread safety of each support class (or method, if it
varies by method).

- Document complexity guarantees per API.

- Document exception safety per API.

- Document supported architectures (perhaps link to Coroutine
library's list); state minimum compiler versions.

- Document the get/set overloads of thread_affinity() and priority()
separately. Perhaps rename setters to set_thread_affinity() and
set_priority().

- Explain how portable is fiber priority, if it's specific to a
scheduler. What does priority mean when you migrate a fiber from one
thread (one scheduler instance) to another?

- Document the library's ASIO support. Link to Coroutine's ASIO yield
functionality; ensure that ASIO yield is adequately explained. In
particular, distinguish it from this_fiber::yield.

- Better explain (and/or comment) publish-subscribe example, also
other existing examples.

In addition to the documentation requests above, there were requests
for additional examples:

- Simple example of ASIO callback implementation vs. the same logic
using Fiber's ASIO support, a la [4].

- Example of a fiber pool.

- Example of an arbitrary thread B filling a future on which a fiber
in thread A is waiting.

- Example of an arbitrary thread B posting to an asio::io_service
running fibers in thread A.

- Either defend fibers::condition_variable from spurious wakeups in
existing examples, OR document the stronger condition_variable
guarantee.

- Example of M:N threading with ASIO. That might involve either one
io_service per CPU, with fiber migration; or a single io_service with
run() calls from each CPU, grouping fibers for each CPU into strands.

- Example of one thread with many fibers making service requests on a
pool of worker threads performing blocking calls.

- Example of using thread_specific_ptr to manage lifespan of
user-specified scheduler.

- Example of the owner of a fiber changing the fiber's thread affinity
vs. the fiber itself. When would you use each tactic?

- Load example programs into an Examples appendix so that Google
searches can turn up library documentation.

LIBRARY API

- Four people overtly approved the close parallel with std::thread and
its support classes.

- Allocating a default scheduler object, rather than specifying a
default template param, was praised.

- Three people called out the set difference between Boost.Thread
features and Fiber (e.g. future::get_exception_ptr()). One wants these
implemented immediately; another says they can be added later; the
third simply requests that they be documented, with rationale.

- Two people frowned on introducing operator bool methods not found in
std::thread or Boost.Thread.

- C++11 support was mentioned, notably Boost macros such as
BOOST_RV_REF and BOOST_EXPLICIT_OPERATOR_BOOL. Also mentioned were:
C++11 idioms; C++11 std::thread patterns; move construction;
initializer lists; rvalue this overloads; deleting operators.

- The fiber constructor and async() should accept a move-only callable.

- At least for a C++11 compiler, fiber constructor and async() should
accept variadic parameters. These should support move-only types, like
Boost.Thread. C++03 support for variadic parameters would be nice, but
is less important.

- Every API involving time point or duration should accept arbitrary
clock types, immediately converting to a canonical duration type for
internal use.

- Queues should support value_pop() returning item by value. This
supports an item type without a default constructor.

- Nested scoped_lock typedef has been deprecated in thread library.
Remove in Fiber library.

- Align the return type of shared_future::get() with the standard. In
general, ensure that parameter types and return types are aligned with
the standard.

- A couple of people were bothered by the use of types in the detail
namespace as parameters or return values in the algorithm API. (I
note, however, that extending e.g. Boost.Range can involve touching
its detail namespace. A customization point for a library may be a bit
of a gray area.)

- There was a suggestion to rename algorithm to scheduler. In that
case, presumably set_scheduling_algorithm() could be renamed
set_scheduler().

- There was a request to rename round_robin_ws to round_robin_work_stealing.

- A couple of people consider the algorithm API too monolithic,
pointing to redundancies in the round_robin, round_robin_ws and asio
round_robin implementations. They suggested teasing out distinct
classes, so that (for instance) a user-coded scheduler might be able
to override a single method to respect fiber priority. In fact Eugene
Yakubovich offered to experiment with refactoring the algorithm class
this way.

- There was a request for set_scheduling_algorithm() to return the
previous pointer. (It might be useful for the requester to explain the
anticipated use case. An earlier iteration of
set_scheduling_algorithm() did return the previous pointer; Oliver
intentionally changed that behavior.)

- fiber_group got one thumbs-up and two thumbs-down. Options: retain;
improve to use move support rather than fiber*; discard. There is an
opportunity to improve on thread_group; naturally there is risk in
diverging from thread_group.

- Request deferred futures for lazy evaluation.

- There was a suggestion to introduce a global object to coordinate
thread-specific fiber schedulers, in the hope that the global object
could perform all relevant locking and the thread-specific fiber
schedulers could themselves be thread-unsafe.

- There was a request to unify steal_from() and migrate_to() into a
single method. I infer that this is predicated on the previous
suggestion.

- Request future::then() et al, per [5]. (Someone please clarify the
present status of N3784?)

- Request enriched barrier support per [6] and [7]. (Someone please
clarify the present status of N3817?)

- There are two fiber properties specific to particular schedulers:
thread_affinity (used only by round_robin_ws) and priority (as yet
unused by any scheduler). What if a user-coded scheduler requires a
fiber property that does not yet exist? Is there a general approach
that could subsume the present support for thread_affinity and
priority, in fiber and this_fiber? Could the initial values for such
properties be passed as part of the fiber constructor's attributes
parameter?

- One use case was surfaced that may engage the previous bullet: the
desire to associate a given fiber with any of a group of threads, such
as the set of threads local to a NUMA domain or physical CPU.

IMPLEMENTATION

- Replace std::auto_ptr with boost::scoped_ptr. The former produces
deprecation warnings on GCC.

- Reduce redundancy between try_lock() and lock().

- boost::fibers::asio::detail::yield_handler::operator()() calls
algorithm::spawn() before algorithm::run(). Does this allow the
scheduler to choose the next fiber to run, e.g. a user-coded scheduler
that respects fiber priority?

- Add memory transaction support to spinlock a la [8].

- Intel TSX lock avoidance would be nice.

LINKS

[1] https://github.com/STEllAR-GROUP/hpx/tree/master/tests/performance
[2] http://stellar.cct.lsu.edu/pubs/isc2012.pdf
[3] http://stellar.cct.lsu.edu/pubs/scala13.pdf
[4] https://ci.nedprod.com/job/Boost.AFIO%20Build%20Documentation/Boost.AFIO_Documentation/doc/html/afio/quickstart/async_file_io/hello_world.html
[5] http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3784.pdf
[6] http://www.boost.org/doc/libs/1_55_0/doc/html/thread/synchronization.html#thread.synchronization.barriers
[7] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3817.html#barrier_operations
[8] https://github.com/BoostGSoC/boost.afio/blob/master/boost/afio/detail/

Nat Goodspeed
Boost.Fiber Review Manager
________________________________


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk