Subject: Re: [boost] [afio] Formal review of Boost.AFIO
From: Agustín K-ballo Bergé (kaballo86_at_[hidden])
Date: 2015-08-30 14:05:25

On 8/30/2015 1:01 PM, Niall Douglas wrote:
> I appreciate that from your perspective, it's a question of good
> design principles, and splashing shared_ptr all over the place is not
> considered good design. For the record, I *agree* where the overhead
> of a shared_ptr *could* be important - an *excellent* example of that
> case is std::future<T> which it is just plain stupid that those use
> memory allocation at all, and I have a non memory allocating
> implementation which proves it in Boost.Outcome. But for AFIO, where
> the cost of a shared_ptr will always be utterly irrelevant compared
> to the operation cost, this isn't an issue.

Let's get this memory allocation concern out of the way. One just can't
have a conforming implementation of `std::future` that does not allocate
memory. Assume that you could, by embedding the storage for the result
(value-or-exception) inside either of the `future/promise`:

1) Allocator support: `std::future::share` transfers ownership of the
(unique) future into a shared future, and thus necessarily requires
allocation [see below]. This allocation ought to be done with the
allocator/pmr supplied to the `std::promise` constructor. You then have
a few options:

a) Keep this allocator around so that `std::future::share` can use it;
this is the standard-conforming option. It means type erasure in some
way or another, which amounts to doing memory allocation whenever the
size of the allocator is greater than the size of some hard-coded small
buffer (properly aligned, etc.).

b) You can ditch allocator support, which is acceptable to the majority
of the population but not to the standard, and resort to
`std::allocator`. You now have a problem: you have no control over the
allocation process, and thus you cannot mitigate its cost by using
pools, stack buffers, etc. However, `std::shared_future` usage should
be rare, so this might not be that big of a deal.

c) You can try to change the standard so that it is `std::future::share`
that takes an allocator, and guarantee no memory allocation anywhere
else. This would be a reasonable approach under certain conditions.

The reason `std::shared_future` cannot make use of embedded storage,
thus necessarily requiring allocation, has to do with lifetime and
thread-safety. `std::shared_future::get` returns a reference to the
resulting value, which is guaranteed to be valid for as long as there is
at least one instance of `std::shared_future` around. If embedded
storage were to be used, it would imply moving the location of the
resulting value when the instance holding it goes away. This can happen
in a separate thread, as `std::shared_future` and
`std::shared_future::get` are thread-safe. All in all it would lead to
the following scenario:

     std::shared_future<T> s = get_a_shared_future_somehow();
     T const& r = s.get();
     std::cout << r; // potentially UB, potentially race

2) Type requirements: The standard places very few restrictions on which
types can be used with asynchronous results; those are:
   - `std::future<T>::get` requires `T` be move constructible,
   - `std::promise<T>::set_value` requires `T` be
copy/move-constructible (some individuals are considering proposing
`emplace_value`, which would lift this restriction),
   - `std::shared_future<T>::get` requires nothing.

The use of embedded storage increases those restrictions:

a) `T` has to be move-constructible, which is fine today as it is
already required implicitly by `std::promise`, `std::packaged_task`,
etc. I'm only mentioning this as there's interest in dropping this
requirement, to increase consistency with regard to emplace
construction.
b) `T` has to be nothrow-move-constructible, as moving any of
`std::promise`, `std::future`, `std::shared_future` is `noexcept`.

c) If synchronization is required when moving the result from one
embedded-storage to the other, `T` has to be
trivially-move-constructible, as executing user code could potentially
lead to a deadlock. This might be tractable by using atomics; the atomic
experts would know (I am not one of them). It could also be addressed by
transactional memory, but that would only further increase the
restrictions on the types that could be used (although I am not a TM
expert either).
So far we know for sure that a standard-conforming non-allocating
`std::promise<T>/future<T>` pair can be implemented as long as:
- `T` is trivially-move-constructible
- The allocator used to construct the promise is `std::allocator`, or
that it is a viable candidate for small buffer optimization.

Such an implementation would use embedded storage under those
partly-runtime conditions, which is quite a restricted population, but
still promising as it covers the basic `std::future<int>` scenario. But
as usually happens, there is a tradeoff: such an implementation would
have to incur synchronization overhead every time either of the
`std::future/promise` pair is moved, for the case where the
`std::future` is retrieved before the value is ready, which in my
experience comprises the majority of the use cases.

But for completeness, let's analyze the possible scenarios. It always
starts with a `std::promise`, which is the one responsible for creating
the shared-state. Then either of these could happen:

I) The shared-state is made ready by providing the value-or-exception to
the `std::promise`.

II) The `std::future` is retrieved from the `std::promise`.

In the case where (I) happens before (II), no extra synchronization is
needed, since the `std::promise` can simply transfer the result to the
`std::future` during (II). Once the result has been provided, there is
no further communication between `std::promise` and `std::future`. This
represents the following scenario:

     std::promise<int> p;
     p.set_value(42);
     std::future<int> f = p.get_future();

which is nothing but a long-winded overhead-riddled way of saying:

     std::future<int> f = std::make_ready_future(42);

In the case where (II) happens before (I), every time either one of the
`std::future` or `std::promise` moves it has to notify the other one
that it has gone to a different location, should it require to contact
it. Again, this would happen for as long as the shared-state is not made
ready, and represents the following scenario:

     std::promise<int> p;
     std::future<int> f = p.get_future();
     std::thread t([p = std::move(p)]() mutable { p.set_value(42); });

which is the poster-child example of using `std::promise/future`.

Finally, in Lenexa SG1 decided to accept LWG2412 as a defect, which
allows (I) and (II) to happen concurrently (previously undefined
behavior). This does not appear to have moved forward in LWG yet. It
represents the following scenario:

     std::promise<int> p;
     std::thread t([&] { p.set_value(42); });
     std::future<int> f = p.get_future();

which is in reality no different than the previous scenario, but which
an embedded storage `std::promise` implementation needs to address with
more synchronization.

Why is this synchronization worth mentioning at all? Because it hurts
concurrency. Unless you are in complete control of every piece of code
that touches them, and devise it so that no moves happen, you are going
to see the effects of threads accessing memory of other threads, with
all that it implies. Yet today's `std::future` and `std::promise` are
assumed to be cheaply movable (just a pointer swap). You could try to
protect against it by making `std::future` and `std::promise` as long
as a cache line, or even by simply using dynamic memory allocation for
them together with an appropriate allocator specifically designed to
aid whatever use case you have where allocation time is a constraint.

And finally, let's not forget that the Concurrency TS (or actually the
futures continuation section of it) complicates matters even more. The
addition of `.then` requires implementations to store an arbitrary
Callable around until the future to which it was attached becomes ready.
Arguably, this Callable has to be stored regardless of whether the
future is already ready; checking the final wording, it appears that
you may as-if run the continuation in the calling thread, despite this
not being required (and it was at least discouraged in an initial
phase). Similar to the earlier allocator case, this Callable can be
anything, so it involves type erasure in some way or another, which
will require memory allocation whenever it doesn't fit within a
dedicated small buffer.

To sum things up (based on my experience and that of others with whom I
had a chance to discuss the subject), a non-allocating quasi-conformant
`std::future/promise` implementation would cater only to a very limited
set of types in highly constrained scenarios where synchronization
overhead is not a concern. In real-world scenarios, and especially
those that rely heavily on futures due to the use of continuations,
time is better spent focusing on memory allocation schemes (the actual
concern after all) by using the standard mechanism devised to tend to
exactly those needs: allocators.

I'll be interested in hearing your findings during your work on the
subject. And should you want me to have a look at your implementation
and come up with ways to "break it" (which is what I do best), you just
have to contact me.


Agustín K-ballo Bergé.-

Boost list run by bdawes at, gregod at, cpdaniel at, john at