
Subject: Re: [boost] Futures (was: Re: [compute] Some remarks)
From: Thomas Heller (thom.heller_at_[hidden])
Date: 2015-01-05 07:49:08


On Monday, January 05, 2015 10:27:33 Niall Douglas wrote:
> On 4 Jan 2015 at 11:25, Thomas Heller wrote:
> > I absolutely agree. "Future islands" are a big problem which needs a
> > solution very soon. To some extent, the shared state as described in the
> > standard could be the interface to be used by the different islands. What
> > we miss here is a properly defined interface. I probably didn't make
> > that clear enough in my initial mail, but I think this unifying future
> > interface should be the way forward so that different domains can use
> > it to implement their islands. FWIW, we already have that in HPX, and we
> > are currently integrating OpenCL events within our "future island"; this
> > works exceptionally well.
>
> I personally think that any notion of any shared state in futures is
> one of the big design mistakes. Instead of "future as a shared_ptr",
> think "future as a pipe".

std::future is more like a unique_ptr; std::shared_future is the shared_ptr
equivalent. In the "future as a pipe" idea, the future is merely the
receiving end of that pipe.
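
To illustrate that analogy with nothing but standard components (my own
example, not code from the thread): the promise is the sending end of the
pipe, std::future the move-only, unique_ptr-like receiving end, and .share()
converts it into the copyable, shared_ptr-like form.

  #include <cassert>
  #include <future>
  #include <thread>
  #include <utility>

  int main()
  {
      // The promise is the sending end of the "pipe", the future the receiving end.
      std::promise<int> sender;
      std::future<int> receiver = sender.get_future();   // move-only, unique_ptr-like

      std::thread producer([p = std::move(sender)]() mutable {
          p.set_value(42);                               // push one value through the pipe
      });

      // std::future<int> copy = receiver;               // does not compile: futures are not copyable
      std::shared_future<int> shared = receiver.share(); // shared_ptr-like: copyable, many readers
      std::shared_future<int> another = shared;

      assert(shared.get() == 42 && another.get() == 42);
      producer.join();
      return 0;
  }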

>
> > > 2. Every time you touch them with change you unavoidably spend
> > > thousands of CPU cycles due to going through the memory allocator and
> > > (effectively) the internal shared_ptr. This makes using futures for a
> > > single SHA round, for example, a poor design despite how nice and
> > > clean it is.
> >
> > I am not sure i fully understand that statement. All I read is that a
> > particular implementation seems to be bad and you project this to the
> > general design decision. I would like to see this SHA future code though
> > and experiment with it a bit.
>
> Have a look at
> https://github.com/BoostGSoC13/boost.afio/blob/content_hashing_merge/boost/afio/hash_engine.hpp.
>
> The best I could get it to is 17 cycles a byte, with the scheduling
> (mostly future setup and teardown) consuming 2 cycles a byte, or a
> 13% overhead which I feel is unacceptable.
>
> The forthcoming hardware offloaded SHA in ARM and Intel CPUs might do
> 2 cycles a byte. In this situation the use of futures halves
> performance which is completely unacceptable.
>
> > > 3. They force you to deal with exceptions even where that is not
> > > appropriate, and internally most implementations will do one or more
> > > internal throw-catches which if the exception type has a vtable, can
> > > be particularly slow.
> >
> > I think this is a void statement. You always have to deal with exceptions
> > in one way or another ... But yes, exception handling is slow, so what?
> > It's only happening in exceptional circumstances, what's the problem
> > here?
>
> No it isn't. Current futures require the compiler to generate the
> code for handling exception throws irrespective of whether it could
> ever happen or not. As a relative weight to something like a SHA
> round which is fundamentally noexcept, this isn't a trivial overhead
> especially when it's completely unnecessary.

Ok. Hands down: What's the associated overhead you are talking about? Do you
have exact numbers?
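
For readers following along, here is the shape of the cost being debated,
sketched with plain std components (my own illustration, not code from either
side): even when the producer is noexcept, the promise/future shared state
keeps its exception_ptr slot and get() keeps its rethrow path; whether and how
much of that a compiler can actually optimise away is exactly the question
above.

  #include <future>

  // A SHA-style round that can never throw...
  unsigned hash_round(unsigned block) noexcept { return block * 0x9e3779b1u; }

  int main()
  {
      std::promise<unsigned> p;
      std::future<unsigned>  f = p.get_future();

      // ...yet the shared state must still be able to store an exception_ptr,
      // and future::get() must still carry the rethrow path, so that machinery
      // is part of the deal even though this code can never use it.
      p.set_value(hash_round(1));

      unsigned digest = f.get();   // may rethrow in general, but cannot here
      return digest == hash_round(1) ? 0 : 1;
  }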

>
> > > This is why Chris has proposed async_result from ASIO instead, that
> > > lets the caller of an async API supply the synchronisation method to
> > > be used for that particular call. async_result is superior to futures
> > > in all but one extremely important way: async_result cannot traverse
> > > an ABI boundary, while futures can.
> >
> > What's the difference between async_result and a future? I am unable to
> > find that in the ASIO documentation.
>
> As Bjorn mentioned, an async_result is a per-API policy for how to
> indicate the completion of an asynchronous operation. It could be as
> simple as an atomic boolean.

The problem with async_result (as mentioned in a different post) is that it
merely takes care of "transporting" the result from the ASIO future island to
another one. It can just as well be adapted to any other future-based system.
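
To make the "transporting" point concrete, this is roughly what that
adaptation looks like from the caller's side, sketched with Boost.Asio's
use_future completion token (one existing bridge out of the ASIO island into
std::future; the timer and duration are just for illustration):

  #include <boost/asio.hpp>
  #include <boost/asio/use_future.hpp>
  #include <chrono>
  #include <future>
  #include <iostream>

  int main()
  {
      boost::asio::io_service io;
      boost::asio::steady_timer timer(io, std::chrono::milliseconds(10));

      // Same async operation, but the caller picks the completion mechanism:
      // passing use_future as the handler makes async_wait return a std::future,
      // while a plain callback handler would be invoked inside ASIO instead.
      std::future<void> done = timer.async_wait(boost::asio::use_future);

      io.run();     // run the ASIO island; the future becomes ready when the timer fires
      done.get();   // now we are back in std::future land

      std::cout << "timer fired\n";
      return 0;
  }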

>
> > > Replacing the entire concurrency engine and indeed paradigm in your
> > > C++ runtime is, I suspect, too scary for most, even if the code
> > > changes are straightforward. It'll be the "bigness" of the concept
> > > which scares them off.
> >
> > Neither Hartmut nor I am proposing to use HPX within Boost. However, we
> > want to release an HPX-enhanced C++ stdlib in the near future to account
> > for this exact deficiency.
>
> With respect, nobody wants nor needs yet another STL. We already have
> three, and that already has enough of a maintenance headache.
>
> If you can persuade one of the big three to fully adopt your
> enhancements then I am all ears.
>
> > > To that end, the non-allocating basic_future toolkit I proposed on
> > > this list before Christmas I think has the best chance of "fixing"
> > > futures. Each programmer can roll their own future type, with
> > > optional amounts of interoperability and composure with other future
> > > islands. Then a future type lightweight enough for a SHA round is
> > > possible, as is some big thick future type providing STL future
> > > semantics or composure with many other custom future types. One also
> > > gains most of the (static) benefits of ASIO's async_result, but one
> > > still has ABI stability.
> >
> > I missed that. Can you link the source/documentation/proposal once more
> > please?
>
> Try http://comments.gmane.org/gmane.comp.lib.boost.devel/255022. The
> key insight of that proposal is the notion of static composition of
> continuations as the core design. One then composes, at compile-time,
> a sequence of continuations which implement any combination and
> variety of future you like, including the STL ones and the proposed
> Concurrency TS ones. You will note how the functional static
> continuations are effectively monadic, and therefore these elementary
> future promises are actually a library based awaitable resumable
> monadic toolkit which could be used to write coroutine based Hana or
> Expected monadic sequences which can be arbitrarily paused, resumed,
> or transported across threads.

This indeed looks promising. I think we should investigate further how this
could be used when dealing with truly asynchronous and concurrently executed
tasks.
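
Not the basic_future from that proposal, but a toy sketch of my own showing
what "static composition of continuations" can mean: every .then() yields a
new concrete type at compile time, so the whole chain lives on the stack with
no shared state, no type erasure and no allocation.

  #include <utility>

  // Illustrative only: a ready value whose continuations are composed statically.
  template <typename T>
  struct static_future
  {
      T value;

      template <typename F>
      auto then(F&& f) && -> static_future<decltype(f(std::declval<T>()))>
      {
          // The continuation runs immediately here; a real implementation would
          // defer it, but the composition itself stays a compile-time affair.
          return { std::forward<F>(f)(std::move(value)) };
      }
  };

  template <typename T>
  static_future<T> make_ready(T v) { return { std::move(v) }; }

  int main()
  {
      auto r = make_ready(2)
                   .then([](int x) { return x * 3; })
                   .then([](int x) { return x + 1; });
      return r.value == 7 ? 0 : 1;
  }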

>
> Universal composure of any kind of future with any other kind is
> possible when they share the same underlying kernel wait object. I
> intend to use my proposed pthreads permit object which is a portable
> userspace pthreads event object as that universal kernel wait object.
> If widely adopted, it may persuade the AWG to admit permit objects
> into POSIX threads for standardisation, that way C and C++ code can
> all use interoperable wait composure.
>
> Indeed, if POSIX threads already had the permit object, then OpenCL
> would have used it instead of making their custom event object, and
> we could then easily construct a std::future and boost::future for
> Compute. Sadly, the AWG don't see this sort of consequence, or rather
> I suspect they don't hugely care.

You make the assumption that OpenCL events merely exist on the host. They could
just as well contain device-side information which is then used directly on the
device (no POSIX there). BTW, this is just one example where your assumption
about kernel-level synchronization is wrong. Another scenario is coroutine-like
systems such as HPX, where you have different synchronization primitives
(Boost.Fiber would be another example). And this is exactly where the challenge
lies: trying to find a way to unify those different synchronization mechanisms.
That way, we could have a unified future interface. The things you have
proposed so far can be a step in that direction but certainly don't cover all
the necessary requirements.
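
One possible shape of such a unifying piece, purely as a sketch of my own (not
HPX's, Boost.Fiber's or Niall's actual interface): the islands would only have
to agree on a tiny one-shot event whose blocking strategy each island supplies
itself, be that a kernel wait, a fiber suspension or an HPX task suspension.

  #include <condition_variable>
  #include <mutex>
  #include <thread>

  // The piece the islands would have to agree on: a one-shot notification whose
  // blocking strategy is supplied by the island, not hard-wired to the kernel.
  struct one_shot_event
  {
      virtual void notify() = 0;   // called by the producing side of any island
      virtual void wait() = 0;     // suspend a thread, a fiber, an HPX task, ...
      virtual ~one_shot_event() = default;
  };

  // Kernel-thread flavour built on std primitives; a fiber or HPX island would
  // implement the same two calls with its own userspace suspension instead.
  class thread_event final : public one_shot_event
  {
      std::mutex m_;
      std::condition_variable cv_;
      bool set_ = false;

  public:
      void notify() override
      {
          { std::lock_guard<std::mutex> lk(m_); set_ = true; }
          cv_.notify_all();
      }
      void wait() override
      {
          std::unique_lock<std::mutex> lk(m_);
          cv_.wait(lk, [this] { return set_; });
      }
  };

  int main()
  {
      thread_event done;
      std::thread producer([&done] { done.notify(); });
      done.wait();
      producer.join();
      return 0;
  }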

>
> Niall

