
Subject: Re: [boost] Futures (was: Re: [compute] Some remarks)
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-01-06 04:13:57


On 5 Jan 2015 at 12:49, Thomas Heller wrote:

> > I don't think it's that easy because really it comes down to
> > commonality of kernel wait object, or rather, whether one has access
> > to the true underlying kernel wait object or not.
>
> You make the assumption that you only ever synchronize on kernel space
> objects. This is not at all required nor necessary.

I make the assumption that one _eventually_ synchronises on kernel
wait objects, and I also assume that you usually need the ability to
fall back onto a kernel wait in most potential wait scenarios (e.g.
if no coroutine work is pending, and there is nothing better to do
but sleep now). One could I suppose simply call yield() all the time,
but that is battery murder for portable devices.

What is missing on POSIX is a portable universal kernel wait object
used by everything in the system. It is correct to claim you can
easily roll your own with a condition variable and an atomic, the
problem comes in when one library (e.g. OpenCL) has one kernel wait
object and another library has a slightly different one, and the two
cannot be readily composed into a single wait_for_all() or
wait_for_any() which accepts all wait object types, including
non-kernel wait object types.

Windows does have such a universal kernel wait object (the event
object). And on POSIX you could inefficiently emulate a universal
kernel wait object using a pipe at the cost of two file descriptors
per object, though directly using a futex on Linux would be cheaper.

On 5 Jan 2015 at 13:49, Thomas Heller wrote:

> > No it isn't. Current futures require the compiler to generate the
> > code for handling exception throws irrespective of whether it could
> > ever happen or not. As a relative weight to something like a SHA
> > round which is fundamentally noexcept, this isn't a trivial overhead
> > especially when it's completely unnecessary.
>
> Ok. Hands down: What's the associated overhead you are talking about?
> Do you have exact numbers?

I gave you exact numbers: a 13% overhead for a SHA256 round.

> The problem with async_result (as mentioned in a different post) is that
> it merely takes care of "transporting" from the ASIO future island to
> another one. It can be just as well be adapted to any other future based
> system.

Absolutely. Which is precisely why it's a very viable alternative to
fiddling with futures. Most programmers couldn't give a toss about
whether futures do this or that, they do care when they have to jump
through hoops because library A is in a different future island to
library B.

Chris' async_result approach makes that go away right now, not in
2019 or later. It's a very valid riposte to the Concurrency TS, and
unlike the Concurrency TS his approach is portable and is already
standard practice, rather than a standard invented mostly by
Microsoft.

> > Try http://comments.gmane.org/gmane.comp.lib.boost.devel/255022. The
> > key insight of that proposal is the notion of static composition of
> > continuations as the core design. One then composes, at compile-time,
> > a sequence of continuations which implement any combination and
> > variety of future you like, including the STL ones and the proposed
> > Concurrency TS ones. You will note how the functional static
> > continuations are effectively monadic, and therefore these elementary
> > future promises are actually a library based awaitable resumable
> > monadic toolkit which could be used to write coroutine based Hana or
> > Expected monadic sequences which can be arbitrarily paused, resumed,
> > or transported across threads.
>
> This looks indeed promising. I think we should further investigate how
> this could be used when dealing with truly asynchronous and concurrently
> executed tasks.

For me it's a question of free time. This is stuff I do for only a
few hours per week because this time is unfunded (happy to discount
my hourly rate for anyone wanting to speed these up!), and right now
my priority queue is:

1. Release BindLib based AFIO to stable branch (ETA: end of January).
2. Get BindLib up to Boost quality, and submit for Boost review (ETA:
March/April).
3. C++ Now 2015 presentation (May).
4a. Non-allocating lightweight future promises extending Expected
(from June onwards).
4b. Google Summer of Code mentoring of concurrent_unordered_map so it
can be finished and submitted into Boost.

That's the best I can do given this is unfunded time.

> > Universal composure of any kind of future with any other kind is
> > possible when they share the same underlying kernel wait object. I
> > intend to use my proposed pthreads permit object which is a portable
> > userspace pthreads event object as that universal kernel wait object.
> > If widely adopted, it may persuade the AWG to admit permit objects
> > into POSIX threads for standardisation, that way C and C++ code can
> > all use interoperable wait composure.
> >
> > Indeed, if POSIX threads already had the permit object, then OpenCL
> > would have used it instead of making their custom event object, and
> > we could then easily construct a std::future and boost::future for
> > Compute. Sadly, the AWG don't see this sort of consequence, or rather
> > I suspect they don't hugely care.
>
> You make the assumption that OpenCL merely exists on the host.

No, it's more I'm limiting the discussion to host-only and indeed
kernel threading only. I might add that I took care in my pthreads
permit object design that it works as expected without a kernel being
present so it can be used during machine bootstrap, indeed you can
create a pthreads permit object which only spins and yields. That
object design is entirely capable of working correctly under
coroutines too, or on a GPU. It's a C API abstraction of some ability
for one strand to signal another strand, how that is actually
implemented underneath is a separate matter.

> They could
> just as well contain device side specific information which is
> then used directly on the device (no POSIX there). BTW, this is just
> one example where your assumption about kernel level synchronization is
> wrong. Another scenario is in coroutine like systems like HPX where you
> have different synchronization primitives (Boost.Fiber would be another
> example for that). And this is exactly where the challenge is: Trying to
> find a way to unify those different synchronization mechanisms. That
> way, we could have a unified future interface. The things you proposed
> so far can be a step in that direction but certainly don't include all
> necessary requirements.

Actually this is the exact basis for my argument regarding many
future types, and creating a library which is a factory for future
types. In C++ in a proper design we only pay for what we use, so a
future suitable for a SHA round needs to be exceptionally
lightweight, and probably can't copy-compose at all but can
move-compose (this is where a newly created future can atomically
destroy its immediately preceding future, and therefore a
wait_for_all() on an array of such lightweight futures works as
expected). Meanwhile a future which can be used across processes
concurrently would be necessarily a far heavier and larger object.

The same applies to coroutine parallelism, or HPX, or WinRT. They all
get families of future types best suited for the task at hand, and if
the programmer needs bridges across future islands then they pay for
such a facility. The cost is, as Hartmut says, a multiplication of
future islands, but I believe that is inevitable anyway, so one might
as well do it right from the beginning.

I might add that BindLib lets the library end user choose what kind
of future the external API of the library uses. Indeed BindLib based
AFIO lets you choose between std::future and boost::future, and
moreover you can use both configurations of AFIO in the same
translation unit and it "just works". I could very easily - almost
trivially - add support for a hpx::future in there, though AFIO by
design needs kernel threads because it's the only way of generating
parallelism in non-microkernel operating system kernels (indeed, the
whole point of AFIO is to abstract that detail away for end users).

This is why I'd like to ship BindLib sooner rather than later. I
believe it could represent an enormous leap forward in the quality
and usability of C++ 11 requiring Boost libraries.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk