
Subject: Re: [boost] [thread] Alternate future implementation and future islands.
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2015-03-19 14:05:45


On 19 Mar 2015 11:09, "Niall Douglas" <s_sourceforge_at_[hidden]> wrote:
>
> On 17 Mar 2015 at 21:42, Giovanni Piero Deretta wrote:
>
> > * The wait strategy is not part of the future implementation (although
> > a default is provided). future::get, future::wait, wait_all and
> > wait_any are parametrized by the wait strategy.
> >
> > * The design aims to streamline and make as fast as possible future
> > and promise at the cost of a slower shared_future (although there is
> > room for improvement).
>
> Your future still allocates memory, and is therefore costing about
> 1000 CPU cycles.

1000 clock cycles seems excessive with a good malloc implementation.

Anyway, the plan is to add support for a custom allocator. I do not think
you can realistically have a non-allocating future *in the general case*
(you might optimise some cases, of course).
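For reference, the standard library already defines the shape such allocator support usually takes: std::promise has an allocator-aware constructor that obtains the shared state from a user-supplied allocator instead of plain operator new. A minimal sketch using the std types (not the implementation under discussion):

```cpp
#include <future>
#include <memory>

// std::promise's allocator-aware constructor takes std::allocator_arg
// followed by the allocator; any type meeting the Allocator requirements
// (e.g. a pool or arena allocator) can be substituted for std::allocator.
template <class Alloc>
std::future<int> make_future_with(const Alloc& a) {
    std::promise<int> p(std::allocator_arg, a);
    std::future<int> f = p.get_future();
    p.set_value(42);  // fulfilled immediately, just to keep the demo short
    return f;
}
```

A custom future implementation can expose the same constructor shape without changing the rest of its interface.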

> That is not "as fast as possible". The use of
> std::atomic also prevents the compiler optimiser from eliding the
> future implementation entirely, because atomic always forces code to
> be generated. As much as futures today are big heavy things,
> tomorrow's C++17 futures especially resumable function integrated
> ones must be completely elidable, then the compiler can completely
> eliminate and/or collapse a resumable function call or sequence of
> such calls where appropriate. If you have an atomic in there, it has
> no choice but to do the atomic calls needlessly.

I understand what you are aiming at, but I think that the elidability is
orthogonal. Right now I'm focusing on making the actual synchronisation
fast and composable in the scenario where the program has committed to make
a computation async.

>
> I think memory allocation is unavoidable for shared_future, or at
> least any realistically close to the STL implementation of one. But a
> future I think can be both STL compliant and never allocate memory
> and be optimisable out of existence.
>
> > * The wait strategy only deals with 'event' objects, which act as a
> > bridge between the future and the promise.
> >
> > The event object is really the core of my contribution; it can be
> > thought of as the essence of future<void>::then; alternatively it can
> > be seen as a pure user space synchronization primitive.
>
> Exactly as my C11 permit object is. Except mine allows C code and C++
> code to interoperate and compose waits together.

Not at all. I admit not having studied permit in detail (the doc size is
pretty daunting), but as far as I can tell the waiting thread will block in
the kernel.

Permit provides a variety of ways to block, but the user can't add more.

With event, the decision of how to block (or even whether to block at all)
is made by the consumer and, most importantly, it can be delayed until the
last moment.

You can think of event as the bridge between the signal side and wait side
of permit, or alternatively as a future<void> which only supports then (no
get or wait).
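To make that concrete, here is a minimal sketch of what such an event can look like; this is my reading of the design, not the posted code. The event is a single atomic cell that is either empty, signaled, or holds a registered waiter, and the wake-up strategy lives entirely on the consumer side:

```cpp
#include <atomic>

// A waiter carries its own wake-up strategy, chosen by the consumer.
struct waiter {
    void (*wake)(waiter*);
};

class event {
    // Sentinel value meaning "already signaled" (illustrative trick).
    static waiter* signaled() { return reinterpret_cast<waiter*>(1); }
    std::atomic<waiter*> state{nullptr};
public:
    // Producer side: mark the event signaled, waking a registered waiter.
    void signal() {
        waiter* w = state.exchange(signaled(), std::memory_order_acq_rel);
        if (w != nullptr && w != signaled()) w->wake(w);
    }
    // Consumer side: try to register a waiter. Returns false if the event
    // already fired, in which case there is no need to block at all --
    // the "whether to block" decision happens here, at the last moment.
    bool wait(waiter* w) {
        waiter* expected = nullptr;
        return state.compare_exchange_strong(expected, w,
                                             std::memory_order_acq_rel);
    }
};

// Example consumer: a waiter whose wake-up strategy just sets a flag.
// Real waiters would post to a futex, semaphore, eventfd, scheduler, etc.
struct flag_waiter : waiter {
    std::atomic<bool> fired{false};
    flag_waiter() {
        wake = [](waiter* w) {
            static_cast<flag_waiter*>(w)->fired.store(true);
        };
    }
};
```

The cv_waiter, futex_waiter, fd_waiter and coroutine waiters mentioned earlier are all just different choices of that wake function.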

Regarding interfacing with a C event source, it can be done trivially
right now, as long as the source provides a callback (function pointer +
context pointer) API, which is very common already. No need to wait for
existing libraries to embrace a new synchronisation object (cue the xkcd
strip about standards :) )
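A sketch of that bridge, with an invented stand-in for the C source (the names here are illustrative, not any real library): the event is passed as the context pointer and its signal operation becomes the callback.

```cpp
#include <atomic>

// Stand-in for a typical C event-source API: it stores one callback plus
// a context pointer and invokes them when the operation completes.
static void (*g_callback)(void*) = nullptr;
static void* g_context = nullptr;
void register_callback(void (*fn)(void*), void* ctx) {
    g_callback = fn;
    g_context = ctx;
}
void source_complete() {  // the C source firing its callback
    if (g_callback) g_callback(g_context);
}

// Minimal event stand-in that just records the signal.
struct event {
    std::atomic<bool> set{false};
    void signal() { set.store(true, std::memory_order_release); }
    bool ready() const { return set.load(std::memory_order_acquire); }
};

// The bridge: event as context, signal as callback. No changes needed
// on the C side.
void attach(event& e) {
    register_callback(
        [](void* ctx) { static_cast<event*>(ctx)->signal(); }, &e);
}
```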

Allowing C code to wait for events would require replacing the waiter
vtable with a C equivalent, but that is not something I care much about
right now.

>
> > Other features of this implementation:
> >
> > * Other than in the blocking strategy itself, the future and promise
> > implementation have no sources of blocking (no mutexes, not even spin
> > locks).
> >
> > * The shared state is not reference counted.
> >
> > To demonstrate the generality of the waiting concept, I have
> > implemented a few waiter objects.
> >
> > * cv_waiter: this is simply a waiter on top of an std::mutex + cond
> > var.
> >
> > * futex_waiter (linux): an atomic counter + a futex, possibly more
> > efficient than the cv_waiter
> >
> > * sem_waiter (posix): a waiter implemented on top of a posix
> > semaphore. More portable than the futex waiter and possibly more
> > efficient than the cv_waiter
> >
> > * fd_waiter(linux, possibly posix): a waiter implemented on top of
> > linux eventfd (for portability it can also be implemented on top of a
> > pipe); you can use select with futures!
> >
> > * task_waiter: a completely userspace based coroutine waiter which
> > switches to another coroutine on wait and resumes the original
> > coroutine on signal.
> >
> > * scheduler_waiter: another coroutine based waiter, but on top of an
> > userspace task scheduler. On wait switch to the next ready task, on
> > signal enqueue the waiting task to the back of the ready queue.
> >
> > I know that there is a plan to reimplement a new version of
> > boost::threads, hopefully this implementation can contribute a few
> > ideas.
>
> My current best understanding of Vicente's plans is that each thread
> has a thread local condvar. The sole cause of a thread sleeping,
> apart from i/o, is on that thread local condvar. One therefore has a
> runtime which keeps a registry of all thread local condvars, and can
> therefore deduce the correct thread local condvars to wake when
> implementing a unified wait system also compatible with
> Fibers/resumable functions.

That doesn't work if a program wants to block, for example, in select, or
spin on a memory location or a hardware register, or wait for a signal, or
interoperate with a different userspace thread library or some other event
queue (asio, Qt or whatever), etc., and still also wait on a future. Well,
you can use future::then, but it has overhead.
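The fd_waiter mentioned above is exactly the escape hatch for the select case. A hedged sketch of the Linux eventfd variant (an illustration of the technique, not the posted code): the producer writes to an eventfd on signal, so the consumer can multiplex the future with sockets, pipes and timers in an ordinary select()/poll()/epoll loop.

```cpp
#include <sys/eventfd.h>
#include <sys/select.h>
#include <unistd.h>
#include <cstdint>

class fd_waiter {
    int fd_;
public:
    fd_waiter() : fd_(eventfd(0, 0)) {}
    ~fd_waiter() { close(fd_); }
    fd_waiter(const fd_waiter&) = delete;
    int fd() const { return fd_; }  // hand this to select/poll/epoll
    void signal() {                 // producer side: make the fd readable
        std::uint64_t one = 1;
        (void)write(fd_, &one, sizeof one);
    }
    void wait() {                   // consumer side: block until signaled
        fd_set rd;
        FD_ZERO(&rd);
        FD_SET(fd_, &rd);
        select(fd_ + 1, &rd, nullptr, nullptr, nullptr);
        std::uint64_t n;
        (void)read(fd_, &n, sizeof n);  // drain the counter
    }
};
```

For portability the same shape works over a pipe; the point is that the blocking primitive is chosen by the consumer, not baked into the future.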

-- gpd


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk