
Subject: Re: [boost] Futures (was: Re: [compute] Some remarks)
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-01-13 09:25:29


On 12 Jan 2015 at 21:50, Giovanni Piero Deretta wrote:

> I have been following the thread with interest, and I wanted to know
> more about your non-allocating future/promise pair. As far as I
> understand, your future and promise have a pointer to each other and
> they update the other side every time they are moved, right?

Exactly right.

> My question is, as you need to do the remote update with an atomic
> operation (exchange in the best case), and you usually perform at
> least a few moves (when composing futures for example), wouldn't a
> fast allocator outperform this solution?

Firstly, I found that giving each future and promise its own CAS lock
is considerably faster than trying to be any more clever. When
updating, you lock both objects, with backoff, before performing the
update.

Secondly, no: this approach is far faster than a fast allocator, at
least on Intel. The reason is that promises and futures are very,
very rarely contended on the same cache line between threads, so the
CAS locking and updating almost never spins or contends. It's pretty
much full speed ahead.

The problem with specialised allocators is that, firstly,
Boost.Thread's futures don't support allocators at all, and secondly,
even if they did, as soon as you bring global memory effects into the
picture you constrain the compiler's optimiser considerably. For
example, make_ready_future() with the test code I wrote is
implemented very naively as:

promise<T> p;
future<T> f(p.get_future());
p.set_value(v);
return f;
...
make_ready_future(5).get();

... which the compiler collapses into

movl $5, %eax
ret

Any use of an allocator prevents the compiler from doing that,
because touching global memory means the compiler has to assume an
unknown read. This doesn't mean a custom make_ready_future() couldn't
produce an equally optimised outcome, but for me personally the
ability of the compiler to collapse the opcode output suggests a good
design here.

I would also assume that when allowed to collapse opcodes, the
compiler can also do alias folding etc., which the use of an
allocator may inhibit.

> >> A portable, universal kernel wait object is
> >> not really necessary for that.
> >
> > I think a portable, universal C API kernel wait object is very
> > necessary if C++ is to style itself as a first tier systems
> > programming language.
> >
>
> For what it's worth, I'm working on a proof-of-concept
> future/promise pair that is wait-strategy agnostic. The only
> functions that need to know about the wait strategy are the
> future::wait{,_for,_until,_any,_all} family and of course
> future::get, in case it needs to call wait. In fact the wait functions
> are parametrized on the wait strategy (be it a futex, condition
> variable, posix fd, posix semaphore, coroutine yield, etc) and the
> wait object can be stack allocated.
>
> If I get everything right, all other functions, in particular
> promise::set_value and future::then, should be lock-free (or
> wait-free, depending on the underlying hardware).
>
> The shared state should also have a nice minimal API.
>
> The idea is fairly obvious in retrospect, I hope to be able to share
> some code soon.

I look forward to seeing some test code!

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk