Subject: Re: [boost] [thread] Alternate future implementation and future islands.
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2015-03-20 05:19:50


On 19 Mar 2015 19:51, "Niall Douglas" <s_sourceforge_at_[hidden]> wrote:
>
> On 19 Mar 2015 at 18:05, Giovanni Piero Deretta wrote:
>
> > > Your future still allocates memory, and is therefore costing about
> > > 1000 CPU cycles.
> >
> > 1000 clock cycles seems excessive with a good malloc implementation.
>
> Going to main memory due to a cache line miss costs 250 clock cycles,
> so no it isn't. Obviously slower processors spin less cycles for a
> cache line miss.

Why would a memory allocation necessarily imply a cache miss? You are even
assuming an L3 miss; that must be a poor allocator!

>
> > Anyways, the plan is to add support to a custom allocator. I do not
> > think you can realistically have a non allocating future *in the
> > general case* (you might optimise some cases of course).
>
> We disagree. They are not just feasible, but straightforward, though
> if you try doing a composed wait on them then yes they will need to
> be converted to shared state. Tony van Eerd did a presentation a few
> C++ Now's ago on non-allocating futures. I did not steal his idea
> subconsciously one little bit! :)
>

I am aware of that solution. My issue with that design is that it requires
an expensive RMW for every move. Do a few moves and the cost will quickly
dwarf that of an allocation, especially considering that an OoO core will
happily overlap computation with a cache miss, while the required membar
will stall the pipeline on current CPUs (I'm thinking of x86 of course).
That might change in the near future though.
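
To make the per-move cost concrete, here is a minimal sketch of a linked
promise/future pair with no heap-allocated shared state. It is not the
design from the presentation nor anyone's actual implementation: the names
(npromise, nfuture, link_guard) are made up for illustration, and a single
global spinlock is assumed in place of whatever per-object protocol a real
design would use. The only point is that each move has to relink the peer,
and that relinking needs at least one atomic RMW.

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <utility>

// Assumption: one global spinlock guards every promise<->future link.
inline std::atomic<bool> g_link_lock{false};
inline long g_lock_acquisitions = 0;  // instrumentation, guarded by the lock

struct link_guard {
    link_guard() {
        // Atomic exchange is a read-modify-write operation.
        while (g_link_lock.exchange(true, std::memory_order_acquire)) {}
        ++g_lock_acquisitions;
    }
    ~link_guard() { g_link_lock.store(false, std::memory_order_release); }
    link_guard(const link_guard&) = delete;
};

template <class T> class nfuture;

template <class T>
class npromise {
    friend class nfuture<T>;
    nfuture<T>* peer_ = nullptr;  // points at the future, wherever it lives
public:
    npromise() = default;
    npromise(npromise&& o) noexcept {
        link_guard g;
        peer_ = o.peer_;
        o.peer_ = nullptr;
        if (peer_) peer_->peer_ = this;  // tell the future where we moved to
    }
    ~npromise() { link_guard g; if (peer_) peer_->peer_ = nullptr; }
    nfuture<T> get_future();
    void set_value(T v) {
        link_guard g;
        if (peer_) peer_->value_ = std::move(v);  // value lives in the future
    }
};

template <class T>
class nfuture {
    friend class npromise<T>;
    npromise<T>* peer_ = nullptr;
    std::optional<T> value_;
    explicit nfuture(npromise<T>* p) : peer_(p) {
        link_guard g;
        p->peer_ = this;
    }
public:
    nfuture(nfuture&& o) noexcept {
        link_guard g;  // every move pays for the lock acquisition
        peer_ = o.peer_;
        o.peer_ = nullptr;
        value_ = std::move(o.value_);
        if (peer_) peer_->peer_ = this;  // tell the promise where we moved to
    }
    ~nfuture() { link_guard g; if (peer_) peer_->peer_ = nullptr; }
    bool ready() { link_guard g; return value_.has_value(); }
    T get() { link_guard g; return std::move(*value_); }  // requires ready()
};

template <class T>
nfuture<T> npromise<T>::get_future() {
    return nfuture<T>(this);  // C++17: constructed in place, no extra move
}
```

Each move of the future costs one lock acquisition here, i.e. one atomic
exchange, which on x86 is a locked instruction with full-barrier semantics.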

> > I understand what you are aiming at, but I think that the elidability
> > is orthogonal. Right now I'm focusing on making the actual
> > synchronisation fast and composable in the scenario where the program
> > has committed to make a computation async.
>
> This is fine until your compiler supports resumable functions.

This is funny :). A couple of months ago I was arguing with Gor Nishanov
(author of the MS resumable functions paper) that heap allocating the
resumable function by default is unacceptable. And here I am arguing the
other side :).

OK, my compromise is to not allocate while the async operation is merely
deferred and can still be executed synchronously, and to lazily convert to
heap allocation only when the operation needs to be executed truly
asynchronously, basically not until you actually create the promise. At
that point the cost of the async setup will provably dwarf the allocation,
and even in this case the allocation can be skipped if we know we will sync
before a move, since then it is safe to allocate the shared state on the
stack. This should allow the compiler to remove the abstraction completely
if it can prove it safe. I am still working on it and should have something
in a few days.
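
Roughly, the shape of that compromise could look like the sketch below. It
is only an illustration under stated assumptions, not the code I am working
on: deferred_op, shared_state and run_queue are hypothetical names, and a
plain vector of callbacks stands in for a real executor or thread.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for an executor: queued tasks run "later".
using run_queue = std::vector<std::function<void()>>;

// Heap-allocated state, created only once the operation goes truly async.
template <class R>
struct shared_state {
    std::optional<R> value;
};

template <class F>
class deferred_op {
    F fn_;  // stored inside the object: no allocation while merely deferred
public:
    using result_type = std::invoke_result_t<F&>;

    explicit deferred_op(F fn) : fn_(std::move(fn)) {}

    // Synchronous path: run the callable in place, no shared state at all.
    result_type get() && { return fn_(); }

    // Asynchronous path: this is the only place the shared state is
    // allocated.
    std::shared_ptr<shared_state<result_type>> submit(run_queue& q) && {
        auto st = std::make_shared<shared_state<result_type>>();
        q.emplace_back(
            [st, fn = std::move(fn_)]() mutable { st->value = fn(); });
        return st;
    }
};
```

On the get() path there is nothing but a direct call, so an optimiser has a
fair chance of inlining the whole thing away; the allocation only shows up
once submit() is called.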

I guess I'm converging to your design.

>
> > > Exactly as my C11 permit object is. Except mine allows C code and C++
> > > code to interoperate and compose waits together.
> >
> > Not at all. I admit not having studied permit in detail (the doc size
> > is pretty daunting) but as far as I can tell the waiting thread will
> > block in the kernel.
>
> It can spin or sleep or toggle a file descriptor or HANDLE.
>
> > It provides a variety of ways on how to block, the user can't add more.
>
> It provides a hook API with filter C functions which can, to a
> limited extent, provide some custom functionality. Indeed the file
> descriptor/HANDLE toggling is implemented that way. There is only so
> much genericity which can be done with C.

I believe my design is much simpler and more flexible; then again, it is
trying to solve a different and narrower problem than full-scale
synchronization of arbitrary threads.

-- gpd

