Subject: Re: [boost] Non-allocating future promise... Re: ASIO into the standard (was: Re: C++ committee meeting report)
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2014-07-09 07:05:55
On 8 Jul 2014 at 23:12, Gottlob Frege wrote:
> >> 2013. The talk is called "Non-Allocating std::future/promise". I
> >> think most of it is about a... non-allocating future-promise.
> >> Hopefully. That was at least the idea.
> > Dear dear dear ... given that I was working at BlackBerry with you at
> > the time, and went with you to C++ Now, I really don't know how I
> > missed this.
> > I'm going to assume that I didn't miss this and instead choose to
> > forget and then pretend that your idea was my idea. So, my apologies,
> > and I'll credit you in the docs when the time comes.
> I _thought_ you were in the audience, but that could have been one of
> my other talks.
I was at *one* of your talks. I also almost certainly reviewed your
slides, as I do for most C++ Now talks I don't make it to. I have no
excuse really, just failing memory (though in fairness, it's been an
awfully full two years for me: two transatlantic relocations, first
baby etc., so I can see some memories have got deleted to make space).
> At work, I hardly talked about it, so if you weren't at the talk, you
> could have easily missed it. Chandler said that Google also ended up with
> similar code, so we are all thinking along the same lines. Chandler had
> some good ideas for handling the exceptions as well (ie if thrown when
> setting the value). It is hard to be 100% standards compliant (since the
> standard basically assumes every implementation uses an allocated storage
> location, and those assumptions leak into the interface).
Boost.Thread's promise-future doesn't implement allocator support, so
a de-malloced implementation shouldn't lose us too much (I agree
we'll have to deviate slightly from the standard in some APIs, but
TBH it's the standard that needs fixing here, promise-future [...]).
I need a malloc-free promise-future for AFIO. I see a latency
resonance peak at exactly one thread sleep duration, and on investigation
it's because the futures are sleeping the thread because malloc's
latency is lumpy. AFIO also currently does eight malloc/frees per op
executed, four of them inside a global lock, and I'd very much like to
see that down to four malloc/frees per op with none inside a global
lock.
Also, the batch hash engine's tasks are too fine-grained to use a
mallocing promise-future: the promise-future adds about 15-20% to
each hash round. That needs to come down to under 5%.
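To illustrate what "malloc-free" means here, below is a minimal sketch of a promise-future pair that never touches the heap. It is not Gottlob's design, not Boost.Thread's, and not any standard API. The shared state (the value plus a ready flag) lives inline inside the future object, and the promise only holds a pointer to it. `inline_future` and `inline_promise` are hypothetical names. To keep the sketch short, the future must not move while a promise is attached, and exceptions are not propagated. The designs discussed in this thread handle moves by fixing up the partner's pointer under a spinlock, and this sketch omits that. It needs C++17 for `std::optional`.

```cpp
#include <atomic>
#include <optional>
#include <thread>
#include <utility>

// Hypothetical sketch: shared state is stored inside the future, so no heap
// allocation happens. Simplification: the future is non-copyable and must
// stay put while a promise points at it (no pointer fix-up on move).
template <class T>
class inline_future {
public:
    inline_future() = default;
    inline_future(const inline_future&) = delete;
    inline_future& operator=(const inline_future&) = delete;

    bool is_ready() const noexcept {
        return ready_.load(std::memory_order_acquire);
    }

    // Spins with yield() rather than blocking on a condition variable,
    // then moves the value out. Call at most once.
    T get() {
        while (!is_ready()) std::this_thread::yield();
        return std::move(*value_);
    }

private:
    template <class U> friend class inline_promise;
    std::optional<T> value_;            // inline storage, no malloc
    std::atomic<bool> ready_{false};    // published with release semantics
};

template <class T>
class inline_promise {
public:
    explicit inline_promise(inline_future<T>& f) noexcept : fut_(&f) {}

    // Write the value first, then publish it; the acquire load in
    // is_ready() guarantees the consumer sees the constructed value.
    void set_value(T v) {
        fut_->value_.emplace(std::move(v));
        fut_->ready_.store(true, std::memory_order_release);
    }

private:
    inline_future<T>* fut_;  // points into the consumer's future object
};
```

Typical use is to put an `inline_future<int>` on the consumer's stack, construct an `inline_promise<int>` from it, hand the promise to a producer thread, and call `get()`. Storing the state in the future keeps the consumer's data local. The cost is the no-move restriction that the spinlock-and-pointer-updating approach removes.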
> >> Yes, replacing the spin and pointer updating with TM would be nice.
> > And here is where things become very interesting.
> <...interesting TM stuff...>
> Yes, keep us informed. I've been assuming TM won't work well for "big"
> transactions, but I have no idea yet what is big and what is small.
The upper limit is probably 100 cache lines touched. My current best
guess is that the lower limit is somewhere around 10 cache lines touched,
so you need to exceed 10 lines and keep under 50. A narrow window.
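To make that window concrete, here is a small sketch that counts how many cache lines a memory range spans and sorts a transaction's footprint into buckets using the rough 10- and 50-line guesses above. The 64-byte line size is an assumption (typical for x86 and not queried at run time). `cache_lines_touched` and `classify_transaction` are hypothetical helpers and do not come from any TM library. The thresholds are the guesses from this post, not measured limits.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (common on x86; not detected at run time).
constexpr std::size_t kCacheLine = 64;

// Number of distinct cache lines spanned by the byte range [addr, addr + bytes).
constexpr std::size_t cache_lines_touched(std::uintptr_t addr, std::size_t bytes) {
    if (bytes == 0) return 0;
    return (addr + bytes - 1) / kCacheLine - addr / kCacheLine + 1;
}

enum class tm_fit { too_small, worth_trying, too_big };

// Buckets a transaction by total lines touched, using the post's rough
// guesses: more than 10 lines to be worthwhile, fewer than 50 to stay safe.
constexpr tm_fit classify_transaction(std::size_t lines) {
    if (lines <= 10) return tm_fit::too_small;
    if (lines < 50) return tm_fit::worth_trying;
    return tm_fit::too_big;
}
```

An unaligned 8-byte access can straddle two lines, which is why the count uses the first and last byte addresses rather than dividing the size by 64.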
> Of course, we could also just ask the TM guys, like Michael Wong et al.
> But nothing beats experiencing it for yourself.
He'll say go use transactional GCC, and he's right. I put in a code
path to use __transaction_relaxed, as that's the malloc-capable one
(malloc doesn't abort transactions in transactional GCC, unlike in
TSX). Performance was dismal, especially on non-TSX hardware, where
it was another order of magnitude slower again. My lesson learned
from that: when writing code targeting both TSX and transactional
GCC, don't bother with __transaction_relaxed; just use
__transaction_atomic and follow the same granularity rules as with
TSX.
Regarding transactional GCC, it is neat the way you can write
metaprogramming that generates code in which the compiler's optimiser
spots that it can elide all locking completely, so your output runs
completely in parallel. That is very hard to do normally in [...]
-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/