Subject: Re: [boost] [asio] Bug: Handlers execute on the wrong strand (Gavin Lambert).
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2013-10-31 12:34:56

On 31 Oct 2013 at 15:23, Gavin Lambert wrote:

> surprising though as the times I was seeing were in the order of 300ms
> from requesting the lock to being granted it, as I said before, which is
> a bit excessive for even a kernel wait. (And before you ask, the

I've seen CAS locks spike to a quarter second if you get a very
unlucky sequence of events where all cores are read-modify-writing
more cache lines than the cache coherency bus can cope with. You'll
see the mouse pointer, disc i/o etc. all go to ~4Hz. Admittedly,
that's a problem older processors experience more than newer ones;
Intel have improved things.
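
To make the failure mode concrete, here is a minimal sketch of a CAS
lock and a way to record the worst acquisition wait under contention.
CasSpinLock, ContentionResult and measure_contention are hypothetical
names for illustration only; this is not ASIO's or nedmalloc's lock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical test-and-test-and-set spinlock built on compare-and-swap.
class CasSpinLock {
public:
    void lock() {
        for (;;) {
            bool expected = false;
            // The CAS is a read-modify-write, so the core needs the cache
            // line in exclusive state; this is the traffic that can swamp
            // the coherency fabric when every core does it at once.
            if (locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed))
                return;
            // Wait on a plain load so spinning cores can share the line
            // read-only instead of fighting for exclusive ownership.
            while (locked_.load(std::memory_order_relaxed))
                std::this_thread::yield();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

struct ContentionResult {
    long long counter;        // total increments performed under the lock
    long long worst_wait_us;  // longest single wait for the lock, in microseconds
};

// Runs `threads` workers that each take the lock `iterations` times and
// reports the worst time any of them spent waiting to acquire it.
ContentionResult measure_contention(std::size_t threads, std::size_t iterations) {
    assert(threads > 0);
    using Clock = std::chrono::steady_clock;
    CasSpinLock lock;
    long long counter = 0;  // protected by `lock`
    std::atomic<long long> worst{0};
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            long long local_worst = 0;
            for (std::size_t i = 0; i < iterations; ++i) {
                const auto start = Clock::now();
                lock.lock();
                const long long waited =
                    std::chrono::duration_cast<std::chrono::microseconds>(
                        Clock::now() - start).count();
                ++counter;
                lock.unlock();
                if (waited > local_worst) local_worst = waited;
            }
            // Publish this thread's maximum with a CAS loop (atomic max).
            long long seen = worst.load();
            while (seen < local_worst &&
                   !worst.compare_exchange_weak(seen, local_worst)) {
            }
        });
    }
    for (auto& w : workers) w.join();
    return {counter, worst.load()};
}
```

Calling measure_contention(4, 10000) and inspecting worst_wait_us gives
a rough feel for tail latency on a given machine. The figure depends
heavily on hardware and scheduling, so treat it as a diagnostic rather
than a benchmark.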

> ASIO itself or in the small amount of wrapper code I had to rewrite when
> moving from ASIO to my custom implementation, because it seems to have
> gone away since switching over. (The access pattern of the outside code
> is unchanged.)

ASIO may be doing nothing wrong; it may simply be that the combination
of your code with its code produces weird timing resonances which just
happen to cause spikes on some particular hardware. I occasionally get
bug reports for nedmalloc from hedge funds who have upgraded to some new
hardware and found nedmalloc suddenly latency spiking. I tell them
to add an empty for loop incrementing an atomic, and they're often
quite surprised when the spiking goes away.
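
A minimal sketch of that kind of timing perturbation follows. The
names g_perturb and perturb_timing, and where you call it from, are
assumptions for illustration; the right spin count has to be found by
experiment on the affected hardware.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical global whose only purpose is to give the loop a side effect.
std::atomic<unsigned> g_perturb{0};

// Burns a little time by incrementing an atomic `spins` times. A loop over
// a plain local variable could be deleted outright by the optimiser, while
// current compilers generally leave atomic increments in place.
unsigned perturb_timing(std::size_t spins) {
    for (std::size_t i = 0; i < spins; ++i)
        g_perturb.fetch_add(1, std::memory_order_relaxed);
    return g_perturb.load(std::memory_order_relaxed);
}
```

Calling something like perturb_timing(100) just before the contended
operation shifts the relative timing of the threads, which is the only
effect intended here.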

> > Mmm, I was just about to suggest that nedmalloc might be doing a free
> > space consolidation run and that might be the cause of the spike, but
> > if it isn't then okay.
> Not unless it can do that without locking anything, at least. I was
> basically only recording attempts to lock/unlock rather than any access
> to the allocator.

nedmalloc keeps multiple pools, and while free space consolidating
one pool it will send traffic to one of the other pools.
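
The general idea of redirecting traffic away from a busy pool can be
sketched as below. This is an illustration under stated assumptions,
not nedmalloc's actual code: PoolSet, begin_consolidation and
end_consolidation are hypothetical names, and std::malloc stands in
for carving memory out of the chosen pool.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <thread>

// Hypothetical set of pools, each guarded by a simple busy flag.
class PoolSet {
public:
    static constexpr std::size_t kPools = 4;

    // Tries the preferred pool first and falls through to the others if
    // it is busy. Writes the index of the pool that served the request
    // to *served_by.
    void* allocate(std::size_t bytes, std::size_t preferred,
                   std::size_t* served_by) {
        assert(preferred < kPools);
        assert(served_by != nullptr);
        for (;;) {
            for (std::size_t i = 0; i < kPools; ++i) {
                const std::size_t idx = (preferred + i) % kPools;
                // exchange returns the previous value: true means busy.
                if (pools_[idx].busy.exchange(true, std::memory_order_acquire))
                    continue;
                void* p = std::malloc(bytes);  // stand-in for pool-local allocation
                pools_[idx].busy.store(false, std::memory_order_release);
                *served_by = idx;
                return p;
            }
            std::this_thread::yield();  // every pool busy; try again
        }
    }

    // Marks a pool busy for a long-running job such as free space
    // consolidation. Returns false if the pool was already busy.
    bool begin_consolidation(std::size_t idx) {
        assert(idx < kPools);
        return !pools_[idx].busy.exchange(true, std::memory_order_acquire);
    }

    void end_consolidation(std::size_t idx) {
        assert(idx < kPools);
        pools_[idx].busy.store(false, std::memory_order_release);
    }

private:
    struct Pool {
        std::atomic<bool> busy{false};
    };
    std::array<Pool, kPools> pools_;
};
```

While pool 0 is marked as consolidating, a request that prefers pool 0
is served by pool 1 instead of waiting, so the caller does not see the
consolidation as a latency spike.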

> I suspect I'm hitting the memory allocator in my implementation more
> frequently than ASIO was, actually -- I'm not trying to cache and reuse
> operations or buffers; it just does a "new" whenever it needs it.
> (Although I might be getting away with fewer intermediate objects, since
> I've cut the functionality to the bare minimum.) So I doubt allocation
> was the issue. (Unless maybe it was trying to *avoid* allocation that
> introduced the issue, as the post that started this discussion implied.)

One of the cunning ideas I had while at BlackBerry was for a new
clang optimiser pass plugin which has the compiler coalesce operator
new calls into batch mallocs and replace all sequences of
stack-unwound new/deletes with alloca(). It would break ABI
compatibility with GCC, but I reckoned it would deliver tremendous
performance improvements in malloc-contended code. Shame we probably
won't see that optimisation any time soon, it would help Boost code in
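
Done by hand, coalescing two operator new calls into one batch malloc
looks roughly like the sketch below. This only illustrates the
transformation such a pass would automate; it is not output from any
real clang plugin, and Header, Payload, Batch, make_batch and
destroy_batch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Header  { int id; };
struct Payload { double values[4]; };

// Rounds `offset` up to a multiple of `align` (which must be a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

struct Batch {
    void*    block;    // the single underlying allocation
    Header*  header;   // lives at the start of the block
    Payload* payload;  // lives at the first suitably aligned offset after it
};

// Instead of `new Header` plus `new Payload` (two trips to the allocator),
// make one malloc big enough for both and construct each object in place.
Batch make_batch(int id) {
    const std::size_t payload_offset =
        align_up(sizeof(Header), alignof(Payload));
    void* block = std::malloc(payload_offset + sizeof(Payload));
    if (!block) throw std::bad_alloc();
    unsigned char* bytes = static_cast<unsigned char*>(block);
    Header*  h = new (bytes) Header{id};
    Payload* p = new (bytes + payload_offset) Payload{};
    return Batch{block, h, p};
}

// Destroys the objects in reverse order of construction, then releases
// the block with a single free.
void destroy_batch(Batch& b) {
    b.payload->~Payload();
    b.header->~Header();
    std::free(b.block);
    b = Batch{nullptr, nullptr, nullptr};
}
```

The cost is that both objects now share one lifetime, which is part of
why doing this automatically requires the compiler to prove that the
allocations are freed together.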


