Subject: Re: [boost] [asio] Bug: Handlers execute on the wrong strand (Gavin Lambert).
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2013-10-31 12:34:56
On 31 Oct 2013 at 15:23, Gavin Lambert wrote:
> surprising though as the times I was seeing were in the order of 300ms
> from requesting the lock to being granted it, as I said before, which is
> a bit excessive for even a kernel wait. (And before you ask, the
I've seen CAS locks spike to a quarter second if you get a very
unlucky sequence of events where all cores are read-modify-writing
more cache lines than the cache coherency bus can cope with. You'll
see the mouse pointer, disc i/o etc. all go to ~4Hz. Admittedly,
that's a problem older processors experience more than newer ones;
Intel have improved things.
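To make the contention pattern concrete, here is a rough sketch of the
sort of CAS lock I mean, with several threads hammering one cache line
and recording the longest wait to acquire. CasLock, ContentionResult
and hammer_lock are names made up for this illustration; they are not
from ASIO or nedmalloc, and the numbers you get will depend entirely
on your hardware.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical minimal CAS spin lock. Every failed compare_exchange is a
// read-modify-write attempt on the same cache line.
class CasLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        bool expected = false;
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;  // a failed CAS stores the observed value here
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

struct ContentionResult {
    std::uint64_t counter;        // total increments performed under the lock
    std::int64_t  worst_wait_ns;  // longest single wait to acquire the lock
};

// Runs `threads` threads (must be > 0), each taking the lock `iterations` times.
ContentionResult hammer_lock(unsigned threads, unsigned iterations) {
    CasLock lock;
    std::uint64_t counter = 0;                    // protected by `lock`
    std::vector<std::int64_t> worst(threads, 0);  // one slot per thread
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            for (unsigned i = 0; i < iterations; ++i) {
                auto start = std::chrono::steady_clock::now();
                lock.lock();
                auto waited = std::chrono::duration_cast<std::chrono::nanoseconds>(
                    std::chrono::steady_clock::now() - start).count();
                worst[t] = std::max<std::int64_t>(worst[t], waited);
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return {counter, *std::max_element(worst.begin(), worst.end())};
}
```

Calling hammer_lock(4, 10000) and looking at worst_wait_ns gives a
crude idea of how bad the tail gets on a given box; it will not
reproduce a quarter second spike on demand.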
> ASIO itself or in the small amount of wrapper code I had to rewrite when
> moving from ASIO to my custom implementation, because it seems to have
> gone away since switching over. (The access pattern of the outside code
> is unchanged.)
ASIO may be doing nothing wrong; it may simply be that the combination
of your code with its code produces weird timing resonances which just
happen to cause spikes on some particular hardware. I occasionally get
bug reports for nedmalloc from hedge funds where they have upgraded to
some new hardware and nedmalloc suddenly starts latency spiking. I tell
them to add an empty for loop incrementing an atomic, and they're often
quite surprised when the spiking goes away.
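For the avoidance of doubt, the loop is nothing cleverer than
something like the sketch below. perturb_timing and its parameters are
hypothetical names for this email; where to call it and how many spins
to use is trial and error on the affected hardware, and whether the
atomic is local or shared is an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical helper: an otherwise empty loop that increments an atomic.
// Its only purpose is to shift the timing of the surrounding code.
// Compilers in practice do not elide atomic read-modify-writes, so the
// loop survives optimisation.
inline std::size_t perturb_timing(std::atomic<std::size_t>& sink,
                                  std::size_t spins) {
    for (std::size_t i = 0; i < spins; ++i)
        sink.fetch_add(1, std::memory_order_relaxed);
    return sink.load(std::memory_order_relaxed);
}
```

You would drop a call such as perturb_timing(sink, 1000) just before
the contended operation and tune the count until the resonance breaks.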
> > Mmm, I was just about to suggest that nedmalloc might be doing a free
> > space consolidation run and that might be the cause of the spike, but
> > if it isn't then okay.
>
> Not unless it can do that without locking anything, at least. I was
> basically only recording attempts to lock/unlock rather than any access
> to the allocator.
nedmalloc keeps multiple pools, and while it is consolidating free
space in one pool it will send traffic to one of the other pools.
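The general shape of that idea, as a toy sketch only: this is not
nedmalloc's actual code, and MultiPool, begin_consolidation and the
rest are invented for illustration. Each pool has a busy flag, and an
allocation that finds its preferred pool busy simply moves on to the
next one rather than waiting.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Toy multi-pool front end (hypothetical; std::malloc stands in for the
// real per-pool allocation logic).
class MultiPool {
    static constexpr std::size_t kPools = 4;
    struct Pool {
        std::atomic<bool> busy{false};
        std::size_t served = 0;  // only touched while `busy` is held
    };
    std::array<Pool, kPools> pools_;

    static bool try_acquire(Pool& p) {
        return !p.busy.exchange(true, std::memory_order_acquire);
    }
    static void release(Pool& p) {
        p.busy.store(false, std::memory_order_release);
    }
public:
    // Stand-in for a consolidation run holding a pool for a while.
    bool begin_consolidation(std::size_t i) { return try_acquire(pools_[i % kPools]); }
    void end_consolidation(std::size_t i) { release(pools_[i % kPools]); }

    // Allocates from the preferred pool if it is free, otherwise from the
    // next free pool. Returns the index of the pool that served the request.
    // Spins if every pool is busy.
    std::size_t allocate(std::size_t bytes, std::size_t preferred, void** out) {
        for (std::size_t i = preferred;; ++i) {
            Pool& p = pools_[i % kPools];
            if (!try_acquire(p)) continue;  // busy: send traffic elsewhere
            *out = std::malloc(bytes);
            ++p.served;
            release(p);
            return i % kPools;
        }
    }
};
```

With pool 0 held by begin_consolidation(0), a call to
allocate(64, 0, &p) is served by pool 1, so the caller never waits on
the consolidating pool.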
> I suspect I'm hitting the memory allocator in my implementation more
> frequently than ASIO was, actually -- I'm not trying to cache and reuse
> operations or buffers; it just does a "new" whenever it needs it.
> (Although I might be getting away with fewer intermediate objects, since
> I've cut the functionality to the bare minimum.) So I doubt allocation
> was the issue. (Unless maybe it was trying to *avoid* allocation that
> introduced the issue, as the post that started this discussion implied.)
One of the cunning ideas I had while at BlackBerry was for a new
clang optimiser pass plugin which has the compiler coalesce operator
new calls into batch mallocs and replace all sequences of stack
unwound new/deletes with alloca(). It would break ABI compatibility
with GCC, but I reckoned it would deliver tremendous performance
improvements in malloc-contended code. Shame we probably won't see
that optimisation any time soon; it would help Boost code in
particular.
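To show what I mean by coalescing, here is the transformation done by
hand on a trivial case. No such pass exists as far as this email is
concerned; Header, Payload and both functions are made up, and the
alloca() half of the idea is left out because alloca is non-standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Header  { int id; };
struct Payload { double value; };

// Before: two separate trips to the allocator.
double separate_allocations() {
    Header*  h = new Header{1};
    Payload* p = new Payload{2.5};
    double result = h->id + p->value;
    delete p;
    delete h;
    return result;
}

// After (hand-written illustration of what the pass might emit):
// one malloc sized for both objects, then placement new into it.
double coalesced_allocation() {
    // Round Header's size up to Payload's alignment to find its offset.
    constexpr std::size_t payload_offset =
        (sizeof(Header) + alignof(Payload) - 1) / alignof(Payload) * alignof(Payload);
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(payload_offset + sizeof(Payload)));
    if (!raw) throw std::bad_alloc();
    Header*  h = new (raw) Header{1};
    Payload* p = new (raw + payload_offset) Payload{2.5};
    double result = h->id + p->value;
    p->~Payload();   // destroy in reverse order, then free once
    h->~Header();
    std::free(raw);
    return result;
}
```

Both functions compute the same value; the second takes the allocator
lock once instead of twice, which is where the win would come from in
contended code.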
Niall
-- Currently unemployed and looking for work. Work Portfolio: http://careers.stackoverflow.com/nialldouglas/