Subject: Re: [boost] [asio] Bug: Handlers execute on the wrong strand (Gavin Lambert).
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2013-10-25 18:09:27
On 25 Oct 2013 at 19:39, Gavin Lambert wrote:
> >> upside most of the guts are entirely lock-free (though not wait-free,
> >> since it's based on Boost.LockFree's queue).
lockfree::queue isn't actually hugely performant. Most lock-free code
isn't, compared to most lock-based implementations, because you gain in
worst-case execution times by sacrificing average-case execution
times. The only major exception is lockfree::spsc_queue, which is
indeed very fast by any metric.
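
To illustrate why the single-producer/single-consumer case can be so
cheap, here is a minimal sketch of a bounded SPSC ring buffer using only
the standard library. It is not Boost.Lockfree's implementation; the
names SpscRing and spsc_demo are made up for this example. The point is
that each side owns one index, so push and pop need only plain atomic
loads and stores, with no compare-and-swap retry loop.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical illustration type; NOT Boost.Lockfree's spsc_queue.
// One slot is always left empty so that "full" and "empty" differ.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "need at least one usable slot");
public:
    // Must be called from the single producer thread only.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire))
            return false;                       // ring is full
        buffer_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Must be called from the single consumer thread only.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire))
            return false;                       // ring is empty
        out = buffer_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T buffer_[Capacity];                 // T must be default-constructible
    std::atomic<std::size_t> head_{0};   // written by the consumer only
    std::atomic<std::size_t> tail_{0};   // written by the producer only
};

// Pushes 1..n on one thread, pops on another, and returns the sum
// observed by the consumer.
long long spsc_demo(int n) {
    assert(n >= 0);
    SpscRing<int, 64> ring;
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i)
            while (!ring.push(i)) std::this_thread::yield();
    });
    std::thread consumer([&] {
        int value = 0;
        for (int received = 0; received < n; ) {
            if (ring.pop(value)) { sum += value; ++received; }
            else std::this_thread::yield();
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

A multi-producer queue cannot get away with this, because several
threads would race to claim the same tail slot; that is where the retry
loops and their average-case cost come from.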
> > Asio at Windows uses IOCP (default settings, using asio::io_service for task
> > scheduling) and that is the (theoretical) reason of better thread scheduling
> > for the Asio-based thread pool. Sometimes it's really visible.
> It's also full of mutexes though, which is why it didn't work out for
> me. (Note that I was using Boost 1.53 when testing Asio; maybe this has
> changed in future versions, although I heard that 1.54 picked up a bug
> in the IOCP reactor.)
I saw lost wakeups during parallel writes with the ASIO in Boost 1.54,
so I disabled parallel writes in AFIO. The problem appears to be gone in
1.55, so AFIO now parallelises everything as it was designed to. That
might mean the ASIO bug is fixed in 1.55.
> I'm not sure exactly which lock triggered the slow path (my logging was
> only sufficient to show that it was one of the ones inside Asio, but not
> which one). But as the prior email said, given reuse of strand
> implementations between supposedly independent strands, that seems like
> a likely candidate. (Though it didn't take long for the latency spikes
> to manifest -- typically they'd start after a couple of minutes and then
> recur roughly every 10-30 seconds.)
ASIO is, once you compile it with optimisation, really a thin wrapper
doing a lot of mallocs and frees around Windows I/O completion ports.
Any latency spikes are surely due to either IOCP or the memory allocator
causing a critical section to exceed its spin count, and therefore
sleep in the kernel?
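
The spin-count behaviour I mean can be sketched as below. This is only
a model of the idea, in the spirit of what
InitializeCriticalSectionAndSpinCount configures on Windows; it is not
how ASIO or the Windows critical section is implemented, and
SpinThenBlockLock and run_contended are names invented for the example.
The lock tries a bounded number of user-space attempts and counts every
time it has to fall through to a blocking wait, which is where a latency
spike would come from.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration; NOT ASIO's or Windows' implementation.
class SpinThenBlockLock {
public:
    explicit SpinThenBlockLock(int spin_count) : spin_count_(spin_count) {
        assert(spin_count >= 0);
    }

    void lock() {
        // Fast path: a bounded number of non-blocking attempts.
        for (int i = 0; i < spin_count_; ++i)
            if (mutex_.try_lock()) return;
        // Slow path: spin count exceeded, so block (may sleep in the kernel).
        slow_path_hits_.fetch_add(1, std::memory_order_relaxed);
        mutex_.lock();
    }

    void unlock() { mutex_.unlock(); }

    long slow_path_hits() const {
        return slow_path_hits_.load(std::memory_order_relaxed);
    }

private:
    int spin_count_;
    std::mutex mutex_;
    std::atomic<long> slow_path_hits_{0};
};

struct ContentionResult {
    long counter;         // total increments performed under the lock
    long slow_path_hits;  // how often lock() had to block
};

// Runs `threads` workers that each increment a shared counter
// `iterations` times under the lock.
ContentionResult run_contended(int spin_count, int threads, int iterations) {
    SpinThenBlockLock lock(spin_count);
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinThenBlockLock> guard(lock);
                ++counter;
            }
        });
    for (auto& worker : pool) worker.join();
    return ContentionResult{counter, lock.slow_path_hits()};
}
```

With a spin count of 0 every acquisition goes through the slow path, so
the hit counter equals the number of acquisitions. Raising the spin
count lets short critical sections complete without blocking, but
anything that stretches the critical section (an allocator call, say)
pushes waiters back onto the slow path.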
> I haven't done a head-to-head benchmark on each (and it wouldn't
> surprise me if Asio were faster than mine for many loads -- and it's
> definitely more flexible than I made mine) but so far my one is doing at
> least as well as Asio on production loads but without the latency spikes
> from the locks. Still very early days yet though.
If you're on Haswell, you might look into my memory transaction
implementation in AFIO. It uses Intel TSX if runtime detection finds it
available, otherwise it falls back to a policy-composed spin lock (yes
I know I did NIH with yet another Boost spinlock implementation, but
hey, mine is policy-composed so you can vary spin counts etc!!!). It
works on Intel's TSX simulator, but I would really love to know if it
works on real TSX hardware.
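
For anyone unfamiliar with the term, "policy-composed" can be sketched
roughly as follows. This is not AFIO's actual code or API: every name
here (BusySpins, YieldThread, Chain, PolicySpinlock, count_with) is
invented for the example, and the TSX path is left out entirely, so it
shows only the fallback spin lock idea. Each policy decides whether it
handles a given failed attempt; if not, the next policy in the list gets
a turn.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Policy: busy-wait for the first MaxSpins failed attempts, then
// decline so the next policy in the chain takes over.
template <unsigned MaxSpins>
struct BusySpins {
    static bool handle(unsigned attempt) { return attempt < MaxSpins; }
};

// Policy: give up the time slice; always handles the attempt.
struct YieldThread {
    static bool handle(unsigned) { std::this_thread::yield(); return true; }
};

// Tries each policy in order until one reports it handled the attempt.
template <typename... Policies> struct Chain;
template <> struct Chain<> {
    static void wait(unsigned) {}  // no policy left: just retry at once
};
template <typename First, typename... Rest>
struct Chain<First, Rest...> {
    static void wait(unsigned attempt) {
        if (!First::handle(attempt)) Chain<Rest...>::wait(attempt);
    }
};

// Hypothetical spin lock whose waiting behaviour is chosen at compile
// time by the policy list. NOT AFIO's implementation.
template <typename... Policies>
class PolicySpinlock {
public:
    void lock() {
        for (unsigned attempt = 0;
             locked_.exchange(true, std::memory_order_acquire);
             ++attempt)
            Chain<Policies...>::wait(attempt);
    }
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Increments a shared counter from several threads under Lock and
// returns the final count.
template <typename Lock>
long count_with(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    Lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    for (auto& worker : pool) worker.join();
    return counter;
}
```

Varying the spin count is then just a different template argument, for
example PolicySpinlock<BusySpins<1000>, YieldThread> versus
PolicySpinlock<YieldThread>, with no change to the calling code.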
-- 
Currently unemployed and looking for work. Work Portfolio: http://careers.stackoverflow.com/nialldouglas/