Subject: Re: [Boost-users] [iostreams] Devices and WOULD_BLOCK
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-01-30 06:04:30
On 27 Jan 2015 at 19:51, Gavin Lambert wrote:
> As I said, it's not a big difference (atomic ops are typically ~1us, and
> that was on the previous CPU generation), but it's still one of my pet
> peeves, as while there are many places where shared_ptrs do need to get
> copied for correctness, parameter passing is not one of those places.
> (And performance gets worse if you end up passing the object through
> many layers as part of keeping methods short or similar "tidiness" or
> abstraction guidelines; and it wastes more stack too.)
A good compiler optimiser will collapse those copies if the code is
header only. In AFIO's case it does exactly that: five nested
function calls, each taking a shared_ptr by value, collapse into a
single shared_ptr copy.
> You're going to have to make lots of copies anyway in an asynchronous
> library like AFIO, because binding an asynchronous callback is one of
> those places that you *do* need to copy a shared_ptr, so if you have a
> high percentage of async code (which is what I would expect with that
> sort of library) then it's not going to make much difference either way.
You're right in general, and that held for AFIO up to v1.3 of the
engine. In v1.4 I'm going even more intrusive, and I expect to elide
all but the necessary copying completely in the main engine loop. I
will do this via the batch detachable and reattachable node_ptr
support in my concurrent_unordered_map: you can detach and recycle op
state rather than ever allocating or deallocating. That should let me
stop pinning shared_ptrs to their callbacks, as the new custom future
implementation will tag a shared_ptr exactly once for you.
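[A generic sketch of the detach-and-recycle idea — this is NOT the actual concurrent_unordered_map node_ptr API, just the underlying pattern: finished op state is parked on a free list and reused, so the steady-state engine loop performs no allocation and no per-op reference-count traffic:]

```cpp
#include <memory>
#include <vector>

// Hypothetical op-state node, for illustration only.
struct OpState { int result = 0; };

class OpPool {
    std::vector<std::unique_ptr<OpState>> free_;
public:
    OpState *acquire() {
        if (free_.empty())
            return new OpState;           // cold path: allocate once
        OpState *s = free_.back().release();
        free_.pop_back();
        return s;                         // hot path: recycled node
    }
    void release(OpState *s) {
        *s = OpState{};                   // reset state for reuse
        free_.emplace_back(s);            // park on the free list
    }
};
```

In steady state every `acquire` hits the hot path, so the engine loop neither allocates nor touches an atomic reference count per operation.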
> > Of course, I'm currently seeing a 300k CPU cycle per op average.
> > shared_ptr is tiny compared to that. With a 10k CPU cycle per op
> > average I might care a bit more.
>
> I'm probably biased the other way, because about half of the code I work
> on has sub-millisecond budgets. :)
300k CPU cycles is still only ~0.1ms at 3GHz. But no, it's the
stochastic variance that upsets me. Give me a worst-case latency of
0.1ms and I would be pleased, I think.
Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net