
Boost Users :

Subject: Re: [Boost-users] [iostreams] Devices and WOULD_BLOCK
From: Gavin Lambert (gavinl_at_[hidden])
Date: 2015-01-27 01:51:22


On 27/01/2015 15:08, Niall Douglas wrote:
> On 27 Jan 2015 at 10:58, Gavin Lambert wrote:
>> There are negative performance consequences to copying a shared_ptr (i.e.
>> incrementing or decrementing its refcount). *Most* applications don't
>> need to care about this (it's very small) but sometimes it's worthy of
>> note, and there's no harm in avoiding copies in silly places (which is
>> why I thwack people who pass a shared_ptr as a value parameter).
>
> As food for thought, AFIO, which uses shared_ptr very heavily indeed
> to avoid any locking at all, passes them around entirely by value. It
> was bugging me whether this was costing me performance, so I tried
> replacing the lot with reference semantics.
>
> Total effect on performance: ~0.1%.

As I said, it's not a big difference (atomic ops are typically ~1us, and
that was measured on the previous CPU generation), but it's still one of
my pet peeves: while there are many places where shared_ptrs do need to
be copied for correctness, parameter passing is not one of them. (And
performance gets worse if you end up passing the object through many
layers as part of keeping methods short, or similar "tidiness" or
abstraction guidelines; it wastes more stack too.)

You're going to have to make lots of copies anyway in an asynchronous
library like AFIO, because binding an asynchronous callback is one of
those places where you *do* need to copy a shared_ptr, so if you have a
high percentage of async code (which is what I would expect with that
sort of library) then it's not going to make much difference either way.

> The key is that AFIO very, very rarely has more than one thread touch
> a shared_ptr at once. That, on Intel at least, makes their atomic
> reference counting almost as cheap as non-atomic reference counting.
> Combine that with the compiler judiciously folding out copies for you
> where it can, and the overhead for the benefits to debugging and
> maintenance is irrelevant.

Writing a single shared_ptr instance from multiple threads requires even
more overhead from the extra spinlock (via the atomic_*(&sp...) family
of functions), though an uncontended spinlock costs only about two
atomic ops, so it's usually not too bad.

(But those functions do mildly irritate me in that they also pass by
value; at least in that case they're inlined template functions, so the
compiler will almost certainly elide the parameter copy. Another case
where generic library code may "win" over application code.)

The multi-writer case is one where it may be better to create separate
per-thread copies up front, from some "safe" context, if you can
(assuming you're ok with operating on stale data until some sync point).
But again, to a certain extent async code patterns may already be doing
these copies "for you". And if you're limiting yourself to WORM
(write-once, read-many) access, you can skip the spinlock if you're
careful.

> Of course, I'm currently seeing a 300k CPU cycle per op average.
> shared_ptr is tiny compared to that. With a 10k CPU cycle per op
> average I might care a bit more.

I'm probably biased the other way, because about half of the code I work
on has sub-millisecond budgets. :)


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net