From: Peter Dimov (pdimov_at_[hidden])
Date: 2004-02-10 09:32:55


Slawomir Lisznianski wrote:

[...]

>> During tests involving manipulations of shared_ptr variables similar to
>> those performed by algorithms on container elements – their copying and
>> removal – a significant overhead was measured. During tests, care was
>> taken to assure that timing did not include creation of new objects
>> managed by shared_ptrs nor their destruction.
>>
>> One way to customize the shared_ptr without breaking existing code would
>> be to introduce an optional template parameter stating locking policy.
>>
>> Attached you will find test code used on a win32 platform that produced
>> the following results on Intel Pentium 4 2.66Mhz single processor
>> machine. Each line represents separate testing iteration, so averages
>> can be calculated:

[...]

>> std::ostream& _M_out;
>> LARGE_INTEGER _M_startEvent;
>> LARGE_INTEGER _M_endEvent;
>> LARGE_INTEGER _M_frequency;

Please note that identifiers that start with _M (underscore followed by an
uppercase letter) are reserved by the implementation, as are identifiers
containing a double underscore.
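
As an illustration of non-reserved naming, here is a minimal sketch of a timer whose members use a trailing underscore instead of the _M prefix. The full Timer class is elided in the quote above, so everything here is an assumption: the use of std::chrono in place of LARGE_INTEGER, the report being printed from the destructor, and the helper sample_report() are hypothetical and not taken from the original test.

```cpp
#include <cassert>
#include <chrono>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical portable stand-in for the quoted Timer. Member names end in
// a single underscore, which is not reserved, instead of starting with _M.
class Timer {
public:
    explicit Timer(std::ostream& out)
        : out_(out), start_(std::chrono::steady_clock::now()) {}

    // Assumption: the elapsed time is reported when the timer goes out of scope.
    ~Timer() {
        auto end = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      end - start_).count();
        out_ << "Elapsed time: " << us << " microseconds\n";
    }

private:
    std::ostream& out_;                             // was _M_out
    std::chrono::steady_clock::time_point start_;   // was _M_startEvent
};

// Usage sketch: time an empty scope and return the report as a string.
std::string sample_report() {
    std::ostringstream os;
    {
        Timer timer(os);   // `timer`, not `timer__`: no double underscore
    }
    assert(!os.str().empty());   // the report is written at scope exit
    return os.str();
}
```

The same rule applies to the local names in run(): ptrA__ and timer__ contain a double underscore and could be spelled ptrA and timer.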

>> };
>>
>> void run()
>> {
>> boost::shared_ptr<int> ptrA__(new int(0)), ptrB__, ptrC__;
>> Timer timer__(std::cout);
>> for (int i=0; i<4000000; ++i)
>> {
>> ptrB__ = ptrA__;
>> ptrC__ = ptrB__;
>> }
>> }

Thank you for the test. I was able to confirm your results on an AMD Athlon
1.4. However, you have to agree that your test code isn't very realistic, or
to be precise, it's very unrealistic. ;-) I was able to cut both single- and
multithreaded times to 20ms by replacing shared_count::operator= as shown
below:

    shared_count & operator= (shared_count const & r) // nothrow
    {
        sp_counted_base * tmp = r.pi_;

        if(tmp != pi_)
        {
            if(tmp != 0) tmp->add_ref_copy();
            if(pi_ != 0) pi_->release();
            pi_ = tmp;
        }

        return *this;
    }

That's because you are measuring a tight cycle of no-ops: after the first
iteration all three pointers already share the same count, so every further
assignment has no net effect. While it would be trivial to modify the test to
avoid this particular optimization, I'd appreciate it if you could produce a
test sample that is derived from a real code base that uses shared_ptr
extensively.
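
To make the "cycle of no-ops" point concrete, here is a minimal sketch that counts how often the reference count is actually modified. The types toy_counted_base and toy_shared_count are hypothetical stand-ins, not the Boost classes: they skip object destruction and thread safety entirely, and only the body of operator= follows the replacement shown above. The two helper functions are likewise assumptions made for illustration.

```cpp
#include <cassert>

// Toy stand-in for sp_counted_base (hypothetical): tracks the count and
// how many times the count was modified.
struct toy_counted_base {
    long use_count;
    long updates;   // number of add_ref_copy/release calls seen

    toy_counted_base() : use_count(1), updates(0) {}

    void add_ref_copy() { ++use_count; ++updates; }

    void release() {
        assert(use_count > 0);
        --use_count;   // a real implementation would destroy the object at zero
        ++updates;
    }
};

// Toy stand-in for shared_count with the assignment operator from the post.
struct toy_shared_count {
    toy_counted_base* pi_;

    explicit toy_shared_count(toy_counted_base* p = 0) : pi_(p) {}

    toy_shared_count& operator=(toy_shared_count const& r) {
        toy_counted_base* tmp = r.pi_;
        if (tmp != pi_) {   // already sharing the same count: nothing to do
            if (tmp != 0) tmp->add_ref_copy();
            if (pi_ != 0) pi_->release();
            pi_ = tmp;
        }
        return *this;
    }
};

// Same shape as the quoted loop: b = a; c = b; repeated.
long count_updates_same_source(int iterations) {
    toy_counted_base block;
    toy_shared_count a(&block), b, c;
    for (int i = 0; i < iterations; ++i) {
        b = a;
        c = b;
    }
    return block.updates;
}

// Variant that alternates between two sources, so the pointer comparison
// never short-circuits the assignment to b.
long count_updates_alternating(int iterations) {
    toy_counted_base x, y;
    toy_shared_count a1(&x), a2(&y), b;
    for (int i = 0; i < iterations; ++i) {
        b = (i % 2 == 0) ? a1 : a2;
    }
    return x.updates + y.updates;
}
```

In this toy model the quoted loop touches the count only during its first iteration, while the alternating variant modifies a count on every assignment. That is one way a test could be changed to sidestep this particular optimization, though it is no more realistic than the original.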

That said, your test, when rerun with the "next release shared_count" (proof
of concept available at

    http://www.pdimov.com/cpp/shared_count_x86_exp2.hpp

) produces

BOOST_HAS_THREADS is: TRUE
Elapsed time: 254150 microseconds
Elapsed time: 225076 microseconds
Elapsed time: 225000 microseconds
Elapsed time: 224875 microseconds
Elapsed time: 226152 microseconds
Elapsed time: 224947 microseconds
Elapsed time: 225070 microseconds
Elapsed time: 229008 microseconds
Elapsed time: 227057 microseconds
Elapsed time: 224856 microseconds
Press any key to continue

I find this (~3x instead of 10x) slightly less alarming. ;-)

