Boost logo

Boost :

Subject: Re: [boost] [XInt] CoW/Move Timings
From: Jeffrey Lee Hellrung, Jr. (jhellrung_at_[hidden])
Date: 2011-03-10 10:37:40


On 3/10/2011 6:48 AM, Chad Nelson wrote:
> On Wed, 09 Mar 2011 23:30:21 -0800
> "Jeffrey Lee Hellrung, Jr."<jhellrung_at_[hidden]> wrote:
>
>> On 3/9/2011 9:35 PM, Chad Nelson wrote:
>>> On Wed, 09 Mar 2011 17:36:40 -0800
>>> Steven Watanabe<watanabesj_at_[hidden]> wrote:
>>>
>>>>> The "review code" lines are for the unmodified review version of
>>>>> the code. All others are for the pass-by-reference version, with
>>>>> several other modifications that should have no measurable effect
>>>>> on the speed.
>>>>
>>>> Is this version of the code available somewhere?
>>>
>>> <http://www.oakcircle.com/download/xint-postreview1.zip>
>>
>> It doesn't look like any of the cpp files in libs/xint/test
>> correspond to the benchmark you posted most recently. Am I blind, or
>> should I be looking elsewhere?
>
> I'm not sure what you mean. I simply defined PERFORMANCE_TEST
> in /libs/xint/test/test_main.cpp and used the Linux "time" command to
> get the timings for each run.

Okay; I expected the timing to be encapsulated in the test, and more
details somewhere on how to test each combination of parameters
(CoW/non-CoW; move/non-move; and integer sizes). As it is, I think I
figured it out.

[...]
>> (in which case, there are still reference counts floating around,
>> sadly; but it looks like every attachment is immediately followed by
>> a deep copy).
>
> That was the only way I could see to provably disable it for timing
> purposes, without rewriting large portions of the code.

With the current framework, you're probably right. But then I think any
performance measurements you post to this list should be appropriately
framed in that context. Your statement

"The tests show pretty conclusively that Copy-on-Write, alone or in
conjunction with Boost.Move, provides a noticeable speed improvement
on all three sizes."

is simply disingenuous. Not just for the above reason, but for the
unnecessary deep-copy behavior I mention below.

>> However, it seems some operations still assume CoW behavior of
>> raw_integer_t.
>
> As I pointed out in my original timings message: "Note that the
> Boost.Move numbers could be a little better. The integer_t classes are
> move-enabled, but the internal data class is not as yet. The data class
> only sometimes uses return values, so the difference isn't likely to be
> great."

This doesn't preclude you from avoiding unnecessary copying...even with
CoW, creating a reference-to-const is cheaper than creating a copy.

>> Did you undef BOOST_XINT_USE_COW in your benchmarking of sans-CoW?
>
> Yes. That was the point of putting BOOST_XINT_USE_COW in there. It
> wasn't there in the review version of the code.

Okay, just trying to see which preprocessor definitions are relevant here.

[...]
>> What I'm really finding troubling, though, is the deep copy performed
>> at line 2119 in integer.hpp when using thread-safe integer_t's:
>>
>> integer_t<BOOST_XINT_APARAMS> r(n1._data() + n2._data());
>>
>> n1._data() + n2._data() executes the BOOST_XINT_RAWINT overload of
>> operator+, and this object is *not* moved into r, but deep copied
>> (!), since the temporary raw_integer_t object makes the reference
>> count 2 once the r object attaches to it. If my analysis is wrong on
>> this, please correct me, but I'm guessing this isn't what was
>> intended, right?
>
> That's not what was intended for the original code. The
> deep-copy-everything behavior is, as mentioned above, a hack that I
> added in the last few days to provably and completely disable
> copy-on-write.

Correct me if I'm wrong, but this is the behavior of the thread-safe
version of integer_t regardless of BOOST_XINT_USE_COW. So am I to infer
from your comment that this line did *not* deep copy in the review
version of the library?

[...]
>> That one deep copy, observed in a very short period of time stepping
>> through the code, doesn't give me a lot of confidence in your non-CoW
>> timings, as it's probably not unlikely there are multiple instances
>> of this same problem. I hope this convinces you of the importance of
>> tracking the number of allocations.
>>
>> Further, your non-CoW timings are somewhat poisoned anyway by the
>> persistent presence of reference counting, which is just dead weight
>> adding to the timings. I think until we get a stripped down integer
>> class that just consists of a std::vector magnitude and a bool sign,
>> it will be hard to trust any comparison between CoW and non-CoW. And
>> it will be more convenient to code this up if the arithmetic
>> algorithms can automatically work with a std::vector of digits ;)
>
> Then there's no way to prove whether CoW is useful or not until I
> rewrite the entire code to remove it anyway.

Well...yes, that's likely the case. I believe that adding CoW would
give zero improvement, and add at least negligible overhead, over simply
using move emulation for this particular test case, where the only
relevant operations are "T r = x + y", "T r = x * y", and "T r =
mulmod(x, y)" (assuming that CoW idioms are avoided in the respective
implementations). If you really wanted to keep CoW around in the core
implementation, you would be well served to come up with a test case
where you *expect* CoW to perform better (of course, something other
than creating spurious copies just for the hell of it).

- Jeff


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk