From: Ian McCulloch (ianmcc_at_[hidden])
Date: 2005-11-25 21:35:12


Robert Ramey wrote:

> Ian McCulloch wrote:
>

[...]

>> Secondly, the buffer in the oprimitive class has much less functionality
>> than the vector<char> buffer, as well as the buffer I used previously
>> (http://lists.boost.org/Archives/boost/2005/11/97156.php). In particular,
>> it does not check for buffer overflow when writing. Thus it has no
>> capability for automatic resizing/flushing, and is only useful if you know
>> in advance what the maximum size of the serialized data is. This
>> kind of buffer is of rather limited use, so I think that this is not
>> a fair comparison.
>
> I think it's much closer to the binary archive implementation than the
> current binary_oarchive is.

I don't understand that sentence, sorry. Which binary archive
implementation?

> I also think its fairly close to what
> an archive class would look like for a message passing application.

Surely it depends on the usage pattern? If you are sending fixed-size
messages, then sure, a fixed-size buffer with no overflow checks will be
fastest. If you are sending variable-size messages with no particular
upper bound on the message size, then it is a tradeoff whether you use a
resizable buffer or count the number of items you need to serialize
beforehand. I wouldn't like to guess which is the more 'typical' use. Both
are important cases.
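To make the tradeoff concrete, here is a minimal sketch of the two buffer
styles being discussed (hypothetical names, not the actual oprimitive
interface): a fixed-size buffer that assumes the caller knows an upper bound
on the serialized size, and a std::vector<char> buffer that grows on demand:

    #include <cstddef>
    #include <cstring>
    #include <vector>

    // Fixed-size buffer: no overflow check, so the caller must know an
    // upper bound on the serialized size in advance (hypothetical sketch).
    struct fixed_buffer {
        char* pos;
        explicit fixed_buffer(char* storage) : pos(storage) {}
        void save_binary(const void* data, std::size_t n) {
            std::memcpy(pos, data, n);   // no bounds check: fast, but unsafe
            pos += n;
        }
    };

    // Resizable buffer: std::vector<char> grows as needed, at the cost of
    // a capacity check (and occasional reallocation) on every write.
    struct growing_buffer {
        std::vector<char> storage;
        void save_binary(const void* data, std::size_t n) {
            const char* p = static_cast<const char*>(data);
            storage.insert(storage.end(), p, p + n);
        }
    };

Per-call, the only difference is the bounds check and the possible
reallocation; whether that matters depends entirely on how predictable the
message sizes are.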

> The real difference here is that save_binary would be implemented
> in such a way that the overhead per call is pretty small. Maybe
> not quite as small as here, but much smaller than the overhead
> associated with ostream::write.

Ok, but even with the ideal fixed-size buffer, the difference between the
serialization lib and save_array for out-of-cache arrays of char, as measured
by you, is:
> Time using serialization library: 1.922
> Time using direct call to save_array: 0.25
almost a factor of 8. For a buffer that has more overhead, no matter how
small, that overhead will translate directly into an increase in that factor.

Note however that in this case, save_array() is purely memory-bandwidth
limited. It would be interesting for you to repeat the benchmark with a much
smaller array size. You should see several jumps in performance corresponding
to the various caches: L1, L2, TLB, perhaps others. In any particular
benchmark, some of these thresholds might be hard to see. You will need to
put the serialization into a loop to get the CPU time up to a sensible
number, and do a loop or two before starting the timer so that the data is
already in the cache (a rough sketch of such a loop is below). In the
fixed-size buffer scenario this is actually not too far from a realistic
benchmark. I know (roughly) what the result will be. If you still stand by
your previous comment:
>>>In my view, it does support my contention that
>>>implementing save_array - regardless of how it is
>>>in fact implemented - represents a premature optimization.
>>>I suspect that the net benefit in the kind of scenario you
>>>envision using it will be very small.
then obviously you do not.
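This is only a sketch of the loop structure I have in mind, not the actual
benchmark; serialize_once stands in for whatever call is being timed
(save_array, the archive, etc.), and the sizes and iteration counts are
arbitrary. The point is the warm-up pass before the timer starts and the
sweep over array sizes:

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Placeholder for the operation being timed; here it just touches
    // the data and keeps the result live so the loop is not optimized away.
    void serialize_once(const std::vector<char>& data, volatile char& sink) {
        char sum = 0;
        for (char c : data) sum ^= c;
        sink = sum;
    }

    int main() {
        volatile char sink = 0;
        // Sweep sizes from well inside L1 to well outside L2.
        for (std::size_t n = 1 << 10; n <= (1 << 26); n <<= 1) {
            std::vector<char> data(n, 1);
            const int iterations = static_cast<int>((1u << 28) / n);
            // Warm-up: bring the data into cache before starting the timer.
            for (int i = 0; i < 2; ++i) serialize_once(data, sink);
            auto t0 = std::chrono::steady_clock::now();
            for (int i = 0; i < iterations; ++i) serialize_once(data, sink);
            auto t1 = std::chrono::steady_clock::now();
            double seconds = std::chrono::duration<double>(t1 - t0).count();
            std::printf("size %10zu  bytes/s %.3g\n",
                        n, double(n) * iterations / seconds);
        }
    }

Plotting bytes/s against the array size should show the cache thresholds as
steps in the curve.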

> So I believe that the above
> results give a much more accurate picture than the previous
> ones do of the effect of application of the proposed enhancement.

Fine. I am glad you finally agree with the 10x slowdown figure (well, if
you want to be picky, 7.688x slowdown on your Windows XP box, 9.8512x on my
linux-opteron box).

[...]

>> Interestingly, on this platform/compiler combination, without the bug
>> fix in save_binary() it still takes 1.11 seconds ;) I would guess
>> your Windows compiler is doing some optimization that gcc is not, in
>> that case.
>
> Thanks for doing this - it is very helpful.
>
> Sure you're compiling at maximum optimization -O3 .

Of course. -O3 gives no difference from -O2, a small difference from -O1,
and a huge difference from -O0. When there is a bug in the benchmark, any
result is possible ;) Quite possibly your compiler simply noticed that the
same memory location was being overwritten repeatedly and chose to store it
in a register instead? Anyway, since you made no special effort to ensure
that the compiler didn't optimize away code, it would have been quite
legitimate for your benchmark to report 0 time for all tests. In the
absence of such effort, you at least need to check the assembly output
carefully to make sure the benchmark is really testing what you think it is
testing.
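For what it's worth, one crude way to guard against that (a sketch, not
anything from the benchmark in question) is to fold the output buffer into a
checksum after each timed run and publish it through a volatile variable, so
the compiler cannot prove the stores are dead:

    #include <cstddef>
    #include <vector>

    volatile unsigned long g_sink;  // writes here cannot be optimized away

    // After each timed iteration, fold the buffer into a checksum and
    // publish it, forcing the compiler to actually produce its contents.
    void consume(const std::vector<char>& buffer) {
        unsigned long sum = 0;
        for (std::size_t i = 0; i < buffer.size(); ++i)
            sum = sum * 31 + static_cast<unsigned char>(buffer[i]);
        g_sink = sum;
    }

Even then, checking the generated assembly is the only way to be really sure.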

> In anycase, this
> is not untypical of my personal experience with benchmarks. They vary
> a lot depending on extraneaus variables. Our results seem pretty
> comparable though.
>
> Robert Ramey

Cheers,
Ian

