
Subject: Re: [boost] [any] new version
From: Ilya Bobir (ilya.bobir_at_[hidden])
Date: 2011-09-02 20:16:33


On Fri, Sep 2, 2011 at 12:44 PM, Nevin Liber <nevin_at_[hidden]> wrote:
> On 2 September 2011 11:01, Ilya Bobir <ilya.bobir_at_[hidden]> wrote:
>>> unsigned int next_id()
>>> {
>>>    static unsigned int previous_id = 0;   //0 is not assigned to a type
>>>
>>>    ++previous_id;
>>>    return previous_id;
>>> }
>>>
>>> [...]
>>
>> Would not this be non-thread safe?
>
> That's the second problem that has to be tackled with this code.
>
> Would this work:
>
> unsigned next_id()
> {
>    static std::atomic<unsigned> previous_id;
>    return ++previous_id;
> }

It is not exactly a secondary problem.
The only reason for the new library is speed, but if one uses atomics
or full-fledged locking, it may become slower than the current
boost::any. If that is the case, it does not make any sense to look at
the new library at all.

What follows is a benchmark of Boost.Any and the new any with some
tweaks. I have included the actual output so that anyone interested
may double-check my logic, but I have also summarized the numbers after
every run, so if you read just the text and skip the benchmark output,
you should still get the picture. And there is a summary in the last two
paragraphs if you really want to look just at the end result.

OK, so I started this because I was wondering how it is possible that
we can skip a virtual function call when we need to figure out our
real type id. Maybe I missed an explanation somewhere earlier in the
thread, but there is a trade-off going on: we increase the size of the
any instances and store the type id directly instead of relying on a
virtual function call to figure it out.
And this trade-off is, IIUC, unrelated to the way we actually tag types.
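To make the trade-off concrete, here is a minimal sketch of the two layouts. It assumes a simplified holder hierarchy; the names placeholder, holder, virtual_any and tagged_any are hypothetical and are not the actual members of Boost.Any or of the proposed library. In the first layout the type check goes through a virtual call; in the second the type id is stored next to the content pointer, so the check needs no virtual dispatch, but every instance carries a second pointer.

```cpp
#include <typeinfo>

// Hypothetical, simplified holder hierarchy loosely modelled on Boost.Any.
struct placeholder {
    virtual ~placeholder() {}
    virtual const std::type_info& type() const = 0;
};

template <typename T>
struct holder : placeholder {
    explicit holder(const T& v) : held(v) {}
    const std::type_info& type() const { return typeid(T); }
    T held;
};

// Layout 1: a single pointer; the type is obtained through a virtual call.
struct virtual_any {
    placeholder* content;
    bool holds(const std::type_info& t) const {
        return content != 0 && content->type() == t;   // virtual dispatch
    }
};

// Layout 2: two pointers; the type id is stored next to the content,
// so checking it is a load and a compare, with no virtual dispatch.
struct tagged_any {
    placeholder* content;
    const std::type_info* id;
    bool holds(const std::type_info& t) const {
        return id != 0 && *id == t;
    }
};
```

On typical platforms sizeof(tagged_any) is twice sizeof(virtual_any), which matches the doubled instance sizes reported in the runs below.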

I ran the benchmark on my machine and found that the 50 000% gain on
MSVC is the result of an optimization. After "fixing" the benchmark a
little (attached as any_benchmark.cpp), MS C++ gives me numbers of the
same order of magnitude as GCC. I was compiling against Boost 1.44.
Here is the output on my box:

$ g++ --version
g++ (GCC) 4.3.4 20090804 (release) 1

Ilya_at_Ilya-PC ~/works/tests/any
$ g++ -O3 -I /d/works/boost/ any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./a.exe
Testing any with int
  copying:
    old any: 249
    new any: 250
  moving:
    old any: 281
    new any: 452
  any_cast:
    old any: 187
    new any: 63
Testing any with double
  copying:
    old any: 234
    new any: 249
  moving:
    old any: 312
    new any: 437
  any_cast:
    old any: 187
    new any: 47
Testing any with std::string
  copying:
    old any: 265
    new any: 265
  moving:
    old any: 344
    new any: 483
  any_cast:
    old any: 203
    new any: 47
sizeof(old any): 4
sizeof(new any): 8

Moving is actually ~30% slower in the new version, but any_cast is ~4
times faster. And the instance size is doubled.
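The ratios behind this summary can be recomputed from the output above. A small sketch, with the timings copied from this GCC run (timing, gcc_run and ratio are illustrative names only):

```cpp
// Timings copied from the first GCC run above (old any vs new any).
struct timing {
    const char* op;
    double old_any;
    double new_any;
};

const timing gcc_run[] = {
    {"moving/int", 281, 452},     {"moving/double", 312, 437},
    {"moving/string", 344, 483},  {"any_cast/int", 187, 63},
    {"any_cast/double", 187, 47}, {"any_cast/string", 203, 47},
};

// new/old time: a value above 1 means the new any is slower,
// a value below 1 means it is faster (1.0 / ratio is the speed-up).
inline double ratio(const timing& t) { return t.new_any / t.old_any; }
```

Expressed as new/old time, the moving rows come out at about 1.4 to 1.6 (equivalently, the old any needs roughly 29 to 38% less time), and the any_cast rows at about 0.23 to 0.34, i.e. a 3 to 4.3 times speed-up.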

Ilya_at_Ilya-PC ~/works/tests/any
$ cl
Microsoft (R) C/C++ Optimizing Compiler Version 16.00.30319.01 for x64
Copyright (C) Microsoft Corporation. All rights reserved.

This is the MS Visual Studio 2010 compiler.

Ilya_at_Ilya-PC ~/works/tests/any
$ cl -EHs -O2 -I 'D:\works\boost' any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./any_benchmark.exe
Testing any with int
  copying:
    old any: 61
    new any: 61
  moving:
    old any: 75
    new any: 102
  any_cast:
    old any: 477
    new any: 59
Testing any with double
  copying:
    old any: 61
    new any: 61
  moving:
    old any: 75
    new any: 104
  any_cast:
    old any: 476
    new any: 59
Testing any with std::string
  copying:
    old any: 78
    new any: 79
  moving:
    old any: 88
    new any: 124
  any_cast:
    old any: 480
    new any: 58
sizeof(old any): 8
sizeof(new any): 16

Note that this is a 64-bit compiler. Move is ~30% slower, any_cast is
~8 times faster, and the instance size is doubled.
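The benchmark changes themselves are in the attachment and are not reproduced here. As an illustration only, one common way to keep an optimizer from removing a timed loop whose results are never used is to feed every result into an accumulator and publish it through a volatile store. The helper below (time_ms and benchmark_sink are hypothetical names, and this is not necessarily what any_benchmark.cpp does) sketches that idea with C++11 <chrono>:

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical sink: a volatile store the compiler must not drop.
volatile long long benchmark_sink = 0;

// Times `iterations` calls of op(i) and returns the elapsed milliseconds.
template <typename F>
long long time_ms(F op, std::size_t iterations) {
    long long acc = 0;
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        acc += op(i);              // every result feeds into acc...
    auto stop = std::chrono::steady_clock::now();
    benchmark_sink = acc;          // ...and acc escapes via the volatile store
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A call such as time_ms([&](std::size_t) { return boost::any_cast<int>(a); }, n) would then time any_cast on an any a holding an int. A volatile sink only keeps the results alive; a compiler may still hoist loop-invariant work, so the generated code is worth checking.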

Now about the generation of ids for types. If RTTI is available,
typeid(T).name() will be different for all types, and at the same time
it returns a pointer that has the same value for a given T across all
compilation units (not considering dynamic libraries) and is thread
safe. So I replaced the unsigned integers with const char pointers
(any.typeid.name.hpp). Here are the numbers:

Ilya_at_Ilya-PC ~/works/tests/any
$ g++ -O3 -I /d/works/boost/ any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./a.exe
Testing any with int
  copying:
    old any: 234
    new any: 249
  moving:
    old any: 297
    new any: 452
  any_cast:
    old any: 187
    new any: 63
Testing any with double
  copying:
    old any: 249
    new any: 250
  moving:
    old any: 296
    new any: 453
  any_cast:
    old any: 187
    new any: 62
Testing any with std::string
  copying:
    old any: 250
    new any: 249
  moving:
    old any: 344
    new any: 468
  any_cast:
    old any: 187
    new any: 62
sizeof(old any): 4
sizeof(new any): 8

For the any_cast test of the new version I was getting "62" or "47" as
the average time for both the unsigned integer and the char pointer
versions, depending on the run, so for GCC this change did not actually
affect the run time. At the same time, this version is thread safe, but
it requires typeid to be available.
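A minimal sketch of the char pointer scheme, under the same assumption the text makes: that typeid(T).name() yields the same pointer for a given T in every translation unit. The C++ standard guarantees neither this nor that names are distinct, so treat it as a property of the compilers tested here; type_id and same_type are hypothetical names.

```cpp
#include <typeinfo>

// Hypothetical id type: the pointer returned by typeid(T).name().
typedef const char* type_id_t;

// No counter and no mutable shared state, so there is nothing to lock.
// ASSUMPTION: one name() pointer per type across translation units.
template <typename T>
type_id_t type_id() {
    return typeid(T).name();
}

// Comparing two ids is a single pointer comparison.
inline bool same_type(type_id_t a, type_id_t b) { return a == b; }
```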

Ilya_at_Ilya-PC ~/works/tests/any
$ cl -EHs -O2 -I 'D:\works\boost' any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./any_benchmark.exe
Testing any with int
  copying:
    old any: 61
    new any: 67
  moving:
    old any: 79
    new any: 115
  any_cast:
    old any: 488
    new any: 95
Testing any with double
  copying:
    old any: 65
    new any: 65
  moving:
    old any: 78
    new any: 115
  any_cast:
    old any: 491
    new any: 94
Testing any with std::string
  copying:
    old any: 79
    new any: 80
  moving:
    old any: 90
    new any: 132
  any_cast:
    old any: 486
    new any: 93
sizeof(old any): 8
sizeof(new any): 16

The 64-bit cl, on the other hand, is consistently ~30% slower in the
char pointer case. I guess it is again some kind of optimization, but I
did not look at the generated code. The new version's any_cast is still
~5 times faster.

Then I thought, well, why not use the type_info objects themselves?
They are guaranteed to exist for the whole lifetime of the application.
The id is now "const std::type_info *". Here are the numbers
(any.typeid.hpp):

Ilya_at_Ilya-PC ~/works/tests/any
$ g++ -O3 -I /d/works/boost/ any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./a.exe
Testing any with int
  copying:
    old any: 249
    new any: 250
  moving:
    old any: 281
    new any: 436
  any_cast:
    old any: 188
    new any: 62
Testing any with double
  copying:
    old any: 234
    new any: 250
  moving:
    old any: 296
    new any: 421
  any_cast:
    old any: 187
    new any: 63
Testing any with std::string
  copying:
    old any: 249
    new any: 266
  moving:
    old any: 343
    new any: 468
  any_cast:
    old any: 187
    new any: 47
sizeof(old any): 4
sizeof(new any): 8

No change from the char pointer case for GCC.

Ilya_at_Ilya-PC ~/works/tests/any
$ cl -EHs -O2 -I 'D:\works\boost' any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./any_benchmark.exe
Testing any with int
  copying:
    old any: 60
    new any: 61
  moving:
    old any: 74
    new any: 103
  any_cast:
    old any: 464
    new any: 59
Testing any with double
  copying:
    old any: 60
    new any: 60
  moving:
    old any: 74
    new any: 102
  any_cast:
    old any: 467
    new any: 59
Testing any with std::string
  copying:
    old any: 77
    new any: 79
  moving:
    old any: 88
    new any: 124
  any_cast:
    old any: 461
    new any: 58
sizeof(old any): 8
sizeof(new any): 16

MS C++, on the other hand, performed better than in the char pointer
case. Essentially, the picture is the same as for the unsigned int
case.
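The type_info pointer variant differs only in what the id points to. A sketch, again with hypothetical names; it compares addresses first and falls back to std::type_info::operator==, since the standard guarantees that the type_info object lives until the end of the program but not that there is only one such object per type:

```cpp
#include <typeinfo>

// Hypothetical id type: the address of the type's type_info object.
typedef const std::type_info* type_id_t;

template <typename T>
type_id_t type_id() { return &typeid(T); }   // never dangles: static storage

// Fast path: pointer equality. Fallback: type_info comparison, for
// implementations that may create several type_info objects for one type.
inline bool same_type(type_id_t a, type_id_t b) {
    return a == b || *a == *b;
}
```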

So there is a trade-off that can be made: any_cast becomes 4 to 8
times faster, at the cost of a ~30% slower move and any instances
growing from one pointer to two.
Let's try doing that with the Boost.Any source (boost.any.doubleSize.hpp):

Ilya_at_Ilya-PC ~/works/tests/any
$ g++ -O3 -I /d/works/boost/ any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./a.exe
Testing any with int
  copying:
    old any: 234
    new any: 249
  moving:
    old any: 297
    new any: 421
  any_cast:
    old any: 187
    new any: 62
Testing any with double
  copying:
    old any: 234
    new any: 250
  moving:
    old any: 296
    new any: 437
  any_cast:
    old any: 172
    new any: 62
Testing any with std::string
  copying:
    old any: 250
    new any: 281
  moving:
    old any: 327
    new any: 468
  any_cast:
    old any: 187
    new any: 47
sizeof(old any): 8
sizeof(new any): 8

No change?! I guess GCC optimizes to a level where a change like this
does not matter.

Ilya_at_Ilya-PC ~/works/tests/any
$ cl -EHs -O2 -I 'D:\works\boost' any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./any_benchmark.exe
Testing any with int
  copying:
    old any: 61
    new any: 64
  moving:
    old any: 78
    new any: 110
  any_cast:
    old any: 125
    new any: 57
Testing any with double
  copying:
    old any: 63
    new any: 63
  moving:
    old any: 77
    new any: 107
  any_cast:
    old any: 124
    new any: 57
Testing any with std::string
  copying:
    old any: 80
    new any: 81
  moving:
    old any: 91
    new any: 128
  any_cast:
    old any: 127
    new any: 57
sizeof(old any): 16
sizeof(new any): 16

MS C++, on the other hand, does care. Note that the change affects only
the any_cast speed (almost 4 times faster); neither copy nor move is
affected.
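The "two pointers" change to Boost.Any can be sketched roughly as follows. This is a simplified, hypothetical reconstruction for illustration, not the contents of boost.any.doubleSize.hpp: the holder keeps its virtual clone(), while the any itself caches a const std::type_info pointer, so type() and any_cast no longer need a virtual call.

```cpp
#include <typeinfo>
#include <utility>

// Hypothetical, simplified any with a cached type pointer (two pointers).
class any {
public:
    any() : content(0), type_(&typeid(void)) {}   // empty: type() is void

    template <typename T>
    any(const T& value) : content(new holder<T>(value)), type_(&typeid(T)) {}

    any(const any& other)
        : content(other.content ? other.content->clone() : 0), type_(other.type_) {}

    ~any() { delete content; }

    any& swap(any& rhs) {
        std::swap(content, rhs.content);
        std::swap(type_, rhs.type_);
        return *this;
    }

    any& operator=(any rhs) { return swap(rhs); }   // copy-and-swap

    bool empty() const { return content == 0; }
    const std::type_info& type() const { return *type_; }   // no virtual call

private:
    struct placeholder {
        virtual ~placeholder() {}
        virtual placeholder* clone() const = 0;
    };

    template <typename T>
    struct holder : placeholder {
        explicit holder(const T& v) : held(v) {}
        placeholder* clone() const { return new holder(held); }
        T held;
    };

    placeholder* content;          // first pointer: the held value
    const std::type_info* type_;   // second pointer: the cached type

    template <typename T> friend T* any_cast(any* operand);
};

// Pointer comparison first, type_info comparison as a fallback.
template <typename T>
T* any_cast(any* operand) {
    if (operand && (operand->type_ == &typeid(T) || *operand->type_ == typeid(T)))
        return &static_cast<any::holder<T>*>(operand->content)->held;
    return 0;
}
```

Copying this any costs one extra pointer copy compared to the single-pointer version, which fits the observation above that only any_cast changed.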

So it seems that for GCC (at least 4.3.4 on Cygwin at -O3) the
difference has nothing to do with the way we store and retrieve the
type information. Let's compare just Boost.Any with and without the
patch side by side (boost.any_benchmark.cpp):

Ilya_at_Ilya-PC ~/works/tests/any
$ g++ -O3 -I /d/works/boost/ boost.any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./a.exe
Testing any with int
  copying:
    old any: 249
    new any: 234
  moving:
    old any: 297
    new any: 296
  any_cast:
    old any: 187
    new any: 172
Testing any with double
  copying:
    old any: 249
    new any: 234
  moving:
    old any: 297
    new any: 296
  any_cast:
    old any: 187
    new any: 188
Testing any with std::string
  copying:
    old any: 265
    new any: 249
  moving:
    old any: 328
    new any: 343
  any_cast:
    old any: 187
    new any: 172
sizeof(old any): 4
sizeof(new any): 8

Same times for GCC. Only the instance sizes are different.

Ilya_at_Ilya-PC ~/works/tests/any
$ cl -EHs -O2 -I 'D:\works\boost' boost.any_benchmark.cpp

Ilya_at_Ilya-PC ~/works/tests/any
$ ./boost.any_benchmark.exe
Testing any with int
  copying:
    old any: 60
    new any: 61
  moving:
    old any: 75
    new any: 74
  any_cast:
    old any: 459
    new any: 125
Testing any with double
  copying:
    old any: 60
    new any: 62
  moving:
    old any: 73
    new any: 73
  any_cast:
    old any: 465
    new any: 122
Testing any with std::string
  copying:
    old any: 78
    new any: 78
  moving:
    old any: 87
    new any: 88
  any_cast:
    old any: 462
    new any: 124
sizeof(old any): 8
sizeof(new any): 16

MS C++, on the other hand, likes the change a lot: an almost 4 times
speed increase for any_cast with no speed change for the other
operations.

I guess now it is time to look at the generated code to figure out
why GCC does not care about this change, or what else is different
between Boost.Any and the version presented by Martin, but I will stop
here for now.

The bottom line: there is a change that makes any_cast almost 4 times
faster for MS C++, does not change speed on GCC, and increases the size
of all any instances from one pointer to two. The change is thread
safe.

I think that in order to compare apples to apples, one needs to change
the "unsigned integers as type ids" case to be thread safe and only
then benchmark it. A non-thread-safe any is probably of limited use,
isn't it?
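For the unsigned integer ids, one way to get thread safety without paying for an atomic operation on every any_cast is to combine the atomic counter from the quoted reply with a per-type function-local static. This is a sketch under C++11 rules (type_id is a hypothetical name); as with the other schemes, ids may differ across dynamic library boundaries.

```cpp
#include <atomic>

// Thread-safe counter, as suggested in the quoted reply.
inline unsigned next_id() {
    static std::atomic<unsigned> previous_id(0);   // 0 is never assigned to a type
    return ++previous_id;
}

// Each T draws its id once, when the function-local static is initialized.
// Since C++11 that initialization is thread-safe, so later calls just read
// the cached value instead of touching the atomic counter.
template <typename T>
unsigned type_id() {
    static const unsigned id = next_id();
    return id;
}
```

The first call of type_id<T>() for each T performs the atomic increment; later calls only return the cached value, after the guard check compilers typically emit for function-local statics.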

Ilya Bobyr