Subject: Re: [proto] proto performance
From: Nate Knight (Nate.Knight_at_[hidden])
Date: 2011-02-24 13:17:07
On Feb 20, 2011, at 4:43 AM, Joel Falcou wrote:
> On 20/02/11 12:41, Eric Niebler wrote:
>> On 2/20/2011 6:40 PM, Joel Falcou wrote:
>>> On 20/02/11 12:31, Karsten Ahnert wrote:
>>>> It is amazing that the proto expression is faster than the naive one.
>>>> The compiler must really love the way proto evaluates an expression.
>>> I still don't really know why. The usual speed-up in our use cases here
>>> ranges from 10 to 50%.
>> That's weird.
>>
> Well, for me it's weird in the good way so I don't complain. Old versions
> of nt2 had cases where we were thrice as fast as the same
> vector+iterator based code ...
To explore the issue further, I modified the originally posted test code (see http://pastebin.com/1Vr9BkPP).
The modifications include a transform-based evaluator, a lambda-expression-based example, and
some attributes that keep the evaluation functions from being inlined.
First, the numbers (averaged over 5 iterations of the main loop). All compilation was done with -O3 against Boost 1.45.
MacBook Pro, Mac OS X 10.6.6, Core 2 Duo
                     ProtoContext  ProtoTransform  ProtoLambda  Loop
GCC 4.2.1 (Apple)  : 5.3565438     5.3721942       126.38458    1.3657978
GCC 4.4.5          : 1.8878364     1.8845548       70.056237    0.942303
GCC 4.5.2          : 1.8840608     1.889619        1.2806688    1.0589558
GCC 4.6.0 (2/5/11) : 1.8854768     1.8834438       1.278347     1.2345208
CLANG 2.9 (125472) : 5.455976      5.4627628       3.825104     1.2330524
Now, removing the noinline attributes gives (columns in the same order):
                     ProtoContext  ProtoTransform  ProtoLambda  Loop
GCC 4.2.1 (Apple)  : 4.1448478     5.3795842       126.53211    1.3215378
GCC 4.4.5          : 1.2505956     1.2500816       69.409665    0.7198288
GCC 4.5.2          : 0.596143      0.7213138       0.71969283   0.7211534
GCC 4.6.0 (2/5/11) : 1.2942638     1.4324828       0.646147     0.6632324
CLANG 2.9 (125472) : 1.2975226     1.2966478       1.3849834    1.2452362
I'm not sure how meaningful this second set of numbers is. If the evaluation functions are inlined, the compiler
can see that evaluating them num_of_steps times is unnecessary, since the data doesn't change between
iterations. I believe it then optimizes away parts of the loop in some cases.
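To make the inlining concern concrete, here is a minimal sketch, not the posted test code. It assumes a GCC/Clang-style noinline attribute, and the names BENCH_NOINLINE, evaluate, and run_steps are hypothetical, chosen only for illustration. With the attribute in place the compiler has to emit a call to the evaluation function instead of folding its body into the timing loop. Without it, the compiler is free to notice that the inputs never change between steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical macro: expands to the GCC/Clang noinline attribute where
// available, and to nothing on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define BENCH_NOINLINE __attribute__((noinline))
#else
#define BENCH_NOINLINE
#endif

// Hypothetical stand-in for one of the benchmark's evaluation functions.
// Computes out[i] = a[i] + 2 * b[i] for every element.
BENCH_NOINLINE
void evaluate(std::vector<double>& out,
              const std::vector<double>& a,
              const std::vector<double>& b)
{
    assert(out.size() == a.size() && out.size() == b.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] + 2.0 * b[i];
}

// Evaluates the same expression num_of_steps times on unchanged inputs,
// mirroring the shape of the benchmark's main loop, then returns the sum
// of the output so the result is observable.
double run_steps(std::size_t n, std::size_t num_of_steps)
{
    std::vector<double> a(n, 1.0), b(n, 2.0), out(n, 0.0);
    for (std::size_t s = 0; s < num_of_steps; ++s)
        evaluate(out, a, b);

    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += out[i];
    return sum;
}
```

Timing run_steps with and without the attribute (by defining BENCH_NOINLINE as empty) is one way to check whether a given compiler is collapsing the repeated evaluations.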
A lot of the additional code came from Eric's cpp-next articles.
Nate