Subject: Re: [boost] [GGL] [geometry] Inexplicable speed benefit when using Visual C++ 2010
From: Stephan T. Lavavej (stl_at_[hidden])
Date: 2010-04-20 21:23:51


Actually, I said that there were no massive performance improvements between VC9 and VC10, except for the addition of rvalue references. Keeping everything else constant, there shouldn't be order-of-magnitude performance improvements between VC9 and VC10 for purely C code. Of course, we improve our compiler back-end's code generation in every major release, but by a few percent (if we're very lucky). (VC does not yet implement anything like autovectorization.)

Also, as I explained, malloc()/new are unchanged between VC9 and VC10 - they both call HeapAlloc(). Now, if you're not keeping everything else constant - e.g. switching from x86 to x64, or from XP to Vista+ - then the Low Fragmentation Heap may be responsible.

If it isn't rvalue references and it isn't the LFH, then I'm truly stumped.

Additionally, I find it interesting that you say that performance also massively increased between Intel v10 and Intel v11. That's when they implemented rvalue references too. Are you sure that you're not wrapping GPC (or anything else) in C++ code that would automatically benefit from rvalue references?
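To make "automatically benefit" concrete, here is a minimal sketch. It assumes a hypothetical PolygonWrapper type that owns its vertex data (this is not GPC's or GGL's API), and the counters exist only to make the effect visible. With a move constructor available, filling a std::vector from temporaries performs no deep copies; a compiler without rvalue references would deep-copy on every push_back and on every reallocation.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical C++ wrapper owning a polygon's vertex data; illustration only.
struct PolygonWrapper {
    std::vector<double> coords;
    static std::size_t copies;  // deep copies performed
    static std::size_t moves;   // cheap buffer-stealing moves performed

    explicit PolygonWrapper(std::size_t vertices) : coords(2 * vertices, 0.0) {}

    // Copy constructor: duplicates the whole coordinate buffer.
    PolygonWrapper(const PolygonWrapper& other) : coords(other.coords) { ++copies; }

    // Move constructor: takes over the buffer instead of duplicating it.
    PolygonWrapper(PolygonWrapper&& other) noexcept
        : coords(std::move(other.coords)) { ++moves; }
};

std::size_t PolygonWrapper::copies = 0;
std::size_t PolygonWrapper::moves = 0;

// Appends `count` temporaries and returns how many deep copies that cost.
std::size_t deep_copies_while_filling(std::size_t count) {
    PolygonWrapper::copies = 0;
    PolygonWrapper::moves = 0;
    std::vector<PolygonWrapper> polygons;
    for (std::size_t i = 0; i < count; ++i) {
        polygons.push_back(PolygonWrapper(1000));  // rvalue: moved, not copied
    }
    return PolygonWrapper::copies;
}
```

Note that the calling code does not mention moves anywhere; simply recompiling with a compiler and standard library that support rvalue references changes what push_back does. The noexcept on the move constructor is what allows a current std::vector to move elements when it reallocates.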

Thanks,
STL

-----Original Message-----
From: boost-bounces_at_[hidden] [mailto:boost-bounces_at_[hidden]] On Behalf Of Arash Partow
Sent: Tuesday, April 20, 2010 5:50 PM
To: boost_at_[hidden]
Subject: Re: [boost] [GGL] [geometry] Inexplicable speed benefit when using Visual C++ 2010

On 16/04/2010 11:30 PM, Christian Buchner wrote:
> Hi everybody,
>
> my first attempt to post to this list bounced, so I am trying again.
>
> My employer is an early adopter of the Boost Generic Geometry Library
> [GGL] in an engineering application related to mobile radio
> communications. We use it to estimate and optimize the coverage of 4G
> radio networks. Our code uses a lot of multi-polygon unions to
> estimate the amount of ground covered (and not covered) by radio beams
> and iteratively improves the antenna parameters.
>
> We've been compiling and shipping our application with Visual C++ 2008 so far.
>
> We found that GCC 4.4 on Linux was about 100% faster than Visual C++
> 2008 on Windows without modifying the code. This bothered us quite a bit
> as both compilers were allowed to use full optimization. We found that
> by optimizing (globally overloading) the new and delete operators to
> re-use allocated memory fragments on Windows we were able to get
> nearly 50% speed benefit, so we attributed much of the performance
> difference to sub-optimal memory heap management of Visual C++ 2008.
>
> Then we tried recompiling the project with Visual C++ 2010 Ultimate
> Release Candidate (RC). The speed gain of the algorithm was 900% (not
> joking) and the results still appear to be correct. Now this is
> surreal and no one here in the office has found a reasonable
> explanation yet without going into the metaphysical domain.
>
> Would anyone with knowledge of compiler and runtime internals be able
> to make an educated guess as to how such a speed gain of factor 10 is
> possible? Is anyone else seeing similar speedups in boost or in the
> geometry library when compiling with Visual C++ 2010 RC? (Hint: it's a
> free download, so anyone can try it out until the end of June 2010.)
>
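For reference, the kind of global new/delete overloading described in the quoted message might look roughly like the following. This is only a sketch under assumptions, not the code from that application: the `recycle` namespace, the size classes and the header layout are all made up for illustration, it is not thread-safe, and recycled blocks are never returned to the OS.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

namespace recycle {  // hypothetical names, for illustration only

const std::size_t kStep = 16;     // size-class granularity in bytes
const std::size_t kClasses = 32;  // requests up to 512 bytes are recycled

// Prefix stored in front of every block; alignas keeps the user pointer
// as strictly aligned as malloc's own result.
struct alignas(std::max_align_t) Header {
    std::size_t cls;  // size class, or kClasses for "pass through to free()"
    Header* next;     // link while the block sits on a free list
};

Header* free_lists[kClasses] = {};
std::size_t reuse_count = 0;  // requests served from a free list

void* allocate(std::size_t n) {
    if (n > static_cast<std::size_t>(-1) - sizeof(Header)) throw std::bad_alloc();
    const std::size_t cls = (n == 0 ? 0 : n - 1) / kStep;
    if (cls < kClasses && free_lists[cls]) {
        Header* h = free_lists[cls];  // pop a previously freed block
        free_lists[cls] = h->next;
        ++reuse_count;
        return h + 1;
    }
    // Small requests are rounded up to their class size so any block of the
    // class can serve any later request of that class.
    const std::size_t payload = cls < kClasses ? (cls + 1) * kStep : n;
    Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + payload));
    if (!h) throw std::bad_alloc();  // simplification: no new_handler loop
    h->cls = cls < kClasses ? cls : kClasses;
    h->next = 0;
    return h + 1;
}

void deallocate(void* p) {
    if (!p) return;
    Header* h = static_cast<Header*>(p) - 1;
    if (h->cls < kClasses) {  // keep small blocks for reuse
        h->next = free_lists[h->cls];
        free_lists[h->cls] = h;
    } else {
        std::free(h);  // large blocks go straight back to the C heap
    }
}

}  // namespace recycle

// Global replacements; the default array and nothrow forms forward to these.
void* operator new(std::size_t n) { return recycle::allocate(n); }
void operator delete(void* p) noexcept { recycle::deallocate(p); }
void operator delete(void* p, std::size_t) noexcept { recycle::deallocate(p); }
```

Code that allocates and frees many small objects of similar size, as polygon clipping tends to do, then hits the free lists instead of the underlying heap for most requests. Whether that helps depends entirely on how expensive the underlying heap is, which is the point under discussion in this thread.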

Nothing new here. With msvc10, GPC, which is entirely C (no templates), sees a roughly 6x-7x speedup over msvc9 on the fastcode and favour-speed settings. With PGO it gets to about 8x-9x, and on intel v11 with PGO et al. you're looking at around an 11x-12x increase over intel v10 or msvc9. The point here is that the increase is mainly centered around new memory allocation mechanisms (as described by Stephan) in the msvc10 backend - not necessarily anything special MS has done wrt C++ specifically. (Btw, the polygons used range from simple 4-5 corner convex to 100k+ corner concave-disjoint with holes and concentric islands; the operations are union, diff and xor.)

On a side note, with intel v11, if the loop unrolling is set correctly for the target processor and sse4.1 is available, it peaks at around 13x-14x - and this is with a code base that was last touched nearly 8 years ago.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

