Boost Users :
From: Brian Budge (brian.budge_at_[hidden])
Date: 2006-05-08 15:13:20
Thanks for the idea, Greg. I thought for sure you were on to
something, but I tried adding the __restrict__ keyword in the
operator.hpp binary_ops functions, and it made no difference :(
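
For readers following along, here is a minimal sketch of the kind of change being tested, since the thread does not show the actual code. The function name add_assign and its signature are hypothetical. __restrict__ is a GCC extension (also accepted by Clang), so the sketch assumes one of those compilers.

```cpp
#include <cstddef>

// Hypothetical helper, not taken from the code discussed in this thread.
// __restrict__ promises the compiler that, inside this function, the
// memory reached through a is never reached through b (and vice versa).
// With that promise it may keep values in registers or vectorize the
// loop without rechecking for overlap.
inline void add_assign(double* __restrict__ a,
                       const double* __restrict__ b,
                       std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        a[i] += b[i];   // element-wise a += b
}
```

Calling add_assign(a, b, n) with two arrays that overlap breaks the promise and is undefined behavior, so the qualifier only belongs where aliasing really cannot happen.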
On 5/8/06, Greg Link <link_at_[hidden]> wrote:
> Well, have you considered certainty of memory aliasing? In
> particular, gcc supports the __restrict__ keyword (e.g. double
> *__restrict__ c), indicating that the memory pointed to by c will
> never be accessed by anything /but/ c, allowing it to make
> load-store and register usage optimizations it couldn't otherwise.
> In particular, it's 100% certain in the manually indexed case that a
> will never ever refer to b. Then again, it can't be as sure in the
> looped version.
> Just a thought that may or may not pan out. All it takes to try is
> a quick addition of __restrict__, however, so it's not a tough test.
> - Greg Link
> Penn State University
> York College of Pennsylvania
> On May 8, 2006, at 2:23 PM, Brian Budge wrote:
> > Thanks for the ideas guys.
> > Compile options are like so:
> > g++ -O3 -msse -mfpmath=sse
> > I tried the metaprogramming technique (which is pretty nifty :) ), and
> > got interesting results.
> > Basically, it made my += operator run twice as SLOW, while making my +
> > operator run twice as FAST.
> > I have a feeling that this is all due to the different optimizations
> > that gcc is doing at multiple stages of compilation. For example, it
> > may be doing autovectorization of the simple loop case of +=, which it
> > can't figure out with the metaprogramming technique. I'm still
> > stumped as to why I'm roughly an order of magnitude slower with + than
> > with +=.
> > Any more insights?
> > Thanks again for the ideas so far!
> > Brian
> > On 5/8/06, John Maddock <john_at_[hidden]> wrote:
> >>> Any ideas how to increase the performance of the new code here? A
> >>> factor of 10 makes it seem like I am just missing something
> >>> important.
> >> I would suspect it's the loop that's at fault, although I'm very
> >> surprised it's a factor of 10. Your original code had the loop
> >> unrolled, so you might try a bit of template metaprogramming to
> >> achieve the same effect here. Otherwise you're going to have to do
> >> a bit of debugging and/or inspection of the assembly generated.
> >> BTW the measurements you made were in release mode right? If inline
> >> expansions are turned off (debug mode for example) the
> >> operators-based version may well pass through many more function
> >> calls. Of course these all disappear as long as your compiler does
> >> a reasonable job of inlining.
> >> HTH, John.
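
The template metaprogramming suggestion above can be sketched roughly as follows. This is only an illustration of compile-time loop unrolling for a fixed-size element-wise add. The name unrolled_add is hypothetical, and the code under discussion in this thread may be structured quite differently.

```cpp
#include <cstddef>

// Hypothetical compile-time unroller: unrolled_add<N>::apply(a, b)
// expands to N separate "a[i] += b[i]" statements once inlined.
template <std::size_t N>
struct unrolled_add
{
    static void apply(double* a, const double* b)
    {
        unrolled_add<N - 1>::apply(a, b);  // elements 0 .. N-2
        a[N - 1] += b[N - 1];              // element N-1
    }
};

// Base case: nothing left to add, which ends the recursion.
template <>
struct unrolled_add<0>
{
    static void apply(double*, const double*) {}
};
```

For a three-component vector, unrolled_add<3>::apply(a, b) takes the place of the runtime loop. Whether this beats the plain loop depends on the optimizer, as the mixed += and + timings reported in this thread suggest.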
> >> _______________________________________________
> >> Boost-users mailing list
> >> Boost-users_at_[hidden]
> >> http://lists.boost.org/mailman/listinfo.cgi/boost-users
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net