Subject: Re: [proto] Holding children by copy or reference
From: Eric Niebler (eric_at_[hidden])
Date: 2013-09-30 02:01:20
On 9/30/2013 1:54 PM, Mathias Gaunard wrote:
> A while ago, I recommended setting up domains so that Proto holds its
> children by value, except for terminals, which should be held either by
> reference or by value depending on whether they are lvalues. This avoids
> dangling-reference problems when storing expressions or using 'auto'.
> I also said there was no overhead to doing this in the case of Boost.SIMD.
> After doing more analysis with more complex code, it appears that
> there is indeed an overhead: it confuses the compiler's alias
> analysis, which then becomes unable to perform some
> optimizations that it would otherwise perform.
> For example, an expression like this:
> r = a*b + a*b;
> will no longer be optimized to
> tmp = a*b;
> r = tmp + tmp;
> If terminals are held by reference, the compiler can also emit extra
> loads, which it doesn't do if the terminal is held by value or if
> all children are held by reference.
> It is a bit surprising that this affects compiler optimizations like
> this, but it is reproducible on both Clang and GCC, with all the versions I
> have access to.
It's very surprising. I suppose it's because the compiler can't assume
equational reasoning holds for some user-defined type. That's too bad.
> Therefore, to avoid performance issues, I'm considering moving to always
> using references (with the default domain behaviour), and relying on
> BOOST_FORCEINLINE to make it work as expected.
Why is FORCEINLINE needed?
> Of course this has the caveat that if the force inline is disabled (or
> doesn't work), then you'll get segmentation faults.
I don't understand why that should make a difference. Can you clarify? A
million thanks for doing the analysis and reporting the results, by the way.
As an aside, in Proto v5, terminals and intermediate nodes are captured
as you describe by default, which means it has the same performance problems. I still think
this is the right default for C++11, and for most EDSLs. I'll have to be
explicit in the docs about the performance implications, and make it
easy for people to get the by-ref capture behavior when they're ok with
-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com
Proto list run by eric at boostpro.com