|
Boost : |
Subject: Re: [boost] [XInt] Some after thoughts about SIMD
From: Joel Falcou (joel.falcou_at_[hidden])
Date: 2011-03-11 15:11:16
On 11/03/11 21:04, Simonson, Lucanus J wrote:
> If you have the same variable in more than one place in an expression you can map that expression to different/fewer instructions than if they were all different variables in different places. There is no way to perform that mapping using expression templates because it cannot be detected in template metaprogramming which varaibles are the same, only that they have the same type. Therefor you cannot map expression to the optimal instructions using template metaprogramming.
>
> In general three register instructions that modify one register and perhaps use it as an input also will map to expression where the variable is on both sides of the assignment operator. Think fused multiply add.
>
> At first we thought we could do anything with expression templates. It is not true. We can only perform logic on types at compile time. You can't implement a compiler for runtime variables as a template metaprogram, only a compiler for types. I already gave the example of common sub expression elimination being unimplementable in expression templates. Rather than a contrived example it is actually the most general and easily understood example of the problem. It is a pretty important optimization in and of itself, but it is also widely understood, at least by readers of this list.
Well, I dunno which compiler you use, but usually, when you do stuff
properly, the ET are usually optimized by the compiler itself afterward
and it usually do this for you.
We made the test in nt2 with a shifted load based FIR using a very
simple TMP logic, and well ... it never used more register than what we
though tit should.
As for madd, well ... we do pattern matching on the AST structure,
reemit a madd node and let the thingy do its job afterward.
I fail to see where the fail is, but it is important to have the ET emit
rather trivial code and let the compiler polish the code generation
afterward.
Under GCC, the CSE stuff is done byt he compiler as if it was normal code.
We noticed loop unrolling done through matrix ET and other such effect.
Which compiler did you used ? How are your ET base code done ?
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk