Subject: Re: [boost] [XInt] Some after thoughts about SIMD
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2011-03-11 15:04:15
Joel Falcou wrote:
> On 11/03/11 20:29, Simonson, Lucanus J wrote:
>> I am basing my opinion on experience with the Larrabee vector
>> instruction set, not SSE.
>>
>> http://drdobbs.com/architecture-and-design/216402188?pgno=2
>>
>> "Vector Arithmetic, Logical, and Shift Instructions
>> The arithmetic, logical, and shift vector instructions include
>> everything you'd expect: add, subtract, add with carry, subtract
>> with borrow, multiply, round, clamp, max, min, absolute max,
>> logical-or, logical-and, logical-xor, logical-shift and
>> arithmetic-shift by a per-element variable number of bits, and
>> conversions among floats, doubles, and signed and unsigned int32s."
>>
>> I'll call out add with carry and subtract with borrow from that
>> list, which is not exhaustive, by the way. There are a ton of
>> instructions and some are impossible to map an expression to using
>> expression templates due to language limitations.
>>
>> I said vector instruction sets are a moving target at the outset.
>> If you are designing based on SSE you are solving yesterday's
>> problems.
> Sorry I don't have a Larrabee to do stuff with, really.
>
> More seriously, I fail to see a vector operation that is "limited by
> the language". I am very eager to see the one you speak about.
If the same variable appears in more than one place in an expression, you can map that expression to different, or fewer, instructions than if every operand were a distinct variable. Expression templates cannot perform that mapping, because template metaprogramming cannot detect which variables are the same, only that they have the same type. Therefore you cannot map an expression to the optimal instructions using template metaprogramming.
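To make the point concrete, here is a minimal sketch, assuming a plain type-based expression template. The names vec, mul_expr, same_type and operands_alias are hypothetical and come from no existing library. It shows that a * a and a * b produce the same node type, and that operand identity is only visible as a runtime address comparison.

```cpp
#include <type_traits>

// Hypothetical stand-in for a SIMD register wrapper.
struct vec { float lanes[4]; };

// Expression node: the type records only the operand types.
template <class L, class R>
struct mul_expr {
    const L& lhs;
    const R& rhs;
};

inline mul_expr<vec, vec> operator*(const vec& l, const vec& r) {
    return mul_expr<vec, vec>{l, r};
}

// Compile-time view: all a metaprogram can inspect is the node type.
template <class E1, class E2>
constexpr bool same_type(const E1&, const E2&) {
    return std::is_same<E1, E2>::value;
}

// Runtime view: whether both operands are the same object is an
// address comparison, so it cannot select a template specialization.
template <class L, class R>
bool operands_alias(const mul_expr<L, R>& e) {
    return static_cast<const void*>(&e.lhs) ==
           static_cast<const void*>(&e.rhs);
}
```

With vec a, b, the call same_type(a * a, a * b) is true, while operands_alias(a * a) and operands_alias(a * b) differ. The difference exists, but only as a runtime value, so under these assumptions any choice between a "square" instruction and a general multiply would have to be a runtime branch.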
In general, three-register instructions that modify one register, and perhaps also use it as an input, map to expressions where the same variable appears on both sides of the assignment operator. Think fused multiply-add.
At first we thought we could do anything with expression templates. That is not true. We can only perform logic on types at compile time. You can't implement a compiler for runtime variables as a template metaprogram, only a compiler for types. I already gave the example of common subexpression elimination being unimplementable in expression templates. Far from being a contrived example, it is the most general and easily understood illustration of the problem. It is an important optimization in its own right, and it is widely understood, at least by readers of this list.
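As a rough illustration of the common subexpression case, here is a small sketch that counts additions. It assumes a naive lazy evaluator, and every name in it (vec, leaf, add_expr, mul_expr, add, mul, eval, add_count) is made up for the example. Named functions are used instead of operators only to keep the overloads simple.

```cpp
// Hypothetical stand-in for a SIMD register wrapper.
struct vec { float lanes[4]; };

static int add_count = 0;  // counts per-lane additions actually performed

// Leaf node: refers to a runtime variable by pointer.
struct leaf {
    const vec* v;
    float at(int i) const { return v->lanes[i]; }
};

template <class L, class R>
struct add_expr {
    L l; R r;
    float at(int i) const { ++add_count; return l.at(i) + r.at(i); }
};

template <class L, class R>
struct mul_expr {
    L l; R r;
    float at(int i) const { return l.at(i) * r.at(i); }
};

template <class L, class R>
add_expr<L, R> add(L l, R r) { return add_expr<L, R>{l, r}; }

template <class L, class R>
mul_expr<L, R> mul(L l, R r) { return mul_expr<L, R>{l, r}; }

// Evaluate an expression tree lane by lane.
template <class E>
vec eval(const E& e) {
    vec out;
    for (int i = 0; i < 4; ++i) out.lanes[i] = e.at(i);
    return out;
}
```

Evaluating mul(add(leaf{&a}, leaf{&b}), add(leaf{&a}, leaf{&b})) performs the addition twice per lane, because both add nodes have the identical type add_expr<leaf, leaf> whether or not they refer to the same variables. Hoisting the sum by hand into a temporary vec halves the count. The template machinery in this sketch has no compile-time information that would let it do the hoisting itself.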
Regards,
Luke