Subject: Re: [boost] [gsoc] boost.simd news from the front.
From: David A. Greene (greened_at_[hidden])
Date: 2011-06-11 13:16:24
Joel falcou <joel.falcou_at_[hidden]> writes:
> On 11/06/11 11:17, David A. Greene wrote:
>> Mathias Gaunard<mathias.gaunard_at_[hidden]> writes:
>>> Making data parallelism simpler is the goal of NT2. And we do that by
>>> removing loops and pointers entirely.
>> First off, I want to apologize for sparking some emotions. That was not
>> my intent. I am deeply sorry for not expressing myself well.
> We all fall for blatant miscommunication there I guess ;)
>> NT2 sounds very interesting! Does it generate the loops given calls
>> into generic code?
> Basically yes: you express container-based, semantics-driven code using
> a MATLAB-like syntax (plus more in cases where MATLAB doesn't provide
> anything suitable), and the various evaluation points generate loop
> nests with properties derived from information carried by the
> container type and its settings (storage order, sharing data status, ...)
This is super-cool! Anything to help the programmer restructure code
(or generate the loops correctly in the first place) is a huge win.
> The evaluation is then done by forwarding the expression to a
> hierarchical layer of architecture-dependent meta-programs that, at
> each step, strip the expression of its important high-level semantic
> information and help generate the proper piece of code.
You use machine intrinsics here, yes? This is where I think the
compiler might often do better. If the compiler is good. :)
It's a little odd that "important" information would be stripped.
I know this is not a discussion of NT2 but for the curious, can
you explain this? Thanks!
> I assume the rest of the discussion concerns a program written
> with the correct algorithm in terms of complexity, right?
By correct algorithm, you mean an algorithm structured to expose data
parallelism? If so, yes, I think that's right.
>> - Programmer tries to run the compiler on it, examines code
>> - Code sometimes (maybe most of the time) executes poorly
>> - If not, done
>> - Programmer restructures loop nest to expose parallelism
>> - Try compiler directives first, if available (tell compiler which
>> loops to interchange, where to cache block, blocking factors,
>> which loops to collapse, etc.)
>> - Otherwise, hand-restructure (ouch!)
> If compilers allow such information to be carried, yes.
Right. Many don't, and in those cases boost.simd is a great alternative.
>> - Programmer tries compiler again on restructured loop nest
>> - Code may execute poorly
>> - If not, done
>> - Programmer adds directives to tell the compiler which loops
>> to vectorize, which to leave scalar, etc.
>> - Code may still execute poorly
>> - If not, done
> Again, provided such a compiler is available on said platform
>> - Programmer uses boost.simd to write vector code at a higher level
>> than provided compiler intrinsics
> Yes and using a proper range based interface instead of a mess of for loops.
>> Does that seem like a reasonable use case?
> Yes. What we failed to clarify is that for a large share of people,
> the compilers available on their systems fail to provide a way to do
> steps #2 and #3. And for these people, what they see is a world in
> which they are on their own dealing with this.
Oh absolutely. But I think that such people should be aware that code
generated by boost.simd may not be the best for their hardware
implementation IF they have access to a good compiler later. In those
cases, though, I suppose replacing, say, pack<float> with float
everywhere should get most of the original scalar code back. There may
yet be a little more cleanup to do but isn't that the case with _every_
HPC code? :) :) :)