Subject: Re: [boost] [gsoc] boost.simd news from the front.
From: David A. Greene (greened_at_[hidden])
Date: 2011-06-11 11:42:19
Mathias Gaunard <mathias.gaunard_at_[hidden]> writes:
> On 11/06/2011 02:08, David A. Greene wrote:
>
>> What's the difference between:
>>
>> ADDPD XMM0, XMM1
>>
>> and
>>
>> XMM0 = __builtin_ia32_addpd (XMM0, XMM1)
>>
>> I would contend nothing, from a programming effort perspective.
>
> Register allocation.
But that's not where the difficult work is.
>>> Currently we support the whole SSEx family, all the AMD-specific stuff and
>>> Altivec for PPC and Cell, and we have a protocol to extend that.
>>
>> How many different implementations of DGEMM do you have for x86? I have
>> seen libraries with 10-20.
>
> That's because they don't have generic programming, which would allow
> them to generate all variants with a single generic core and some
> meta-programming.
No. No, no, no. These implementations are vastly different. It's not
simply a matter of changing the vector length.
> We work with the LAPACK people, and some of them have realized that
> the things we do with metaprogramming could be very interesting to
> them, but we haven't had any research opportunity to start a project
> on this yet.
I'm not saying boost.simd is never useful. I'm saying the claims made
about it seem overblown.
>> - Write it using the operator overloads provided by boost.simd. Note
>> that the programmer will have to take into account various
>> combinations of matrix size and alignment, target microarchitecture
>> and ISA and will probably have to code many different versions.
>
> Shouldn't you just need the cache line size? This is something we
> provide as well.
Nope. It's a LOT more complicated than that.
> Ideally you shouldn't need anything else that cannot be made
> architecture-agnostic.
What's the right vector length? That alone depends heavily on the
microarchitecture. And as I noted above, this is one of the simpler
questions.
> And as I said, you should make the properties on size (and even
> alignment if you really care) a template parameter, so as to be able
> to dispatch it to relevant bits at compile-time...
Yes, I can see how that would be useful. It will cover a lot of cases.
But not everything. And that's ok, as long as the library documentation
spells that out.
> C++ metaprogramming *is* an autotuning framework.
To a degree. How do you do different loop restructurings using the
library?
>> Your rationale, as
>> I understand it, is to make exploiting data parallelism simpler.
>
> No it isn't.
> Its goal is to provide a SIMD abstraction layer. It's an
> infrastructure library to build other libraries. It is still fairly
> low-level.
Ok, that makes more sense.
>>>> Intel and PGI.
>>>
>>> Ok, what do people on machines supported by neither Intel nor PGI do?
>>> Cry blood?
>>
>> If boost.simd is targeted to users who have subpar compilers
>
> Other compilers than intel or PGI are subpar compilers? Maybe if you
> live in a very secluded world.
No, not every compiler is subpar. But many are.
>> But please don't go around telling people that compilers can't
>> vectorize and parallelize. That's simply not true.
>
> Run the trivial accumulate test?
Vectorized.
> The littlest of things can prevent them from vectorizing. Sure, if
> you add a few restrict there, a few pragmas elsewhere, some specific
> compiling options tied to floating point, you might be able to get the
> system to kick in.
Yep. And that's a LOT easier than hand-restructuring loops and writing
vector code manually.
> But my personal belief is that automatic parallelization of arbitrary
> code is an approach doomed to failure.
Then HPC has been failing for 30 years.
> Programming is about making things explicit using the right language
> for the task.
Programming is about programmer productivity.
>> Boost.simd could be useful to vendors providing vectorized versions of
>> their libraries.
>
> Not all fast libraries need to be provided by hardware vendors.
No, not all. In most other cases, though, the compiler should do it.
>> I have seen too many cases where programmers wrote an "obviously better"
>> vector implementation of a loop, only to have someone else rewrite it in
>> scalar so the compiler could properly vectorize it.
>
> Maybe if the compiler was really that good, it could still do the
> optimization when vectors are involved?
No, because information has been lost at that point.
-Dave
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk