
Subject: Re: [boost] [gsoc] boost.simd news from the front.
From: David A. Greene (greened_at_[hidden])
Date: 2011-06-11 11:42:19

Mathias Gaunard <mathias.gaunard_at_[hidden]> writes:

> On 11/06/2011 02:08, David A. Greene wrote:
>> What's the difference between:
>> and
>> XMM0 = __builtin_ia32_addpd (XMM0, XMM1)
>> I would contend nothing, from a programming effort perspective.
> Register allocation.

But that's not where the difficult work is.

>>> Currently we support the whole SSEx family, all AMD-specific stuff and
>>> Altivec for PPC and Cell, and we have a protocol to extend that.
>> How many different implementations of DGEMM do you have for x86? I have
>> seen libraries with 10-20.
> That's because they don't have generic programming, which would allow
> them to generate all variants with a single generic core and some
> meta-programming.

No. No, no, no. These implementations are vastly different. It's not
simply a matter of changing the vector length.

> We work with the LAPACK people, and some of them have realized that
> the things we do with metaprogramming could be very interesting to
> them, but we haven't had any research opportunity to start a project
> on this yet.

I'm not saying boost.simd is never useful. I'm saying the claims made
about it seem overblown.

>> - Write it using the operator overloads provided by boost.simd. Note
>> that the programmer will have to take into account various
>> combinations of matrix size and alignment, target microarchitecture
>> and ISA and will probably have to code many different versions.
> Shouldn't you just need the cache line size? This is something we
> provide as well.

Nope. It's a LOT more complicated than that.

> Ideally you shouldn't need anything else that cannot be made
> architecture-agnostic.

What's the right vector length? That alone depends heavily on the
microarchitecture. And as I noted above, this is one of the simpler
questions.

> And as I said, you should make the properties on size (and even
> alignment if you really care) a template parameter, so as to be able
> to dispatch it to relevant bits at compile-time...

Yes, I can see how that would be useful. It will cover a lot of cases.
But not everything. And that's ok, as long as the library documentation
spells that out.

> C++ metaprogramming *is* an autotuning framework.

To a degree. How do you do different loop restructurings using the
template mechanism?

>> Your rationale, as
>> I understand it, is to make exploiting data parallelism simpler.
> No it isn't.
> Its goal is to provide a SIMD abstraction layer. It's an
> infrastructure library to build other libraries. It is still fairly
> low-level.

Ok, that makes more sense.

>>>> Intel and PGI.
>>> Ok, what do people on machines supported by neither Intel nor PGI do?
>>> Cry blood?
>> If boost.simd is targeted to users who have subpar compilers
> Other compilers than intel or PGI are subpar compilers? Maybe if you
> live in a very secluded world.

No, not every compiler is subpar. But many are.

>> But please don't go around telling people that compilers can't
>> vectorize and parallelize. That's simply not true.
> Run the trivial accumulate test?


> The littlest of things can prevent them from vectorizing. Sure, if
> you add a few restrict there, a few pragmas elsewhere, some specific
> compiling options tied to floating point, you might be able to get the
> system to kick in.

Yep. And that's a LOT easier than hand-restructuring loops and writing
vector code manually.

> But my personal belief is that automatic parallelization of arbitrary
> code is an approach doomed to failure.

Then HPC has been failing for 30 years.

> Programming is about making things explicit using the right language
> for the task.

Programming is about programmer productivity.

>> Boost.simd could be useful to vendors providing vectorized versions of
>> their libraries.
> Not all fast libraries need to be provided by hardware vendors.

No, not all. In most other cases, though, the compiler should do it.

>> I have seen too many cases where programmers wrote an "obviously better"
>> vector implementation of a loop, only to have someone else rewrite it in
>> scalar so the compiler could properly vectorize it.
> Maybe if the compiler was really that good, it could still do the
> optimization when vectors are involved?

No, because information has been lost at that point.


Boost list run by bdawes at, gregod at, cpdaniel at, john at