Subject: Re: [boost] [OT?] SIMD and Auto-Vectorization (was Re: How to structurate libraries ?)
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2009-01-19 01:50:02

On Sun, Jan 18, 2009 at 9:50 PM, Joel Falcou <joel.falcou_at_[hidden]> wrote:
> Dean Michael Berris wrote:
>> Please don't misunderstand that I'm disagreeing with you here because
>> I do agree that there is a need to address the parallelism problem
>> when implementing considerably demanding solutions at a level where
>> you don't really have to worry about what the architecture your code
>> runs on is like. However given the reality of the
>> situation with a myriad of available platforms on which to compile/run
>> C++, the pressure from both sides of the equation (library writers and
>> tool developers on one side, hardware vendors on the other) to come up
>> with a solution is immense -- especially since the industry has got to
>> adapt to it sooner rather than later. ;-)
> We agree then.
>> ... I think there is a market for precisely this kind of thing/work
>> now -- helping domain experts be able to recognize and utilize the
>> inherent parallelism in their solutions and the tools they are using.
>> :-)
> Best is to have them benefit from parallelism without them really knowing
> about it.

I'm a little wary of hiding these important issues from the people
who understand the higher scheme of things. Which is why I personally
don't think leaving the (non-C++ programming) domain experts in the
dark about the inherent parallelism in their solution is a good idea.

I think this because the people who are going to be
solving the real-world problem should be aware that the computing
facilities they have are actually capable of parallel computing, and
that the way they write their solutions (in any programming
language) will have a direct impact on the performance and scalability
of those solutions. It doesn't really matter whether they're targeting
a GPU or a CPU with SIMD; what matters is that the way they
write their solution should work in parallel. Once they are aware of
the available parallelism, they should be able to adapt the way they
think, and the way they come up with solutions.

As far as hiding the parallelism from them goes, the compiler is the
perfect place to do that, especially if your aim is just to leverage
platform-specific parallelism features of the machine. Even these
domain experts, once they know about the compiler's capabilities, may be
able to write their code in such a way that the compiler will be happy
to auto-vectorize -- and that, I think, is where it counts most.
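
As a rough illustration of code written so that the compiler will be happy to
auto-vectorize, here is a minimal sketch. The helper name saxpy and its
signature are hypothetical, not something from this thread. Whether the loop
is actually vectorized depends on the compiler, the optimization level (for
example -O3 with GCC) and the target, so check the compiler's own report
rather than assuming it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: y[i] = a * x[i] + y[i] over contiguous floats.
// Unit stride, no branches in the body, and no dependency between
// iterations are the properties an auto-vectorizer typically looks for.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // precondition: equal lengths
    const std::size_t n = x.size();
    const float* xs = x.data();
    float* ys = y.data();
    for (std::size_t i = 0; i < n; ++i) {
        ys[i] = a * xs[i] + ys[i];
    }
}
```

Nothing in the function mentions SSE or Altivec; the same source can be
compiled for either, and any vectorization is left entirely to the compiler.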

>> Libraries OTOH are more components to me rather than tools. Maybe I'm
>> being too picky with terms, but if you meant libraries to be tools,
>> then I feel that simply doesn't work with how my brain is wired. It
>> doesn't make you wrong, but it just doesn't feel right to me. ;-)
> Beware that embedded DSLs are no more than DSLs in disguise
> inside a library, hence the confusion I keep making between tools and libraries.

I would tend to agree with Dave that libraries tend to be disguised as
DSELs (think Spirit) that perform a certain function -- thus the way
I think about them as components that work as part of a bigger whole.

>> I think I understand what you mean, but I don't think it's a failure
>> of the libraries that they're not known/used by the people doing the
>> programming. Much like how you can't blame the nailgun if a carpenter
>> didn't know about it and is therefore still using a traditional hammer
>> and nails.
> My point was: it is not that easy to say to people "use X".

Actually, it's easy to say it -- it's a matter of acceptance that's the
problem. Now, a library that forces users to change their
code just to be able to leverage something that the compiler should be
able to handle for them (like writing assembly code, for instance)
sounds to me like too much to ask. After all, the reason we have
higher-level programming languages is to hide from ourselves the
details of the assembly/machine language of the platform we're going
to run our programs on. ;-)

>> True, but libraries also require that users write code that actually
>> uses the library. If the users already had code that didn't use your
>> library, how is it an advantage if they can get the auto-vectorization
>> from a future version of a compiler anyway without having to butcher
>> their code to use your library? And what if they find a bug in the
>> code using the library or (god forbid) find a bug in the library?
> The same can be said for any library out there. What if tomorrow a new C++
> compiler extracts code from source and builds top-notch threads from it?
> Should we prevent people from using Boost.Thread from now on?

No, what I'm pointing at here is that libraries for very
low-level parallelism have to be maintained independently of the
code that actually uses them -- and are thus another layer in which
failures can be found and inefficiencies introduced. The point of
using Boost.Thread instead of a platform-specific threading library is
that you can rely on a coherent interface specifically for
threading and synchronization among threads.

If that new C++ compiler is able to do that parallelism for us
effectively without us having to use Boost.Thread, then I think
usage of Boost.Thread would slowly go down on its own. However I
think the problem that Boost.Thread is solving is compelling enough
for it to be a viable solution in the interim.

The point I'm trying to make is that if the target is just SIMD
at the processor level, I'd think a library just for that is too
specific to be generic. I might be missing the point
here, but the compiler can already do it now (and will only get
better in the future), and if I don't want to rely on the compiler
to do it for me, I can still write platform-specific code in C++
through compiler-vendor-provided libraries (if I need to be specific
about what I want to do with the compiler and the platform). Given
that, what would be the value of a very narrow/specific library like a
SIMD-specific thingamajig?

>> Actually, DSELs require that you write code in the domain language --
>> and here is where the problem lies
> Well, if parallelism is outsourced behind the scenes, it's not a problem.

But then you (the DSEL writer for that specific domain) would have to
deal with parallelism the old-fashioned way, without (yet) having a
DSEL to help you do it -- and that doesn't scale. That doesn't
help the domain expert, especially if he doesn't know that he can
actually come up with solutions that do leverage the parallelism
available in his platform.

>> If this were the case then maybe just having this DSEL may be good to
>> give to parallelism-savvy C++ programmers, but not necessarily still
>> the domain experts who will be doing the writing of the
>> domain-specific logic. Although you can argue that parallel
>> programming is a domain in itself, in which case you're still not
>> bridging the gap between those that know about parallel programming
>> and the other domain experts.
> Parallel programming is a domain in itself, though not a domain for users but for
> tool writers.
> A user domain is things like math, finance, physics, anything. We agree
>> Yes, not all platforms are Intel platforms, but I don't know if you've
>> noticed yet that Intel compilers even create code that will run on AMD
>> processors -- yes, even SSE[0..3] -- as per their product
>> documentation. If your target is CotS machines, I think Intel/GCC is
>> your best bet (at least in x86_64). I haven't dealt with other
>> platforms though aside from Intel/AMD, but it's not unreasonable to
>> think that since everybody's moving in the direction of leveraging and
>> exploiting parallelism in hardware, that the compiler vendors will
>> have to compete (and eventually get better) in this regard.
> Well, we can't leave Altivec and its offspring by the side of the road. The
> Cell processor uses it, and I
> consider the Cell quasi-COTS, as a PS3 costs something like only half
> a kidney.
> I don't target COTS or non-COTS; my goal is to cover the basics, and the SIMD
> basics involve old Motorola-enabled PPC and Intel machines.
> So the strict minimum is Altivec + SSE flavors. I hope that one day (AVX v2),
> both will converge though.

And precisely because of that, I think better compilers that
leverage these platform-specific features would be a more correct and
far-reaching solution than a library just for SIMD. If your goal was a
library/DSEL for expressing parallelism in general in C++, hiding the
details of threads and whatnot, of which a SIMD-specific extension
would only be a part, then I wouldn't feel like the goal is a little too

>> Why do I get the feeling that you're saying:
>> compiler writing != software engineering
>> ? :-P
> No, I mean that *I* feel more comfortable writing stuff on this side of the
> compiler than on the other ;)

Okay. :-)

>> Anyway, I think if you're looking to contribute to a compiler-building
>> community, GCC may be a little too big (I don't want to use the term
>> advanced, because I haven't bothered looking at the code of the GCC
>> project) but I know Clang over at LLVM are looking for help to finish
>> the C++ implementation of the compiler front-end. From what I'm
>> reading with Clang and LLVM, it should be feasible to write
>> language-agnostic optimization algorithms/implementations just dealing
>> with the LLVM IR.
> Well, as I work like half a mile from Albert Cohen's office, I'll certainly
> have a discussion about Clang someday ;)
> The C++->C++ tool is on my to-do list, but not for now as I think DSELs
> in C++ still have untapped resources.

I agree, but if you're going to tackle the concurrency problem through
a DSEL, I'd think a DSEL at a higher level than SIMD extensions would
be more fruitful. For example, I'd think something like:

vector<huge_number> numbers;
// populate numbers
async_result_stream results =
  apply(numbers, [... insert funky parallelisable lambda construction ...]);
while (results) {
  huge_number a;
  results >> a;
  cout << a << endl;
}

would be able to spawn thread pools, launch tasks, and provide an
interface for getting the results using futures underneath. The domain
experts who already know C++ will be able to express their funky
parallelisable lambda construction and just know that when they use
the facility it will do the necessary decomposition and parallelism as
much as it can at the library level. This I think is something that is
feasible (although a little hard) to achieve -- and to think that the
compiler will even be able to vectorize an inner loop in the
decomposed lambda construction, that detail isn't even necessarily
dealt with by the library.
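
A minimal sketch of what such a facility might look like underneath, with
several assumptions stated up front: it uses std::async and std::future from
C++11 (not available when this was written) in place of a real thread pool,
it launches one task per chunk, it returns the per-chunk futures in order
instead of a stream, and it assumes the function maps T to T. The name
parallel_apply and the chunks parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: split input into at most `chunks` contiguous
// ranges, run f over each range in its own asynchronous task, and
// return one future per range, in input order.
// Precondition: chunks > 0. The caller must keep `input` alive until
// every returned future has been drained, because tasks read it by
// reference.
template <typename T, typename F>
std::vector<std::future<std::vector<T>>>
parallel_apply(const std::vector<T>& input, F f, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t n = input.size();
    const std::size_t step = (n + chunks - 1) / chunks;  // ceil(n / chunks)
    std::vector<std::future<std::vector<T>>> pending;
    for (std::size_t begin = 0; begin < n; begin += step) {
        const std::size_t end = std::min(begin + step, n);
        pending.push_back(std::async(
            std::launch::async,
            [&input, f, begin, end]() -> std::vector<T> {
                std::vector<T> out;
                out.reserve(end - begin);
                // A simple inner loop like this one is also what the
                // compiler may vectorize on its own.
                for (std::size_t i = begin; i < end; ++i) {
                    out.push_back(f(input[i]));
                }
                return out;
            }));
    }
    return pending;
}
```

A caller would loop over the returned futures and call get() on each one to
read the results in order, much like draining the results stream above.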

>> In that case, I think that kind of library (DSEL) would be nice to
>> have -- especially to abstract the details of expressing parallelism
>> in general at the source code level.
> Except it is like ... friggin hard ?

Uh, yes. ;-)

> My stance is to have application-domain-specific libraries that hide all
> parallelism tasks by
> relying on small-scale parallel libraries like Thread or my
> proposal.

In which case I think a DSEL for parallelism would be much more
acceptable than even the simplest SIMD DSEL, mainly because I'd think
that if you really wanted to leverage SIMD by hand, you'd just use the
vector registers and the vector functions directly from your code
instead. At least that's the case for me as both a user and a library
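
For reference, using the vector functions directly might look like the sketch
below. It assumes an x86 target with SSE and uses the intrinsics from
<xmmintrin.h>; on other targets it falls back to a scalar loop. The helper
name add_floats is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for n floats.
void add_floats(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    std::size_t i = 0;
#if defined(__SSE__)
    // Four floats per iteration, using unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail, and the whole loop when SSE is not available.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The #if guard is the part a SIMD library or an auto-vectorizing compiler
would otherwise take care of: an Altivec version needs a separate code path.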

>> Nice! I would agree that something like *that* is appropriate as a
>> domain-specific language which leverages parallelism in the details of
>> the implementation.
>> I however think also that there are some details that would be nice to
>> tackle at the appropriate layer -- SIMD code construction is, well,
>> meant to be at the domain of the compiler (as far as SSE or similar
>> things go). OpenCL is meant to be an interface the hardware and
>> software vendors are moving towards supporting for a long time coming
>> (at least what I'm reading from the press releases) so I'm not too
>> worried about the combinatorial explosion of architectures and
>> parallelism runtimes.
> Except some people (like one of the posters in the previous thread) deal daily
> with code that needs this level of abstraction and no more.
> Hence the rationale behind "Boost.SIMD"

In which case I think a DSEL is clever, but a SIMD-only library would
be too small in scope for my taste. But that's just me I think. ;-)

>> I agree completely, but I'm afraid if the DSEL is for expressing
>> parallelism in C++, the goal of "giving domain experts tools that knew
>> about the parallelism" wouldn't be met readily. For C++ developers
>> that want to leverage parallelism in general sure, but I don't think
>> I'd be particularly compelled to use a SIMD-only DSEL.
> I think we can't just wake up and say "ok today I just solve the parallelism
> problem in C++ using a DSEL".
> I think that, on the contrary, a concrete, reasonable roadmap would be
> "tiling" the parallel problem world with small-scale software solutions that
> can interoperate and interact freely. Then, when the basic blocks of such
> tools have been done, we can start cementing them into higher-level ones.

Of course maybe not in a day. But it can feasibly be achieved with
some effort from brilliant library writers.

I like thinking at a higher level first and solving the problems in
the lower level with more specific focus but within a bigger context.
Only once you can recognize the patterns in the solution from a higher
level can you really try solving problems at a lower level with better
insight. Missing context is always hard to deal with.

They came up with the STL anyway, right? Whoever thought there'd be a
string class that makes sense in C++? ;-)

Dean Michael C. Berris
Software Engineer, Friendster, Inc.
