Subject: Re: [boost] How to structurate libraries ?
From: Joel Falcou (joel.falcou_at_[hidden])
Date: 2009-01-17 04:33:21
David A. Greene wrote:
> Ahem: http://www.cray.com
Two points:
1/ Not everyone has access to a Cray-like machine. Parallelization tools
for COTS machines are not to be neglected and, on this front, lots of
things still need to be done.
2/ vector supercomputer != SIMD-enabled processor, even if the former may
include the latter.
> Auto parallelization has been around since at least the '80's in
> production machines. I'm sure it was around even earlier than that.
What do you call auto-parallelization?
Are you telling me that, nowadays, I can take *any* source code written
in C or C++ or whatever, compile it with some compiler specifying --parallel
and automagically get a parallel version of the code? If so, you'll
have to send a memo to at least a dozen research teams (including mine)
all over the world so they can stop working on this problem and
move on to something else. Should I also assume that each time a new
architecture comes out, those compilers also know the best way to
generate code for it? I beg to differ, but automatic parallelization
is far from "done".
Then again, just look at the problem of writing SIMD code:
explain why we still get better performance on simple code when writing
the SIMD code by hand than when letting gcc auto-vectorize it?
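To make the comparison concrete, here is a minimal sketch of the two approaches for one simple kernel, out[i] = a[i] * b[i] + c[i]. The kernel and the function names madd_scalar and madd_simd are illustrative assumptions, not code from this thread or from the proposed library. The hand-written version uses SSE intrinsics when the compiler defines __SSE__ and falls back to the scalar loop otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics, x86 only
#endif

// Scalar loop: whether it gets vectorized is left to the compiler.
void madd_scalar(const std::vector<float>& a, const std::vector<float>& b,
                 const std::vector<float>& c, std::vector<float>& out)
{
    assert(a.size() == b.size() && a.size() == c.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = a[i] * b[i] + c[i];
}

// Hand-written version: four floats per iteration with SSE, then a scalar tail.
// (Hypothetical helper for illustration only.)
void madd_simd(const std::vector<float>& a, const std::vector<float>& b,
               const std::vector<float>& c, std::vector<float>& out)
{
    assert(a.size() == b.size() && a.size() == c.size() && a.size() == out.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        // Unaligned loads: std::vector gives no 16-byte alignment guarantee.
        const __m128 va = _mm_loadu_ps(&a[i]);
        const __m128 vb = _mm_loadu_ps(&b[i]);
        const __m128 vc = _mm_loadu_ps(&c[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(_mm_mul_ps(va, vb), vc));
    }
#endif
    // Remaining elements (or everything, when SSE is not available).
    for (; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

Both functions compute the same values; which one is faster depends on the compiler, its flags and the target, so it has to be measured rather than assumed.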
> Perhaps your SIMD library could invent convenient ways
> to express those idioms in a machine-independent way.
Well, considering the question was first about how to structure the
group of libraries I'm proposing,
I apologize for not having taken the time to describe all the features of
those libraries. Moreover, even with a simple example, the fact that the
library hides the differences between
SSE2, SSE3, SSSE3, SSE4, Altivec, SPU-VMX and the forthcoming AVX is a feature
on its own. Oh, and as specified in the previous mail, the DSL takes care
of optimizing fused operations, so things like FMA are detected and
replaced by the proper intrinsic when possible. Same with reductions like
min/max, operations like b*c-a, or SAD on SSEx.
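As an illustration of how a DSL can detect a fused operation at compile time, here is a minimal expression-template sketch. It is not the proposed library's API: the names pack and mul_expr are hypothetical, and std::fma on a small array stands in for whatever intrinsic a real implementation would pick per architecture.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical 4-wide value type standing in for a SIMD register wrapper.
struct pack {
    std::array<float, 4> v;
};

// Node produced by pack * pack. Nothing is computed when it is built.
struct mul_expr {
    const pack& lhs;
    const pack& rhs;

    // Plain multiplication, used when the product is not part of a*b + c.
    operator pack() const {
        pack r{};
        for (std::size_t i = 0; i < 4; ++i) r.v[i] = lhs.v[i] * rhs.v[i];
        return r;
    }
};

inline mul_expr operator*(const pack& a, const pack& b) { return {a, b}; }

// Ordinary element-wise addition of two packs.
inline pack operator+(const pack& a, const pack& b) {
    pack r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Overload resolution picks this for (a * b) + c, so the pattern is
// recognized at compile time and evaluated as a fused multiply-add.
inline pack operator+(const mul_expr& m, const pack& c) {
    pack r{};
    for (std::size_t i = 0; i < 4; ++i)
        r.v[i] = std::fma(m.lhs.v[i], m.rhs.v[i], c.v[i]);
    return r;
}
```

With these definitions, `pack r = a * b + c;` goes through the fused overload, while `pack p = a * b;` falls back to the plain product. The same trick extends to other patterns, such as b*c-a, by adding the matching overloads.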
> Your simple SIMD expression example isn't terribly compelling. Any competent
> compiler should be able to vectorize a scalar loop that implements it
Well, sorry then for having given a simple example.
> What would be compelling is a library to express things like the Cell's
> scratchpad. Libraries to do data staging would be interesting because more
> and more processers are going to add these kinds of local memory
I don't see what you have in mind. Do you mean something like Hierarchically
Tiled Arrays? Or some Cell-based development library? If the latter, I
don't think Boost is the best home for it. As for HTA, lots of
implementations already exist and, guess what, they just do the
parallelization themselves instead of letting the computer do it.
Anyway, we'll be able to discuss the library itself and its features
once a proper thread for it starts.
--
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35