Subject: Re: [boost] SIMD libraries -- was: [xint] Fourth release, requesting preliminary review again
From: joel falcou (joel.falcou_at_[hidden])
Date: 2010-06-12 02:24:59
Simonson, Lucanus J wrote:
> Obviously something like valarray can be implemented. It has been part of the standard for years, though most people don't implement it, use it or even know of its existence. There is a difference between that and what I was talking about, which is a DSEL that exposes all of the features of a SIMD instruction set.
I'm talking about a DSEL for vector registers, not a DSEL of arrays with
SIMD behavior.
> It is doable for one IS, and probably for one family, but to generalize and unify across SIMD instruction sets I'm more than skeptical.
Well, the thing is that you don't have to. You only generalize the things
that make sense, and if you look at it, you'll see you can easily
target a large part. Then you can encapsulate the extension-specific
behaviors into extension-specific functions.
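To make that split concrete, here is a minimal sketch. It is not the actual library code: `pack`, the `altivec_like` namespace and `permute` are hypothetical names, and the lanes are plain arrays rather than native registers so that it compiles anywhere.

```cpp
#include <array>
#include <cstddef>

// Hypothetical portable stand-in for a vector register: N lanes of type T.
// A real library would wrap a native type (__m128, vector float, ...) here.
template <class T, std::size_t N>
struct pack {
    std::array<T, N> lane;
};

// Common subset: element-wise arithmetic exists on every SIMD extension,
// so it is exposed once through ordinary operators.
template <class T, std::size_t N>
pack<T, N> operator+(pack<T, N> const& a, pack<T, N> const& b) {
    pack<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

template <class T, std::size_t N>
pack<T, N> operator*(pack<T, N> const& a, pack<T, N> const& b) {
    pack<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// Extension-specific behavior is kept out of the common interface and
// lives in a namespace named after the (hypothetical) extension.
namespace altivec_like {

// Arbitrary lane permutation: result[i] = a[idx[i] % N].
template <class T, std::size_t N>
pack<T, N> permute(pack<T, N> const& a, std::array<std::size_t, N> const& idx) {
    pack<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[idx[i] % N];
    return r;
}

} // namespace altivec_like
```

Code that sticks to the operators stays portable; code that calls into `altivec_like::` states explicitly that it depends on one extension.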
> Since most software targets a platform specifically it is almost a moot point, however. There is very little room between people who want to target a wide variety of hardware and care enough about performance to want to use SIMD and similar people who care so much about performance that they want to write all their code by hand in assembly for each target hardware. I expect GMP falls solidly in the second category, for example.
Why assembly again? C intrinsics are perfectly fine. We're not in 1492
anymore ...
> There is a big difference between using SSE instructions and getting the most out of them.
Well, you can look here:
for some performance figures. We are basically at 80-90% of the maximum
theoretical speed-up for single operations. The DSL nature of the whole
thing makes composition just as fast.
> In general you have to change your algorithm to fit the architecture. That means different algorithms for different architectures.
>
Except that with our approach, the only thing we give is basic building
blocks for SIMD code. When you compose them, we have a set of transforms
that remap combinations to the current extension's idioms.
> Does your simd wrapper use expression templates to convert a * b + c to fused multiply add?
Of course, that was the motivation for making it in the first place.
In the proto grammar of our register_<T,N> class, we have a
customization point which is specialized per extension and contains a
list of such fusing transformations.
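For readers who have not seen the trick, here is a stripped-down sketch of the idea. It does not use Boost.Proto and is not our register_<T,N> code: `vec`, `mul_expr` and `eval` are hypothetical names, and `std::fma` on plain arrays stands in for a native fused multiply-add instruction.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical value type standing in for a SIMD register of N floats.
template <std::size_t N>
struct vec {
    std::array<float, N> lane;
};

// Expression node for a * b. It only records its operands by reference;
// no multiplication happens until the node is consumed.
template <std::size_t N>
struct mul_expr {
    vec<N> const& a;
    vec<N> const& b;
};

template <std::size_t N>
mul_expr<N> operator*(vec<N> const& a, vec<N> const& b) {
    return {a, b};
}

// The pattern (a * b) + c is picked up by overload resolution and
// evaluated lane by lane with a single fused multiply-add.
template <std::size_t N>
vec<N> operator+(mul_expr<N> const& m, vec<N> const& c) {
    vec<N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r.lane[i] = std::fma(m.a.lane[i], m.b.lane[i], c.lane[i]);
    return r;
}

// A product that is not followed by an addition is evaluated as a plain
// multiplication.
template <std::size_t N>
vec<N> eval(mul_expr<N> const& m) {
    vec<N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = m.a.lane[i] * m.b.lane[i];
    return r;
}
```

With this in place, `a * b + c` on three `vec<4>` values goes through the fused overload, while `eval(a * b)` takes the ordinary path. A grammar-based version generalizes this from one hard-wired overload to a per-extension list of rewrite rules.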
> Does it expose the overflow/carry flags produced by arithmetic?
Currently no, because our use cases don't require them. If you have an
example for them, I'm all for having a look.
> What about predicates?
You mean handling stuff like vec_sel with a proper bit mask?
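To be clear about what I mean by mask-based selection, here is a small portable sketch. `lanes`, `less_than` and `select` are hypothetical names, not our API; `select` follows the AltiVec vec_sel convention, where each result bit comes from the second operand if the mask bit is set and from the first otherwise.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a register of N unsigned 32-bit lanes.
template <std::size_t N>
using lanes = std::array<std::uint32_t, N>;

// SIMD-style comparison: each lane becomes all ones if a < b, else all zeros.
template <std::size_t N>
lanes<N> less_than(lanes<N> const& a, lanes<N> const& b) {
    lanes<N> m{};
    for (std::size_t i = 0; i < N; ++i)
        m[i] = (a[i] < b[i]) ? ~std::uint32_t{0} : std::uint32_t{0};
    return m;
}

// Bitwise select: takes bits from b where the mask bit is 1 and from a
// where it is 0, so no branch is needed.
template <std::size_t N>
lanes<N> select(lanes<N> const& a, lanes<N> const& b, lanes<N> const& mask) {
    lanes<N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r[i] = (a[i] & ~mask[i]) | (b[i] & mask[i]);
    return r;
}
```

A branch-free lane-wise minimum of `x` and `y` is then `select(y, x, less_than(x, y))`.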
> I agree it is pretty obvious to start out from a "pack of N elements of type T", but that's just the start, not the whole enchilada.
>
We are fully aware. We didn't say "hey, let's make SIMD in C++"
overnight; it's six years of work by two people :o
> Please don't misinterpret my skepticism for criticism. I am actually pretty curious to learn more about what you did with SIMD and I'm sorry I missed boostcon this year. Discussions like this work much better in person. I've followed some of your posts in the past about NT2 and uBlas and the recent matrix library submission.
>
No problem. If you want more details we can:
1/ set up a thread out of this one ;)
2/ give you some extra material about our publications on the matter
We're also currently working hard to get it released in the not too
distant future so people can actually play with it ;)