
Subject: Re: [boost] Back to Boost.SIMD - Some performances ...
From: Joel Falcou (joel.falcou_at_[hidden])
Date: 2009-03-26 16:15:02


Stefan Seefeld wrote:
> Joel Falcou wrote:
>> Michael Fawcett wrote:
>>> Joel, how does the extension detection mechanism work? Is there a
>>> small runtime penalty for each function as it detects which path
>>> would be optimal, or can you define at compile-time what extensions
>>> are available (e.g. if you are compiling for a fixed hardware
>>> platform, like a console).
>> I have an #ifdef/#elif structure that detects which extensions have
>> been set up on the compiler, and I match this against a platform
>> detection to know where to jump and how to overload some functions or
>> class definitions.
>>
>> I tried the runtime way and it was fugly slow. So I'm back to a
>> compile-time detection as performance was critical.
>>
> Actually, I would expect this to be a mix of runtime and compile-time
> decision. While there are certainly things that can be decided at
> compile-time (architecture, available extensions, data types), there are
> also parameters that are only available at runtime, such as alignment,
> problem size, etc.
Well, again, the unit of granularity here is the data pack, a.k.a. a
generalized SIMD vector.
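
To give an idea of what that compile-time detection looks like, here is a
stripped-down sketch (not the actual Boost.SIMD code; the tag names and the
pack layout are purely illustrative):

#if defined(__SSE2__)
  #include <emmintrin.h>
  struct sse2_tag {};
  typedef sse2_tag simd_tag;        // extension selected at compile time
#elif defined(__SSE__)
  #include <xmmintrin.h>
  struct sse_tag {};
  typedef sse_tag simd_tag;
#else
  struct scalar_tag {};
  typedef scalar_tag simd_tag;      // no SIMD: plain scalar fallback
#endif

// The "data pack" (generalized SIMD vector): its storage and its
// operations are overloaded on the detected tag, so the selection
// costs nothing at runtime.
template <typename T, typename Tag = simd_tag>
struct pack                         // generic scalar fallback, 4 elements
{
    T data[4];
};

#if defined(__SSE2__)
template <>
struct pack<float, sse2_tag>        // SSE2: 4 packed floats in one register
{
    __m128 data;
};
#endif

Everything here is resolved by the preprocessor and by overload resolution,
so the extension choice itself adds no runtime cost.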

> In Sourcery VSIPL++ (http://www.codesourcery.com/vsiplplusplus/) we use
> a dispatch mechanism that allows programmers to chain extension
> 'evaluators' in a type-list, and this type-list is then walked over once
> by the compiler to eliminate unavailable matches; the resulting list is
> then walked at runtime to find a match based on the runtime parameters
> above. This is also where we parametrize the sizes for which we want to
> dispatch to a given backend (for example, when the performance gain
> outweighs the data I/O penalty, etc.).
>
> Obviously, all this wouldn't make sense on a very fine-grained level.
> But for typical BLAS-level or signal-processing operations (matrix
> multiply, FFT, etc.) this works like a charm.
>
> (We target all sorts of hardware, from clusters over Cell processors
> down to GPUs.)
That's what we do in NT2: mixed compile-time/runtime (CT/RT) selectors.
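
As a rough illustration of such a mixed selector (made-up names and
thresholds, nothing from the actual NT2 or VSIPL++ sources): the candidate
set is fixed at compile time by the preprocessor, and cheap runtime checks
(alignment, problem size) pick the backend per call.

#include <cstddef>
#include <cstdint>
#if defined(__SSE__)
#  include <xmmintrin.h>
#endif

// Scalar fallback kernel.
inline void add_scalar(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i) out[i] = a[i] + b[i];
}

#if defined(__SSE__)
// SSE kernel: 4 floats per iteration on 16-byte aligned data, scalar tail.
inline void add_simd(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_store_ps(out + i, _mm_add_ps(_mm_load_ps(a + i), _mm_load_ps(b + i)));
    for (; i != n; ++i) out[i] = a[i] + b[i];
}
#endif

inline bool aligned_16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 0xF) == 0;
}

// Mixed selector: the candidate set is decided at compile time (#ifdef),
// the final choice at runtime from alignment and problem size.
inline void add(const float* a, const float* b, float* out, std::size_t n)
{
#if defined(__SSE__)
    // 64 is an arbitrary threshold below which the SIMD setup cost
    // is assumed not to pay off.
    if (n >= 64 && aligned_16(a) && aligned_16(b) && aligned_16(out))
    {
        add_simd(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}

A real library keeps a whole chain of such candidates rather than a single
if, but the principle is the same.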

-- 
___________________________________________
Joel Falcou - Assistant Professor
PARALL Team - LRI - Universite Paris Sud XI
Tel : (+33)1 69 15 66 35
