Subject: Re: [boost] [OT?] SIMD and Auto-Vectorization (was Re: How to structurate libraries ?)
From: David A. Greene (greened_at_[hidden])
Date: 2009-01-20 21:02:50
On Tuesday 20 January 2009 18:51, David Abrahams wrote:
> >> Why are these SIMD operations different in that respect from, say, large
> >> matrix multiplications?
> >
> > A matrix multiplication is a higher-level construct. Still, most
> > compilers will pattern-match matrix multiplication to an optimal routine.
>
> Not hardly. No compiler is going to introduce register and cache-level
> blocking.
I'm not sure what you mean here. Compilers do blocking all the time.
Typically, the compiler will pattern-match a matrix multiply (and a number of
other idioms) to pre-tuned library code. That library code usually has several
possible paths chosen by the size of the matrices and so on; some of those
paths may be blocked at several different levels, or not, if that gives better
performance.
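For concreteness, here is a minimal C++ sketch of the kind of cache-level
blocking (loop tiling) such a tuned path applies. The tile size and the
row-major float layout are illustrative assumptions, not what any particular
library does:

    #include <cstddef>

    // c += a * b for square n-by-n row-major matrices; c must be zeroed
    // by the caller. B is a placeholder tile size that a real library
    // would tune per target so tiles of a, b, and c fit in cache.
    void matmul_blocked(const float* a, const float* b, float* c,
                        std::size_t n)
    {
        const std::size_t B = 64;
        for (std::size_t ii = 0; ii < n; ii += B)
            for (std::size_t kk = 0; kk < n; kk += B)
                for (std::size_t jj = 0; jj < n; jj += B)
                    // one B-by-B tile; the "&& ... < n" tests clamp the edges
                    for (std::size_t i = ii; i < ii + B && i < n; ++i)
                        for (std::size_t k = kk; k < kk + B && k < n; ++k) {
                            const float aik = a[i * n + k];
                            for (std::size_t j = jj; j < jj + B && j < n; ++j)
                                c[i * n + j] += aik * b[k * n + j];
                        }
    }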
There's a rich body of research going on into how to auto-tune library code
for just such purposes.
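To make the "several paths" point concrete, a dispatch roughly like the
following is what such libraries do; the function name and the crossover
size are hypothetical, and an auto-tuner would pick the cutoff and tile
sizes by measurement rather than by hand:

    #include <cstddef>
    #include <cstring>

    // Hypothetical size-based dispatch: small problems take the naive
    // triple loop, larger ones the blocked path sketched above. A real
    // auto-tuned library measures where the crossover is; 32 is a guess.
    void matmul(const float* a, const float* b, float* c, std::size_t n)
    {
        if (n < 32) {
            for (std::size_t i = 0; i < n; ++i)
                for (std::size_t j = 0; j < n; ++j) {
                    float s = 0.0f;
                    for (std::size_t k = 0; k < n; ++k)
                        s += a[i * n + k] * b[k * n + j];
                    c[i * n + j] = s;
                }
        } else {
            std::memset(c, 0, n * n * sizeof(float));  // blocked path accumulates
            matmul_blocked(a, b, c, n);
        }
    }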
> > SIMD code generation is extremely low-level.
> > Programmers want to think in a
> > higher level.
>
> Naturally. But are the algorithms implemented by SIMD instructions
> lower-level than std::for_each or std::accumulate? If not, maybe they
> deserve to be in a library.
A library of fast routines for doing various things is quite different from
creating a whole DSEL to do SIMD code generation. A library of fast matrix
multiply, etc. would indeed be useful.
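As a sketch of what one such routine might look like, here is a plain SIMD
reduction in the spirit of std::accumulate; the function name is made up,
and SSE is just one target a real library would have a variant for:

    #include <xmmintrin.h>   // SSE intrinsics (x86 only)
    #include <cstddef>

    // Sum n floats, four lanes at a time, with a scalar tail loop.
    float simd_sum(const float* p, std::size_t n)
    {
        __m128 acc = _mm_setzero_ps();
        std::size_t i = 0;
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
        for (; i < n; ++i)   // remainder that doesn't fill a vector
            s += p[i];
        return s;
    }

A caller just writes simd_sum(data, n); no expression-template machinery is
involved.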
How much does Boost want to concern itself with providing libraries tuned with
asm routines for various architectures?
It strikes me that writing these routines using gcc intrinsics wouldn't result
in optimal code on all architectures, and it seems that a DSEL built to do the
same would share those deficiencies.
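For reference, this is the sort of gcc-specific code in question: the generic
vector_size extension compiles to whatever SIMD the target offers (or to
scalar code), which is portable across gcc targets but leaves the instruction
selection entirely to the compiler. The function is an illustrative example,
not from any library:

    // gcc's generic vector extension: four packed floats. gcc lowers the
    // arithmetic below to SSE, AltiVec, or scalar code depending on the
    // target; portable, but the codegen may not be optimal anywhere.
    typedef float v4sf __attribute__((vector_size(16)));

    v4sf scale_add(v4sf x, v4sf y, float a)
    {
        v4sf av = {a, a, a, a};  // broadcast the scalar by hand
        return x * av + y;       // element-wise multiply-add
    }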
When you're talking "optimal," you're setting a pretty dang high bar.
-Dave