Subject: Re: [boost] [OT?] SIMD and Auto-Vectorization (was Re: How to structurate libraries ?)
From: David A. Greene (greened_at_[hidden])
Date: 2009-01-21 16:16:31
On Wednesday 21 January 2009 01:30, Joel Falcou wrote:
> David A. Greene wrote:
> > A library of fast routines for doing various things is quite different
> > from creating a whole DSEL to do SIMD code generation.
> How a DSEL can be different from a library still puzzles me, as the basic
> definition of a DSEL is a DSL embedded into a host language as a library.
Implementing a DSEL to do code generation is a LOT more work than
simply coding a fast library in asm. If you want to generate SIMD code
for lots of libraries then a DSEL might be worth it, but I'm talking about
specialized applications here (matrix multiply, etc.).
> > A library of fast matrix multiply, etc. would indeed be useful.
> You mean useful like being told weeks in advance that it's useless
> because uBlas already does it, as was said earlier? And if, as you said,
> compilers already do what is needed, then I call this useless too,
> because we'll just wait until all compilers do the same ...
I'm talking about specific routines tuned in a way that a general-purpose
compiler would not be able to replicate. It's a very small set of codes.
> > It strikes me that writing these routines using gcc intrinsics wouldn't
> > result in optimal code on all architectures. Similarly, it seems that a
> > DSEL to do the same would have similar deficiencies.
> Except that *maybe* the DSEL takes care of using the correct set of
> intrinsics depending on the platform, using, I don't know, architecture
> detection at compile time? And, IIRC, the gcc intrinsics are just C-like
> functions over the SIMD assembly instructions ... so I don't see how it can't ...
Then your DSEL is actually a full-blown compiler code generator. Generating
"optimal" code is a lot more than just picking instructions. You have to
allocate registers, schedule, etc. and that changes not just based on ISA but
on the implementation of that ISA provided by a particular processor.
Writing a DSEL containing all of this knowledge is much more work than just
coding the library in asm if the set of libraries is small.
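
To make the distinction concrete, here is a minimal sketch of compile-time
instruction selection, assuming a compiler that defines the `__SSE__` macro
when SSE is enabled (GCC and Clang do). The function name `add_arrays` is
hypothetical and is not taken from any library discussed in this thread.

```cpp
#include <cstddef>

#if defined(__SSE__)
  #include <xmmintrin.h>   // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#endif

// Hypothetical helper: element-wise sum of two float arrays of length n.
inline void add_arrays(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE__)
    // SSE path, chosen at compile time: four floats per iteration,
    // using unaligned loads and stores so any pointer is acceptable.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar path: handles the tail, or the whole array when SSE is unavailable.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Note what the sketch does not do: it only picks which instructions to ask for.
Register allocation, scheduling, and any tuning for a particular processor's
implementation of the ISA are still left entirely to the compiler.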