Subject: Re: [boost] [OT?] SIMD and Auto-Vectorization (was Re: How to structurate libraries ?)
From: David A. Greene (greened_at_[hidden])
Date: 2009-01-21 16:16:31


On Wednesday 21 January 2009 01:30, Joel Falcou wrote:
> David A. Greene wrote:
> > A library of fast routines for doing various things is quite different
> > from creating a whole DSEL to do SIMD code generation.
>
> How a DSEL can be different from a library still puzzles me, as the basic
> definition of a DSEL is a DSL embedded into a host language as a library.

Implementing a DSEL to do code generation is a LOT more work than
simply coding a fast library in asm. If you want to generate SIMD code
for lots of libraries then a DSEL might be worth it, but I'm talking about
specialized applications here (matrix multiply, etc.).

> > A library of fast matrix multiply, etc. would indeed be useful.
>
> You mean useful in the same way I was told weeks ago that it's useless
> because uBlas already does it? And if, as you said, compilers already do
> what's needed, then I call this useless too, because we can just wait
> until all compilers do the same ...

I'm talking about specific routines tuned in a way that a general-purpose
compiler would not be able to replicate. It's a very small set of codes.

> > It strikes me that writing these routines using gcc intrinsics wouldn't
> > result in optimal code on all architectures. Similarly, it seems that a
> > DSEL to do the same would have similar deficiencies.
>
> Except that *maybe* the DSEL takes care of using the correct set of
> intrinsics depending on the platform, using, I don't know, architecture
> detection at compile-time? And, IIRC, the gcc intrinsics are just C-like
> functions over the SIMD assembly instructions ... so I don't see how it can't ...

Then your DSEL is actually a full-blown compiler code generator. Generating
"optimal" code is a lot more than just picking instructions. You have to
allocate registers, schedule, etc. and that changes not just based on ISA but
on the implementation of that ISA provided by a particular processor.
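
As a rough illustration of the distinction (a minimal sketch, not code from
any library discussed in this thread; the name add_floats is made up for the
example, and it assumes a compiler that defines __SSE__ when SSE is enabled,
as gcc and clang do), compile-time architecture detection amounts to
something like this:

```cpp
#include <cstddef>

#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i in [0, n).
inline void add_floats(const float* a, const float* b, float* out,
                       std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    // SSE path: process four floats per iteration with unaligned loads/stores.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar path: handles the tail, or everything when SSE is unavailable.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The preprocessor test only decides which intrinsics appear in the source.
Register allocation and instruction scheduling for the loop are still done by
the compiler's backend, which is the part that has to know about the
particular processor.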

Writing a DSEL containing all of this knowledge is much more work than just
coding the library in asm if the set of libraries is small.

                                         -Dave


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk