Subject: Re: [boost] [OT?] SIMD and Auto-Vectorization (was Re: How to structurate libraries ?)
From: David A. Greene (greened_at_[hidden])
Date: 2009-01-20 17:03:42
On Monday 19 January 2009 19:01, Patrick Mihelich wrote:
> IMO, waiting for compiler technology is neither pragmatic in the short-term
> nor (as I argued in the other thread) conceptually correct. If you look at
> expression-template based linear algebra libraries like uBlas and Eigen2,
> these are basically code generation libraries (if compilers were capable of
> complicated loop fusion optimizations, we might not need such libraries at
Good compilers do a good amount of loop fusion. They can't get all cases,
certainly, but I would encourage you to explore what's already being done.
gcc is not a good example.
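To make the point concrete, here is a sketch (my own illustration, not from
any particular compiler's output) of the transformation a fusing optimizer
performs on two adjacent loops over the same range:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two adjacent loops over the same range...
void unfused(std::vector<float>& a, std::vector<float>& b,
             const std::vector<float>& c) {
    for (std::size_t i = 0; i < c.size(); ++i) a[i] = c[i] * 2.0f;
    for (std::size_t i = 0; i < c.size(); ++i) b[i] = c[i] + 1.0f;
}

// ...can be fused into one, so c[i] is loaded once per iteration
// instead of once per pass, improving cache locality.
void fused(std::vector<float>& a, std::vector<float>& b,
           const std::vector<float>& c) {
    for (std::size_t i = 0; i < c.size(); ++i) {
        a[i] = c[i] * 2.0f;
        b[i] = c[i] + 1.0f;
    }
}
```

Whether a given compiler actually performs this depends on its cost model
and its ability to prove the arrays don't overlap.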
> all). Given an expression involving vectors, it's fairly mechanical to
> transform it directly into optimal assembly. Whereas at the level of
> optimizing IR code, reasoning about pointers and loops is rather
> complicated. Are the pointers 16-byte aligned? Does it make sense to
The alignment issue is becoming less of a problem on newer architectures.
Barcelona, for example, doesn't care. It's entirely appropriate to fix these
kinds of problems in the hardware.
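On hardware that still penalizes unaligned access, alignment can also be
guaranteed at the source level rather than reasoned about in the optimizer; a
minimal sketch using standard C++11 alignas (the helper function name is my
own):

```cpp
#include <cassert>
#include <cstdint>

// alignas forces 16-byte alignment, so an aligned SSE-style load
// (movaps rather than movups) would be legal on hardware that cares.
alignas(16) float buf[8] = {0};

// Check alignment at runtime (illustrative helper).
bool is_16_byte_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```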
> partially unroll the loop to exploit data parallelism? What transformations
> can (and should) we make when traversing a matrix in a double loop? There
Good optimizing compilers do lots of transformations. Interchange,
unswitching, unroll-and-jam, collapse, coalesce, etc. etc. etc. No, they
are not perfect. But it strikes me the answer is not to do SIMD codegen by
hand. What would be more interesting is a general way to convey the
information the compiler needs. Typically this is done with vendor-specific
pragmas, but there are all kinds of tricks one could imagine that might be able
to help the compiler and that are expressible directly in C++.
Actually, it would be quite interesting to see what we could convey to the
compiler through judicious use of "pointless" C++ code.
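As one example of conveying such information, the OpenMP simd construct is a
reasonably portable spelling of "these iterations are independent, vectorize
this" (a sketch; compilers without OpenMP support simply ignore the pragma):

```cpp
// Tell the compiler the iterations carry no dependences, so it is
// free to generate SIMD code without proving independence itself.
void scale(float* out, const float* in, int n, float k) {
#pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * k;
}
```

The result is the same either way; the pragma only licenses the
transformation, it doesn't change the semantics.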
> The fact of the matter is that compilers do not generate optimal
> SIMD-accelerated code except in the simplest of cases, and so we end up
> using SIMD intrinsics by hand. Frankly I don't expect this to change
Define "optimal." A compiler will not generate "optimal" code in most cases
because the compiler is general purpose. Some transformations that benefit a
specific piece of code can be disastrous on another. But usually these
decisions don't happen in code generation. They happen in the optimizer
where loop transformations are scheduled. If we're going to help the
compiler, this is where we need to do it. Some compilers have ways to force
specific transformations to be done. IMHO more compilers need to provide
that kind of control.
A SIMD library is too narrowly focused. In a decade another technology will
come along. Will we need another DSEL or library for that one? Why not just
tell the compiler what it wants to know?
In any event, no one is stopping anyone from creating a SIMD DSEL. I believe
it's the wrong approach but then my career doesn't depend on optimal graphics
code. In certain specialized cases it may well be worth it.
> dramatically anytime soon; I'm no compiler expert, but my impression is
> that some complicated algebraic optimizations (for which C++ is not very
> suited) are necessary.
Algebraic manipulations are no problem. Pointers and aliasing are
problematic. Generous use of "restrict" can make a huge difference. One of
the major problems I see day in and day out is developers "helping" the
compiler by pre-linearizing addresses, etc. As a wise man once said, "Avoid
premature optimization."
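A quick sketch of the restrict point above (C++ has no standard restrict, so
this uses the GCC/Clang __restrict__ spelling): promising the compiler the
pointers don't alias lets it vectorize without emitting runtime overlap
checks.

```cpp
// __restrict__ promises y and x never alias, so the compiler can
// reorder and vectorize the loop freely.
void axpy(float* __restrict__ y, const float* __restrict__ x,
          float a, int n) {
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```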
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk