From: Stephen Nuchia (snuchia_at_[hidden])
Date: 2008-06-30 10:27:41
Loop unrolling, per se, is not all that hard. But in combination with
software pipelining and/or manual vectorization it can be a challenge to
get it right. A template library could conceivably help. I generally
use preprocessor macros in such situations but it quickly gets very ugly
and very hard to maintain.
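To make the idea concrete, here is a minimal sketch of what a template-based unrolling helper might look like. It is not any existing or proposed Boost interface: the names unroll, unroll_impl and dot_unrolled4 are made up for illustration, and the code assumes a C++17 compiler.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper: expands to f(0), f(1), ..., f(N-1) at compile time,
// so no run-time loop counter is involved.
template <std::size_t... I, typename F>
void unroll_impl(std::index_sequence<I...>, F&& f) {
    (f(I), ...);  // C++17 fold expression over the comma operator
}

template <std::size_t N, typename F>
void unroll(F&& f) {
    unroll_impl(std::make_index_sequence<N>{}, std::forward<F>(f));
}

// Illustrative use: dot product with the main loop unrolled by 4.
// Four independent accumulators break the dependency chain on the sum.
float dot_unrolled4(const float* a, const float* b, std::size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        unroll<4>([&](std::size_t k) { acc[k] += a[i + k] * b[i + k]; });
    }
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; ++i) {  // scalar remainder loop for the last n % 4 elements
        sum += a[i] * b[i];
    }
    return sum;
}
```

Note that the separate accumulators change the order of the floating-point additions, so the result can differ slightly from a straightforward loop. That reordering is one reason a compiler will usually not make this transformation on its own without being told that relaxed floating-point semantics are acceptable.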
However, the performance that can be gained by using such a library is
largely a temporary thing. Compilers keep getting better and, in most
cases, simply saying what you mean will allow a good compiler to make
the necessary optimizations. If your compiler isn't that smart today
chances are it will be tomorrow. So I wouldn't invest a whole lot of
work in writing such a library nor in converting code to use it.
Of course, even temporary wins have some value. If work does go ahead
on this I will watch with interest and contribute when I can. On real
code with real data I've achieved 60-70% speedups by manually unrolling,
pipelining, and vectorizing critical loops. The biggest problems with
this approach, the factors that limit how much it gets done, are:
1) It takes a rocket scientist to make the code transformations safely
and it takes a lot of low-level insight to know where the
transformations are likely to help.
2) The performance gains are very sensitive to the microarchitecture of
the execution platform so it is hard to justify for commercial software
(that runs on a variety of platforms without recompilation).
3) The transformed code becomes essentially maintenance-proof.
4) To evaluate the effectiveness of the proposed transformation you
really want to test a fairly large number of alternative
transformations. Each one is very tedious to do: degree of unrolling
vs. alternative interleavings of pipeline stages vs. data
representation alternatives vs. ... You can't test them all.
A template library that successfully abstracted some of the mechanics of
these transformations would, at a minimum, make it feasible to evaluate
more alternatives on more microarchitectures and reduce the maintenance
penalty of implementing the transformations. If it also made it
feasible to deploy multiple runtime-selected implementations of a
function from one source text that would be very interesting.
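As a rough sketch of that last point, the degree of unrolling can be made a template parameter, so that several variants are generated from one source text and one of them is picked at run time. Everything here is hypothetical: sum_unrolled, sum_fn and select_sum are illustrative names, and how preferred_unroll would actually be chosen (CPU detection, a start-up benchmark, a configuration setting) is left open.

```cpp
#include <cstddef>

// One source text, many variants: U is the degree of unrolling.
template <std::size_t U>
float sum_unrolled(const float* x, std::size_t n) {
    float acc[U] = {};  // U independent partial sums, zero-initialized
    std::size_t i = 0;
    for (; i + U <= n; i += U) {
        // The trip count U is a compile-time constant, so an optimizing
        // compiler can normally unroll this inner loop completely.
        for (std::size_t k = 0; k < U; ++k) {
            acc[k] += x[i + k];
        }
    }
    float total = 0.0f;
    for (std::size_t k = 0; k < U; ++k) {
        total += acc[k];
    }
    for (; i < n; ++i) {  // remainder elements
        total += x[i];
    }
    return total;
}

using sum_fn = float (*)(const float*, std::size_t);

// Hypothetical run-time selector. The caller supplies the unroll factor
// it believes is best for the current machine.
sum_fn select_sum(unsigned preferred_unroll) {
    switch (preferred_unroll) {
        case 8:  return &sum_unrolled<8>;
        case 4:  return &sum_unrolled<4>;
        case 2:  return &sum_unrolled<2>;
        default: return &sum_unrolled<1>;
    }
}
```

Trying another unroll factor is then a one-line change (add an instantiation to the switch) rather than a hand rewrite of the loop, which is what would make testing more alternatives on more microarchitectures practical.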
From: Vladimir Prus [mailto:vladimir_at_[hidden]]
Sent: Sunday, June 29, 2008 1:15 AM
Subject: Re: [boost] Anyone interested in a generic loop (and
loop unrolling) library ?
Hui Li wrote:
> Loops, in particular loop-unrolling, can be made generic, easy to
> write, and accessible to everybody. With the help of Boost.Lambda (and a few
> boost libraries), we can easily construct arbitrary loops that are
> unrolled at compile-time.
Can you clarify exactly what you're trying to achieve? I don't think
loops, in general, are hard-to-write and not accessible to everybody.
Speaking about loop unrolling -- the whole point of that is performance
-- did you measure the performance of the code using your approach
against traditional code with suitable optimizations?
- Volodya