|
Boost : |
From: Chris Fairles (chris.fairles_at_[hidden])
Date: 2008-07-01 16:19:36
On Mon, Jun 30, 2008 at 10:27 AM, Stephen Nuchia <snuchia_at_[hidden]> wrote:
> Loop unrolling, per se, is not all that hard. But in combination with
> software pipelining and/or manual vectorization it can be a challenge to
> get it right. A template library could conceivable help. I generally
> use preprocessor macros in such situations but it quickly gets very ugly
> and very hard to maintain.
>
> However, the performance that can be gained by using such a library is
> largely a temporary thing. Compilers keep getting better and, in most
> cases, simply saying what you mean will allow a good compiler to make
> the necessary optimizations. If your compiler isn't that smart today
> chances are it will be tomorrow. So I wouldn't invest a whole lot of
> work in writing such a library nor in converting code to use it.
>
> Of course, even temporary wins have some value. If work does go ahead
> on this I will watch with interest and contribute when I can. On real
> code with real data I've achieved 60-70% speedups by manually unrolling,
> pipelineing, and vectorizing critical loops. The biggest problems with
> this approach, the factors that limit how much it gets done, are:
>
> 1) It takes a rocket scientist to make the code transformations safely
> and it takes a lot of low-level insight to know where the
> transformations are likely to help.
> 2) The performance gains are very sensitive to the microarchitecture of
> the execution platform so it is hard to justify for commercial software
> (that runs on a variety of platforms without recompilation)
> 3) The transformed code becomes essentially maintenance-proof.
> 4) To evaluate the effectiveness of the proposed transformation you
> really want to test a fairly large number of alternative
> transformations. Each one is very tedious to do: degree of unrolling
> .vs. alternative interleavings of pipeline stages .vs. data
> representation alternatives .vs. ... You can't test them all.
>
> A template library that successfully abstracted some of the mechanics of
> these transformations would, at a minimum, make it feasible to evaluate
> more alternatives on more microarchitectures and reduce the maintenance
> penalty of implementing the transformations. If it also made it
> feasible to deploy multiple runtime-selected implementations of a
> function from one source text that would be very interesting.
>
> -swn
>
>
> -----Original Message-----
> From: Vladimir Prus [mailto:vladimir_at_[hidden]]
> Sent: Sunday, June 29, 2008 1:15 AM
> To: boost_at_[hidden]
> Subject: Re: [boost] Anyone interested in a generic loop (and
> loopunrolling) library ?
>
> Hui Li wrote:
>
>> Loops, in particular loop-unrolling, can be made generic, easy to
> write, and
>> accessible to everybody. With the help of Boost.Lambda (and a few
> other
>> boost libraries), we can easily construct arbitrary loops that are
> unrolled
>> at compile-time.
>
> Can you clarify exactly what you're trying to achieve? I don't think
> that
> loops, in general, and hard-to-write and not accessible to everybody.
> And
> speaking about loop unrolling -- the whole point of that is performance
> --
> did you measure performance of the code using your approach and
> traditional
> code with suitable optimizations?
>
> - Volodya
>
>
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>
I would certainly be interested (in helping development as well). For
what its worth, I c++0x-ified your code (more for demonstration, but
still works well ... except range linking).
template < typename ...Ranges>
struct link_range;
template < typename LeftRange, typename... Ranges>
struct link_range<LeftRange, Ranges...>
{
static const int first = LeftRange::first;
typedef typename boost::mpl::if_ <
typename boost::is_same <
typename LeftRange::next_range, empty_range >::type,
typename link_range<Ranges...>::next_range,
link_range < typename LeftRange::next_range, Ranges... >
>::type next_range;
};
template < typename LeftRange, typename RightRange >
struct link_range<LeftRange, RightRange>
{
static const int first = LeftRange::first;
typedef typename boost::mpl::if_ <
typename boost::is_same <
typename LeftRange::next_range, empty_range >::type,
RightRange,
link_range < typename LeftRange::next_range, RightRange >
>::type next_range;
};
// implementation of the unrolled loop over a range
template < typename LambdaExpr, typename Range >
struct loop_Impl
{
template < typename ...V1 >
void operator()(LambdaExpr& expr, V1&... v1)
{
expr(v1[Range::first]...);
loop_Impl < LambdaExpr, typename Range::next_range >()(expr,v1...);
}
};
// looping over an empty range does nothing
template < typename LambdaExpr >
struct loop_Impl < LambdaExpr, empty_range >
{
template < typename ...V1 >
void operator()(LambdaExpr&, V1...){}
};
// helper functions
template < typename Range, typename LambdaExpr, typename ...V1 >
void loop_over(LambdaExpr& expr, V1&... v1)
{
loop_Impl<LambdaExpr,Range>()(expr,v1...);
}
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk