From: Chris Fairles (chris.fairles_at_[hidden])
Date: 2008-07-01 16:19:36

On Mon, Jun 30, 2008 at 10:27 AM, Stephen Nuchia <snuchia_at_[hidden]> wrote:
> Loop unrolling, per se, is not all that hard. But in combination with
> software pipelining and/or manual vectorization it can be a challenge to
> get it right. A template library could conceivably help. I generally
> use preprocessor macros in such situations but it quickly gets very ugly
> and very hard to maintain.
> However, the performance that can be gained by using such a library is
> largely a temporary thing. Compilers keep getting better and, in most
> cases, simply saying what you mean will allow a good compiler to make
> the necessary optimizations. If your compiler isn't that smart today
> chances are it will be tomorrow. So I wouldn't invest a whole lot of
> work in writing such a library nor in converting code to use it.
> Of course, even temporary wins have some value. If work does go ahead
> on this I will watch with interest and contribute when I can. On real
> code with real data I've achieved 60-70% speedups by manually unrolling,
> pipelining, and vectorizing critical loops. The biggest problems with
> this approach, the factors that limit how much it gets done, are:
> 1) It takes a rocket scientist to make the code transformations safely
> and it takes a lot of low-level insight to know where the
> transformations are likely to help.
> 2) The performance gains are very sensitive to the microarchitecture of
> the execution platform so it is hard to justify for commercial software
> (that runs on a variety of platforms without recompilation).
> 3) The transformed code becomes essentially maintenance-proof.
> 4) To evaluate the effectiveness of the proposed transformation you
> really want to test a fairly large number of alternative
> transformations. Each one is very tedious to do: degree of unrolling
> vs. alternative interleavings of pipeline stages vs. data
> representation alternatives vs. ... You can't test them all.
> A template library that successfully abstracted some of the mechanics of
> these transformations would, at a minimum, make it feasible to evaluate
> more alternatives on more microarchitectures and reduce the maintenance
> penalty of implementing the transformations. If it also made it
> feasible to deploy multiple runtime-selected implementations of a
> function from one source text that would be very interesting.
> -swn
> -----Original Message-----
> From: Vladimir Prus [mailto:vladimir_at_[hidden]]
> Sent: Sunday, June 29, 2008 1:15 AM
> To: boost_at_[hidden]
> Subject: Re: [boost] Anyone interested in a generic loop (and loop unrolling) library ?
> Hui Li wrote:
>> Loops, in particular loop-unrolling, can be made generic, easy to write, and
>> accessible to everybody. With the help of Boost.Lambda (and a few other
>> boost libraries), we can easily construct arbitrary loops that are unrolled
>> at compile-time.
> Can you clarify exactly what you're trying to achieve? I don't think that
> loops, in general, are hard to write and not accessible to everybody. And
> speaking about loop unrolling -- the whole point of that is performance --
> did you measure performance of the code using your approach and traditional
> code with suitable optimizations?
> - Volodya
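
As a rough illustration of the last point in Stephen's message (several runtime-selected implementations of a function from one source text), something like the following could work. It is only a sketch under assumptions: sum_unrolled, sum_fn, select_sum and the particular unroll factors are made-up names and values, not part of Boost or of the code in this thread, and whether any given factor is actually faster has to be measured on each target.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: one source text, several unroll factors.
// Sums n floats using Unroll independent accumulators.
template <std::size_t Unroll>
float sum_unrolled(const float* data, std::size_t n)
{
    static_assert(Unroll > 0, "unroll factor must be positive");
    assert(data != nullptr || n == 0);

    float acc[Unroll] = {};                 // one accumulator per lane
    std::size_t i = 0;
    for (; i + Unroll <= n; i += Unroll)
        // Inner trip count is a compile-time constant, so a compiler
        // can typically flatten this loop.
        for (std::size_t k = 0; k < Unroll; ++k)
            acc[k] += data[i + k];

    float total = 0.0f;
    for (std::size_t k = 0; k < Unroll; ++k)
        total += acc[k];
    for (; i < n; ++i)                      // leftover elements
        total += data[i];
    return total;
}

typedef float (*sum_fn)(const float*, std::size_t);

// Choose an instantiation at run time, e.g. after probing the CPU or
// timing each candidate once at start-up. Unknown factors fall back
// to the plain loop.
inline sum_fn select_sum(unsigned factor)
{
    switch (factor) {
        case 8:  return &sum_unrolled<8>;
        case 4:  return &sum_unrolled<4>;
        case 2:  return &sum_unrolled<2>;
        default: return &sum_unrolled<1>;
    }
}
```

A caller would write select_sum(4)(data, n). Note that the different factors add the elements in a different order, so floating-point results can differ in the last bits between variants.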

I would certainly be interested (in helping with development as well). For
what it's worth, I C++0x-ified your code (more for demonstration, but it
still works well ... except for range linking).

  template < typename... Ranges >
  struct link_range;

  // range linking: this is the part that does not work yet
  template < typename LeftRange, typename... Ranges >
  struct link_range<LeftRange, Ranges...>
  {
    static const int first = LeftRange::first;

    typedef typename boost::mpl::if_ <
      typename boost::is_same <
        typename LeftRange::next_range, empty_range >::type,
      typename link_range<Ranges...>::next_range,
      link_range < typename LeftRange::next_range, Ranges... >
    >::type next_range;
  };

  template < typename LeftRange, typename RightRange >
  struct link_range<LeftRange, RightRange>
  {
    static const int first = LeftRange::first;

    typedef typename boost::mpl::if_ <
      typename boost::is_same <
        typename LeftRange::next_range, empty_range >::type,
      RightRange,  // assumed: this branch appears to be missing from the archived text
      link_range < typename LeftRange::next_range, RightRange >
    >::type next_range;
  };

  // implementation of the unrolled loop over a range
  template < typename LambdaExpr, typename Range >
  struct loop_Impl
  {
    template < typename... V1 >
    void operator()(LambdaExpr& expr, V1&... v1)
    {
      // the step that invokes expr for Range::first appears to be missing here
      loop_Impl < LambdaExpr, typename Range::next_range >()(expr, v1...);
    }
  };

  // looping over an empty range does nothing
  template < typename LambdaExpr >
  struct loop_Impl < LambdaExpr, empty_range >
  {
    template < typename... V1 >
    void operator()(LambdaExpr&, V1...) {}
  };

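  // helper functions

  template < typename Range, typename LambdaExpr, typename... V1 >
  void loop_over(LambdaExpr& expr, V1&... v1)
  {
    // forward to the recursive implementation, starting at the full range
    loop_Impl < LambdaExpr, Range >()(expr, v1...);
  }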
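
For anyone who wants to try the idea without the rest of the code, here is a self-contained sketch in the same style. It rests on assumptions: int_range, next_of, accumulate_index and sum_first_four are hypothetical stand-ins invented for illustration (the real range types are not shown above), a plain functor is used in place of a Boost.Lambda expression, and the functor type is deduced per call rather than being a class template parameter.

```cpp
#include <cassert>

// Hypothetical stand-ins for the range types used in the listing above.
struct empty_range {};

template <int First, int Last> struct int_range;

// Successor of a range position: empty_range once First reaches Last.
template <int First, int Last>
struct next_of { typedef int_range<First, Last> type; };

template <int Last>
struct next_of<Last, Last> { typedef empty_range type; };

// Half-open compile-time range [First, Last).
template <int First, int Last>
struct int_range
{
    static_assert(First < Last, "int_range must be non-empty");
    static const int first = First;
    typedef typename next_of<First + 1, Last>::type next_range;
};

// One call per index, generated by recursive template instantiation.
template <typename Range>
struct loop_impl
{
    template <typename F, typename... Args>
    void operator()(F& f, Args&... args) const
    {
        const int i = Range::first;   // copy, so the static member is not odr-used
        f(i, args...);
        loop_impl<typename Range::next_range>()(f, args...);
    }
};

// Looping over an empty range does nothing and ends the recursion.
template <>
struct loop_impl<empty_range>
{
    template <typename F, typename... Args>
    void operator()(F&, Args&...) const {}
};

template <typename Range, typename F, typename... Args>
void loop_over(F& f, Args&... args)
{
    loop_impl<Range>()(f, args...);
}

// Example body: add the current index to a running total.
struct accumulate_index
{
    void operator()(int i, int& total) const { total += i; }
};

// Usage: visits the indices 0, 1, 2, 3 with no run-time loop in the source.
inline int sum_first_four()
{
    accumulate_index f;
    int total = 0;
    loop_over<int_range<0, 4> >(f, total);
    assert(total == 0 + 1 + 2 + 3);
    return total;
}
```

Whether this beats a plain for loop at a given optimization level is exactly the question Volodya raised, so it would need to be measured rather than assumed.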
