From: Fernando Luis Cacciola Carballal (fcacciola_at_[hidden])
Date: 2000-05-31 10:05:42


Tomas Holestein:

> I don't think I like this. After all, a compiler is expected to do
> exactly that if you say it to unroll the loops. And he probably
> knows better which loops to unroll.
Initially I looked at the optimized code generated by Borland C++
Builder 4.0 (WIN).
It didn't unroll the loops as I had expected.
Anyway, I agree that this sort of trick should be performed by the
compiler; I just don't trust brand-name compilers that far yet.

> If you explicitly unroll the loops this way, you may have a large
> code bloat, especially if you have fixedarrays of different sizes.
> A colleague of mine suggested to have an unrolling loop template for
> all functions, so maybe making a function would make your approach
> much easier anyhow.

Well, the core of the technique is the template arrayops<> (which
stands for array operations).
With this template you can unroll loops in a controlled way:

* Add the following function to arrayops<>

  template<typename F>
  static inline void transform ( T * a , F func )
  {
    arrayops<size-1,T>::transform(a,func);
    a[size-1]=func(a[size-1]);
  }

* Add the following to the arrayops<0,T> specialization:

  template<typename F> static inline void transform ( T * , F ) {}

Now you can write:

  int A[3]={1,2,3};
  arrayops<3,int>::transform ( A , std::negate<int>() ) ;  // std::negate is in <functional>
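
For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the transform member. It assumes that the recursion stays within arrayops<> and that the functor is std::negate<int> from <functional>. The helper negate3 is hypothetical and exists only to show a call site.

```cpp
#include <functional>  // std::negate

// Sketch only: arrayops<> reduced to the single transform member.
template<int size, typename T>
struct arrayops
{
  template<typename F>
  static inline void transform ( T * a , F func )
  {
    arrayops<size-1,T>::transform(a,func);  // expand elements 0..size-2 first
    a[size-1]=func(a[size-1]);              // then handle the last element
  }
};

// Size-0 specialization: terminates the compile-time recursion.
template<typename T>
struct arrayops<0,T>
{
  template<typename F> static inline void transform ( T * , F ) {}
};

// Hypothetical helper (not part of the proposal): negates three ints in place.
inline void negate3 ( int * a )
{
  arrayops<3,int>::transform ( a , std::negate<int>() ) ;
}
```

Any callable that takes a T and returns something assignable to T should work in place of std::negate<int>.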

Dave Abrahams:

>Finally, why didn't you define global
>operators for +/-, I wonder?
I intended to do it, but I think it's not possible. The reason is
that those operators are functions with a return value. I couldn't
figure out how to write the unrolling template for a function that
returns a value.
That is, when you write:

 int A[3]={1,2,3};
 int B[3]={10,20,30};
 int C[3];
 arrayops<3,int>::add ( A , B ,C ) ;

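Once the template is instantiated and every level is inlined, the
compiled code should be equivalent to the following.
(Notice the order of the subscripting: each level recurses first and
then handles its own element, so with the functions as written below
the assignments come out in ascending order.)

 int A[3]={1,2,3};
 int B[3]={10,20,30};
 int C[3];
 C[0]=A[0]+B[0];
 C[1]=A[1]+B[1];
 C[2]=A[2]+B[2];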

So each 'recursive function' in the template is not really a function
but a code generator. It can't return a value.

Now, what would something like an operator + look like? I have no
idea.
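
One possible shape, offered only as a sketch: the generators keep returning void, and an ordinary, non-recursive operator + creates the result object, lets the generator fill it, and returns it by value. The wrapper fixedarray below is a hypothetical stand-in for array<> or fixedarray<> (a built-in array cannot be returned by value), and only the add generator is reproduced.

```cpp
// Generator reduced to add, with its size-0 terminator.
template<int size, typename T>
struct unrolled
{
  static inline void add ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::add(a,b,c);
    c[size-1]=a[size-1] + b[size-1];
  }
};

template<typename T>
struct unrolled<0,T>
{
  static inline void add ( const T * , const T * , T * ) {}
};

// Hypothetical wrapper type: an aggregate holding N elements of T.
template<typename T, int N>
struct fixedarray
{
  T elems[N];
};

// The operator itself is not recursive. It owns the return value and
// delegates the element-wise work to the void generator.
template<typename T, int N>
inline fixedarray<T,N> operator + ( const fixedarray<T,N> & a ,
                                    const fixedarray<T,N> & b )
{
  fixedarray<T,N> result ;  // every element is written by add below
  unrolled<N,T>::add ( a.elems , b.elems , result.elems ) ;
  return result ;
}
```

The cost of this design is one temporary per operator call, so a chained expression such as a + b + c creates an intermediate object.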

Anyway, I've concluded that perhaps arrayops<> is the key piece here.
array<> (or fixedarray<>) is just one possible use of it.
With that in mind, I propose the following:

Add a template unrolled<> (which is a better name than arrayops<>)
with the functions I already presented and some other loop-oriented
functions such as transform, etc.

template<int size,typename T>
struct unrolled
{
  static inline void fill ( T * a , const T & val )
  {
    unrolled<size-1,T>::fill(a,val);
    a[size-1]=val;
  }
  static inline void copy ( const T * src , T * dest )
  {
    unrolled<size-1,T>::copy(src,dest);
    dest[size-1]=src[size-1];
  }
  static inline void add ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::add(a,b,c);
    c[size-1]=a[size-1] + b[size-1];
  }
  static inline void sub ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::sub(a,b,c);
    c[size-1]=a[size-1] - b[size-1];
  }
  static inline void mul ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::mul(a,b,c);
    c[size-1]=a[size-1] * b[size-1];
  }
  static inline void div ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::div(a,b,c);
    c[size-1]=a[size-1] / b[size-1];
  }

  template<typename F>
  static inline void transform ( T * a , F func )
  {
    unrolled<size-1,T>::transform(a,func);
    a[size-1]=func(a[size-1]);
  }

  // extra functions here...
} ;

template<typename T>
struct unrolled<0,T>
{
  static inline void fill ( T * , const T & ) {}
  static inline void copy ( const T * , T * ) {}
  static inline void add ( const T * , const T * , T * ) {}
  static inline void sub ( const T * , const T * , T * ) {}
  static inline void mul ( const T * , const T * , T * ) {}
  static inline void div ( const T * , const T * , T * ) {}
  template<typename F> static inline void transform ( T * , F ) {}
} ;
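
Regarding the suggestion to wrap this in functions: a thin function template can deduce the size from a built-in array, so callers do not have to spell out unrolled<N,T>. The following is a sketch under assumptions. The names fill_all and copy_all are hypothetical, and only fill and copy are reproduced from the template above.

```cpp
// Generator reduced to fill and copy, with its size-0 terminator.
template<int size, typename T>
struct unrolled
{
  static inline void fill ( T * a , const T & val )
  {
    unrolled<size-1,T>::fill(a,val);
    a[size-1]=val;
  }
  static inline void copy ( const T * src , T * dest )
  {
    unrolled<size-1,T>::copy(src,dest);
    dest[size-1]=src[size-1];
  }
};

template<typename T>
struct unrolled<0,T>
{
  static inline void fill ( T * , const T & ) {}
  static inline void copy ( const T * , T * ) {}
};

// Hypothetical wrapper: T and N are deduced from the array argument.
template<typename T, int N>
inline void fill_all ( T (&a)[N] , const T & val )
{
  unrolled<N,T>::fill ( a , val ) ;
}

// Hypothetical wrapper: both arrays must have the same element type and size,
// otherwise deduction fails at compile time.
template<typename T, int N>
inline void copy_all ( const T (&src)[N] , T (&dest)[N] )
{
  unrolled<N,T>::copy ( src , dest ) ;
}
```

Each distinct (N, T) pair still produces its own instantiation, so the wrappers simplify the call site but do not reduce the code-bloat concern raised above.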

Thanks.

Fernando Cacciola
fcacciola_at_[hidden]

