From: Fernando Luis Cacciola Carballal (fcacciola_at_[hidden])
Date: 2000-05-31 10:05:42
Tomas Holestein:
> I don't think I like this. After all, a compiler is expected to do
> exactly that if you say it to unroll the loops. And he probably knows
> better which loops to unroll.
Initially I looked at the optimized code generated by Borland C++
Builder 4.0 (WIN).
It didn't unroll the loops as I had expected.
Anyway, I agree that this sort of trick should be performed by the
compiler; I just don't trust brand-name compilers this far yet.
> If you explicitly unroll the loops this way, you may have a large code
> bloat, especially if you have fixedarrays of different sizes.
> A colleague of mine suggested to have an unrolling loop template for all
> functions, so maybe making a function would make your approach much
> easier anyhow.
Well, the core of the technique is the template arrayops<> (which
stands for array operations).
With this template you can unroll loops in a controlled way:
* Add the following function to arrayops<>:
  template<typename F>
  static inline void transform ( T * a , F func )
  {
    arrayops<size-1,T>::transform(a,func);
    a[size-1]=func(a[size-1]);
  }
* Add the following to arrayops<0,T>:
  template<typename F> static inline void transform ( T * , F ) {}
Now, given a unary function negate, you can write:
  int A[3]={1,2,3};
  arrayops<3,int>::transform ( A , negate ) ;
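
To make the transform idea concrete, here is a minimal self-contained sketch. It assumes arrayops<> contains nothing but transform, and it uses a hypothetical helper negate_int (standing in for negate above) plus a hypothetical demo_transform function; neither name comes from the original code.

```cpp
#include <cassert>

// Primary template: recurse on the first size-1 elements, then handle
// element size-1. Only transform is shown; the real arrayops<> has more.
template<int size, typename T>
struct arrayops
{
  template<typename F>
  static inline void transform(T* a, F func)
  {
    arrayops<size - 1, T>::transform(a, func);
    a[size - 1] = func(a[size - 1]);
  }
};

// Terminating specialization: nothing left to process.
template<typename T>
struct arrayops<0, T>
{
  template<typename F>
  static inline void transform(T*, F) {}
};

// Hypothetical unary function used for illustration.
inline int negate_int(int x) { return -x; }

// Hypothetical demo: negates every element of a three-element array.
inline void demo_transform()
{
  int A[3] = {1, 2, 3};
  arrayops<3, int>::transform(A, negate_int);
  assert(A[0] == -1 && A[1] == -2 && A[2] == -3);
}
```

Whether the calls really collapse into three straight-line assignments still depends on the compiler honoring the inline requests.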
Dave Abrahams:
>Finally, why didn't you define global
>operators for +/-, I wonder?
I intended to do it, but I think it's not possible. The reason is
that those operators are functions with a return value. I couldn't
figure out how to write the unrolling template for a function that
returns a value.
That is, when you write:
  int A[3]={1,2,3};
  int B[3]={10,20,30};
  int C[3];
  arrayops<3,int>::add ( A , B ,C ) ;
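The template is instantiated and, once every call is inlined, the
compiled code is *exactly* the same as:
(Notice the order of the subscripting: each function recurses before it
assigns, so the lowest index is written first.)
  int A[3]={1,2,3};
  int B[3]={10,20,30};
  int C[3];
  C[0]=A[0]+B[0];
  C[1]=A[1]+B[1];
  C[2]=A[2]+B[2];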
So each 'recursive function' in the template is not really a function
but a code generator. It can't return a value.
Now, what would something like an operator + look like? I have no
idea.
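
One possible shape, offered only as a sketch and not as a settled design: keep the generator returning void, and let an ordinary operator + own the result object and pass its storage to the generator. The wrapper fixedarray<> below is a hypothetical stand-in with a single public data member, arrayops<> is reduced to add alone, and demo_operator_plus is a made-up test helper.

```cpp
#include <cassert>

// Code generator reduced to add; it still returns nothing.
template<int size, typename T>
struct arrayops
{
  static inline void add(const T* a, const T* b, T* c)
  {
    arrayops<size - 1, T>::add(a, b, c);
    c[size - 1] = a[size - 1] + b[size - 1];
  }
};

template<typename T>
struct arrayops<0, T>
{
  static inline void add(const T*, const T*, T*) {}
};

// Hypothetical fixed-size wrapper, assumed for illustration only.
template<int size, typename T>
struct fixedarray
{
  T data[size];
};

// The operator owns the return value; the generator only fills it in.
template<int size, typename T>
inline fixedarray<size, T> operator+(const fixedarray<size, T>& lhs,
                                     const fixedarray<size, T>& rhs)
{
  fixedarray<size, T> result;
  arrayops<size, T>::add(lhs.data, rhs.data, result.data);
  return result;
}

// Hypothetical demo of the operator on three-element arrays.
inline void demo_operator_plus()
{
  fixedarray<3, int> a = {{1, 2, 3}};
  fixedarray<3, int> b = {{10, 20, 30}};
  fixedarray<3, int> c = a + b;
  assert(c.data[0] == 11 && c.data[1] == 22 && c.data[2] == 33);
}
```

The price is a result object returned by value, which the three-argument add avoids; whether that copy is optimized away is up to the compiler.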
Anyway, I've concluded that perhaps arrayops<> is the key piece here.
The array<> (or fixedarray<>) templates are just possible uses for it.
In this respect, I propose the following:
Add a template unrolled<> (which is a better name than arrayops<>)
with the functions I already presented and some other loop-oriented
functions such as transform, etc.:
template<int size,typename T>
struct unrolled
{
  static inline void fill ( T * a , const T & val )
  {
    unrolled<size-1,T>::fill(a,val);
    a[size-1]=val;
  }
  static inline void copy ( const T * src , T * dest )
  {
    unrolled<size-1,T>::copy(src,dest);
    dest[size-1]=src[size-1];
  }
  static inline void add ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::add(a,b,c);
    c[size-1]=a[size-1] + b[size-1];
  }
  static inline void sub ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::sub(a,b,c);
    c[size-1]=a[size-1] - b[size-1];
  }
  static inline void mul ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::mul(a,b,c);
    c[size-1]=a[size-1] * b[size-1];
  }
  static inline void div ( const T * a , const T * b , T * c )
  {
    unrolled<size-1,T>::div(a,b,c);
    c[size-1]=a[size-1] / b[size-1];
  }
  template<typename F>
  static inline void transform ( T * a , F func )
  {
    unrolled<size-1,T>::transform(a,func);
    a[size-1]=func(a[size-1]);
  }
  // extra functions here...
} ;

// Terminating specialization: every operation is a no-op for size 0.
template<typename T>
struct unrolled<0,T>
{
  static inline void fill ( T * , const T & ) {}
  static inline void copy ( const T * , T * ) {}
  static inline void add ( const T * , const T * , T * ) {}
  static inline void sub ( const T * , const T * , T * ) {}
  static inline void mul ( const T * , const T * , T * ) {}
  static inline void div ( const T * , const T * , T * ) {}
  template<typename F> static inline void transform ( T * , F ) {}
} ;
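
As a usage sketch of the proposed template, the block below repeats an abridged unrolled<> (only fill, copy and add, to keep it short) and chains the three calls on plain arrays. The function demo_unrolled is a hypothetical test helper, not part of the proposal.

```cpp
#include <cassert>

// Abridged copy of the proposed template: fill, copy and add only.
template<int size, typename T>
struct unrolled
{
  static inline void fill(T* a, const T& val)
  {
    unrolled<size - 1, T>::fill(a, val);
    a[size - 1] = val;
  }
  static inline void copy(const T* src, T* dest)
  {
    unrolled<size - 1, T>::copy(src, dest);
    dest[size - 1] = src[size - 1];
  }
  static inline void add(const T* a, const T* b, T* c)
  {
    unrolled<size - 1, T>::add(a, b, c);
    c[size - 1] = a[size - 1] + b[size - 1];
  }
};

// Terminating specialization.
template<typename T>
struct unrolled<0, T>
{
  static inline void fill(T*, const T&) {}
  static inline void copy(const T*, T*) {}
  static inline void add(const T*, const T*, T*) {}
};

// Hypothetical demo: fill a, copy it into b, then store a + b in c.
inline void demo_unrolled()
{
  double a[4];
  double b[4];
  double c[4];
  unrolled<4, double>::fill(a, 1.5);
  unrolled<4, double>::copy(a, b);
  unrolled<4, double>::add(a, b, c);
  for (int i = 0; i < 4; ++i)
    assert(c[i] == 3.0);
}
```

Note that the size argument need not match the array length: a smaller value simply processes a prefix, while a larger one would run past the end, so the caller is responsible for keeping it in range.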
Thanks.
Fernando Cacciola
fcacciola_at_[hidden]
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk