# Boost :

From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2008-05-22 20:30:25

With the following code I am unable to induce the compiler to ever fail
to inline and produce a symbol in the symbol table. It doesn't matter
if you pass by const ref, or what you name the function if it is always
inlined and that stuff is just compiled out of existence. I think this
whole thread may be solving a problem that doesn't exist. If the
compiler can succeed to inline all N recursive function calls it doesn't
matter whether it is 4, or 8 or what they are named. In the end what
(might) matter is whether you have many or few multiplication operations
(with the split recursive algorithm.) That same algorithm can be
implemented as a loop in a single function that that takes the exponent
by value as a runtime argument and the compiler would still have the
opportunity to unroll the loop when the exponent is a constant where the
function is inlined. I tried that and the compiler (gcc 4.3.2 -O2)
fails to unroll the loop upon inspection of the assembly. I also
inspected the assembly generated for my template and noticed something
else that is interesting. The compiler takes the O(N) multiplications
my lazily implemented positive_power template generates and optimizes
them upon instantiation and inlining for N=21 to only 6 multiplications,
meaning that the compiler is actually rewriting the algorithm for you to
reduce the number of multiply operations. In this thread we have just
been speculating about how best to do for the compiler what it is
perfectly capable of doing on its own.

I vote we leave the pow template alone, since it isn't broken and
doesn't need fixing. (Yes I know my vote doesn't count.)

Thanks,
Luke

//recursive template version

template <int N>
struct positive_power {
template <typename T>
static T result(T base) {
return positive_power<N-1>::result(base) * base;
}
};
template <>
struct positive_power<0> {
template <typename T>
static T result(T base) {
return 1;
}
};
template <>
struct positive_power<1> {
template <typename T>
static T result(T base) {
return base;
}
};

int main(int argc, char** argv) {
std::cout << positive_power<21>::result(argc);

//assembly generated (note only 6 multiplies)
main: 0x89 movl %edi,%esi
0x004007a2: 0x48 subl \$8,%rsp
0x004007a6: 0x0f imull %edi,%esi
0x004007a9: 0x0f imull %esi,%esi
0x004007ac: 0x0f imull %edi,%esi
0x004007af: 0x0f imull %esi,%esi
0x004007b2: 0x0f imull %esi,%esi
0x004007b5: 0x0f imull %edi,%esi
0x004007b8: 0xbf movl \$0x600c20,%edi

//loop version

inline int pow(int base, int exponent) {
if(exponent == 0) return 1;
int result = base;
for(int i = exponent; i > 1; i /= 2) {
result *= result;
if(i % 2) result *= base;
}
return result;
}

int main(int argc, char** argv) {
std::cout << pow(2, 8) << std::endl;

//assembly (note one multiply in a loop)
main: pushl %r13
0x004008a2: movl \$8,%ecx
0x004008a7: movl \$3,%edx
0x004008ac: sarl 1,%ecx
0x004008ae: movl \$4,%esi
0x004008b3: pushl %r12
0x004008b5: pushl %rbp
0x004008b6: pushl %rbx
0x004008b7: subl \$264,%rsp
0x004008be: subl \$1,%edx
0x004008c1: je 0x4008d6
0x004008c3: imull %esi,%esi
0x004008c6: testb \$1,%cl
0x004008c7: roll \$141,(%rcx)