Boost logo

Boost :

From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2008-05-22 20:30:25


With the following code I am unable to induce the compiler to ever fail
to inline and produce a symbol in the symbol table. It doesn't matter
if you pass by const ref, or what you name the function if it is always
inlined and that stuff is just compiled out of existence. I think this
whole thread may be solving a problem that doesn't exist. If the
compiler can succeed to inline all N recursive function calls it doesn't
matter whether it is 4, or 8 or what they are named. In the end what
(might) matter is whether you have many or few multiplication operations
(with the split recursive algorithm.) That same algorithm can be
implemented as a loop in a single function that that takes the exponent
by value as a runtime argument and the compiler would still have the
opportunity to unroll the loop when the exponent is a constant where the
function is inlined. I tried that and the compiler (gcc 4.3.2 -O2)
fails to unroll the loop upon inspection of the assembly. I also
inspected the assembly generated for my template and noticed something
else that is interesting. The compiler takes the O(N) multiplications
my lazily implemented positive_power template generates and optimizes
them upon instantiation and inlining for N=21 to only 6 multiplications,
meaning that the compiler is actually rewriting the algorithm for you to
reduce the number of multiply operations. In this thread we have just
been speculating about how best to do for the compiler what it is
perfectly capable of doing on its own.

I vote we leave the pow template alone, since it isn't broken and
doesn't need fixing. (Yes I know my vote doesn't count.)

Thanks,
Luke

//recursive template version

template <int N>
struct positive_power {
        template <typename T>
        static T result(T base) {
                return positive_power<N-1>::result(base) * base;
        }
};
template <>
struct positive_power<0> {
        template <typename T>
        static T result(T base) {
                return 1;
        }
};
template <>
struct positive_power<1> {
        template <typename T>
        static T result(T base) {
                return base;
        }
};

int main(int argc, char** argv) {
        std::cout << positive_power<21>::result(argc);

//assembly generated (note only 6 multiplies)
main: 0x89 movl %edi,%esi
     0x004007a2: 0x48 subl $8,%rsp
     0x004007a6: 0x0f imull %edi,%esi
     0x004007a9: 0x0f imull %esi,%esi
     0x004007ac: 0x0f imull %edi,%esi
     0x004007af: 0x0f imull %esi,%esi
     0x004007b2: 0x0f imull %esi,%esi
     0x004007b5: 0x0f imull %edi,%esi
     0x004007b8: 0xbf movl $0x600c20,%edi

//loop version

inline int pow(int base, int exponent) {
        if(exponent == 0) return 1;
        int result = base;
        for(int i = exponent; i > 1; i /= 2) {
                result *= result;
                if(i % 2) result *= base;
        }
        return result;
}

int main(int argc, char** argv) {
        std::cout << pow(2, 8) << std::endl;

//assembly (note one multiply in a loop)
main: pushl %r13
   0x004008a2: movl $8,%ecx
   0x004008a7: movl $3,%edx
   0x004008ac: sarl 1,%ecx
   0x004008ae: movl $4,%esi
   0x004008b3: pushl %r12
   0x004008b5: pushl %rbp
   0x004008b6: pushl %rbx
   0x004008b7: subl $264,%rsp
   0x004008be: subl $1,%edx
   0x004008c1: je 0x4008d6
   0x004008c3: imull %esi,%esi
   0x004008c6: testb $1,%cl
   0x004008c7: roll $141,(%rcx)
   0x004008ca: addb $54,%al
   0x004008cc: cmovne %eax,%esi
   0x004008cf: sarl 1,%ecx
   0x004008d1: subl $1,%edx
   0x004008d4: jne 0x4008c3
   0x004008d6: movl $0x600e20,%edi


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk