From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2008-05-02 20:00:22

Next message: Simonson, Lucanus J: "[boost] GTL compile time and concepts redesign"
Previous message: Steven Watanabe: "Re: [boost] GTL compile time vs. run time accessors"
In reply to: Steven Watanabe: "Re: [boost] GTL compile time vs. run time accessors"
Next in thread: Fernando Cacciola: "Re: [boost] GTL compile time vs. run time accessors"

Steven wrote:
>Ok. There might be extra optimizations possible when the index is known
>at compile time as opposed to run time. (I'm not talking about the
>difference between get<X>(p) and p[X] here, but the difference between
>when X is known at compile time, using templates to avoid code
>duplication, vs. run time, using function arguments.) This is really a
>property of the algorithm rather than the point class, though.

I agree with you. It really taxes the compiler to optimize my highly
nested inline function calls, and it has too much opportunity to give up
early instead of getting the job done. Switching from gcc 3.4.2 to gcc
4.2.0 resulted in about a 30% speedup in application code that relies
heavily on my types and algorithms, and compile times went up slightly
too. That tells me the compiler is less than fully successful in
optimizing this kind of code. If the compiler is having trouble with
constant propagation, we can't necessarily expect it to optimize away
the overhead of the compile-time accessor either, but at least it
doesn't have the option of giving up before instantiating the template
function.

On a related note, we recently confirmed that gcc 4.3.0 (on newer
hardware) compiles:

int myMax(int a, int b){ return a > b ? a : b;}

into:

.globl _Z5myMaxii
        .type _Z5myMaxii, @function
_Z5myMaxii:
.LFB2:
        .file 1 "t255.cc"
        .loc 1 7 0
.LVL0:
        .loc 1 7 0
        cmpl %edi, %esi
        cmovge %esi, %edi
.LVL1:
        .loc 1 10 0
        movl %edi, %eax
        ret

instead of:

.globl _Z5myMaxii
        .type _Z5myMaxii, @function
_Z5myMaxii:
.LFB2:
        .file 1 "t255.cc"
        .loc 1 7 0
        pushq %rbp
.LCFI0:
        movq %rsp, %rbp
.LCFI1:
        movl %edi, -4(%rbp)
        movl %esi, -8(%rbp)
        .loc 1 9 0
        movl -4(%rbp), %eax
        cmpl -8(%rbp), %eax
        jle .L2
        movl -4(%rbp), %eax
        movl %eax, -12(%rbp)
        jmp .L3
.L2:
        movl -8(%rbp), %eax
        movl %eax, -12(%rbp)
.L3:
        movl -12(%rbp), %eax
        .loc 1 10 0
        leave
        ret

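when compiling for an older processor or with an older compiler. That is
about 4X fewer instructions (4 versus 15) and no conditional branches at
all. Note: cmovge is a conditional-move instruction. CMOVcc itself is
not new in the Core2 (Merom) processors; it dates back to the Pentium
Pro and is available on every x86-64 processor, so what changed is that
the newer compiler chooses to emit it. I have been using the following: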

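template <class T>
inline const T& predicated_value(bool pred, const T& a, const T& b) {
  // Return a when pred is true, b otherwise, by indexing a two-element
  // array of pointers with pred (0 or 1) instead of branching.
  const T* input[2] = {&b, &a};
  return *(input[pred]);
}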

instead of the ?: operator because it was 35% faster than the
branch-based machine code the compiler generated when executed on the
Prescott-based hardware we had at the time. I'll be able to go back to
letting the compiler know best as soon as we cycle out the old hardware
and cycle in the new compiler.

Thanks,
Luke
