|
Boost : |
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2007-11-13 06:38:52
On Nov 13, 2007 3:19 AM, Jeff Garland <jeff_at_[hidden]> wrote:
> Robert Kawulak wrote:
> >> From: Jeff Garland
>
> > I'm not sure I understand - why you think there's a function call and 2
> > comparisons in that example?
> >
> > The function's code simply returns the literal
> > value of 5 and that's it.
> >
> > Anyway, I've tried with more representative example - the value is not known at
> > compile time:
> >
> > extern int i;
> >
> > void test()
> > {
> > bounded_int<int, 3, 5>::type b(i);
> > }
> >
> > With #define NDEBUG and -O2 switch, gcc 4.0.1 gave the following code:
> >
> > __Z4testv:
> > [...] irrelevant stack-manipulation code
>
> That 'irrelevant stack manipulation code' isn't irrelevant at all...it's the
> function call overhead cost and would likely *not exist* in a fully static
> version. Same with the stuff right before the return.
>
Actually the stack manipulation is for the test function, not for the
bound checking function, so, as Robert said, it is completely
irrelevant in this context. In fact the __Z4testv: is just that, the
mangled name of the test function.
> <snip>...good details...
>
> > In both cases the compilers were able to optimise-away the abstraction of bounds
> > generators, using the literal values of bounds in the comparisons. I'm not sure
> > this is what you were worrying about, but if so, then I think there's no problem
> > at all. ;-)
>
> Nope...it's the stack code which can't be optimized out because you have to
> have a function pointer and call a different location instead of fully
> inlining. If you take out the dynamic function part I bet you'll cut the
> generated code down to just the compares and the error handling...
I was surprised for the lack of call to bound functions, because in my
experience gcc is completely incapable to inline function pointers
(except when used as non type template parameters), so I went and
looked at the code.
>From a cursory look I can se no use of function pointers, nor I would
expect it: the bounds are knonw at compile time so there is no need of
dynamic dispatch. The framework seems to allow dynamic (runtime)
bounds, but even in that case, i see no need for dynamic dispatch.
BTW, when compiling with gcc 4.1.2 under x86_64:
.globl my_test_function()
.type my_test_function(), @function
my_test_function():
.LFB1638:
[ stack frame setup for test function ]
movq %rbx, -32(%rsp)
.LCFI6:
movq %rbp, -24(%rsp)
.LCFI7:
movq %r12, -16(%rsp)
.LCFI8:
movq %r13, -8(%rsp)
.LCFI9:
subq $72, %rsp
.LCFI10:
[ load i and bound check it ]
movl i(%rip), %eax
cmpl $2, %eax
jle .L44
cmpl $5, %eax
jle .L71
.L44: [ error detection ... snipped ]
.L71: [ stack frame tear down ]
movq 40(%rsp), %rbx
movq 48(%rsp), %rbp
movq 56(%rsp), %r12
movq 64(%rsp), %r13
addq $72, %rsp
ret
This means that the bound checking code is just 4 instructions. I
think you can hardly do better than that. This is even better than the
disassembly reported by Robert because it skips bound loads in %eax
and directly encodes immediates as an operand of the cmp instruction.
This saves two instruction, but I'm not sure if in practice it makes a
difference.
BTW, to be 100% honest, the stack frames setup up and tear down are
not completely unrelated to the usage of the bounded integer. Even if
the fast path only partically uses one register (rax, which is callee
clobbered and doesn't need saving), four more registers are saved
even if they are only used in the slow error checking path. If the
compiler could figure that the error checking path is not entered
often, it could move the save and restore in that path. Anyways, in a
real life function you would need to use many registers anyway, so
this doesn't really matter.
And yes, I'm interested in a bounded integers library.
HTH,
-- gpd
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk