Boost :

From: Tobias Schwinger (tschwinger_at_[hidden])
Date: 2005-07-08 06:29:48


Paul Mensonides wrote:
>>-----Original Message-----
>>From: boost-bounces_at_[hidden]
>>[mailto:boost-bounces_at_[hidden]] On Behalf Of Tobias Schwinger
>
>
>>>>Apologies for the OT, but why "verbose"? __stdcall isn't verbose at
>>>>all.
>>>
>>>
>>>Sorry, underhanded (yet well-deserved) shot at Pascal.
>>
>> ^ ;-) ^
>>
>>Btw. "__pascal" and "__stdcall" are not exactly the same:
>>Both require the callee to clean up but the arguments are
>>ordered differently on the stack.
>
>
> Hmm, didn't know there was a difference. In any case, the callee cleanup is the
> part that makes it inferior to __cdecl. It isn't as general, both for variadics

Variadics would require some sort of "dynamic cleanup" if done on the callee side
(I doubt there is any compiler out there implementing something as ugly and crazy as
this)...
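
A minimal C sketch to illustrate the point (the function sum_ints and its leading count parameter are hypothetical, chosen only for illustration): with a variadic function, how many arguments were passed is a run-time value. Each caller knows exactly what it passed, so under caller cleanup it simply readjusts the stack afterwards. A callee-cleanup convention would instead have to derive the number of bytes to pop from run-time data such as count.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic function: the leading 'count' parameter tells the
   callee how many int arguments follow. The callee learns this only at run
   time, while each caller knew it at compile time when it passed them. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

The calls sum_ints(2, 1, 2) and sum_ints(4, 1, 2, 3, 4) pass different amounts of data through the same function. Under caller cleanup each call site just discards what it passed.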

> and possible tail-recursion optimizations--particularly in mutually
> tail-recursive functions:
>
> int g(int x);
>
> int f(int x) {
>     return g(x);
> }
>
> int g(int x) {
>     return f(x);
> }
>
> (Yes, I know there is no termination.) The point is that the compiler could
> push 'x' onto the stack from external code, but 'f' and 'g' could call each
> other repeatedly without touching the stack at all, and then when the call
> actually does return to the external code, the external code can pop 'x' from
> the stack.

This kind of "stack frame recycling" is attractive, indeed.

You may still be able to request it more explicitly (without relying on
optimization at all) by using an equivalent, iterative algorithm ;-).
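
For instance, here is a minimal C sketch using a hypothetical pair of mutually tail-recursive functions that, unlike 'f' and 'g' above, do terminate (is_even_rec/is_odd_rec on a non-negative count). They are shown next to an equivalent loop (is_even_iter; all three names are made up for illustration). The loop uses a single stack frame by construction. The recursive pair runs in constant stack space only if the compiler happens to eliminate the tail calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical example functions; n must be non-negative. */
bool is_odd_rec(int n);

/* Mutually tail-recursive: each call is the last action of its caller,
   so a compiler *may* reuse the stack frame, but nothing guarantees it. */
bool is_even_rec(int n)
{
    return n == 0 ? true : is_odd_rec(n - 1);
}

bool is_odd_rec(int n)
{
    return n == 0 ? false : is_even_rec(n - 1);
}

/* Iterative equivalent: which of the two functions is "currently running"
   becomes a flag, and each tail call becomes one loop iteration, so only
   a single stack frame is ever used. */
bool is_even_iter(int n)
{
    bool in_even = true; /* start "inside" is_even */

    assert(n >= 0);
    while (n != 0) {
        in_even = !in_even; /* the tail call switches functions... */
        --n;                /* ...with the argument decremented */
    }
    return in_even; /* at n == 0, is_even yields true, is_odd false */
}
```

is_even_iter(1000000) returns true without any recursion. Whether is_even_rec(1000000) survives depends on whether the compiler turns the tail calls into jumps.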

> This can work even if 'f' and 'g' are separately compiled (i.e. not
> optimized as a unit).

Say we split it into two translation units. Compiling the first one, the compiler sees only this code:

   int g(int x);
   int f(int x) { return g(x); }

and cannot know what g does, so the full code generation for the call would have to happen
in the linker. How, then, does the linker know that the callee doesn't change 'x' and that
it's legal to reuse the stack frame here? Does the other object file contain this
kind of information?

> This doesn't work under 'callee cleanup' without extra
> scaffolding because the callee cannot know if 'x' should be popped.

Well, the callee always cleans up. So we'd have to "unpop" the values at the
call site to reuse the stack frame, adding unnecessary code. This code, however,
could theoretically be eliminated in the CPU at runtime:

   add esp,4
   sub esp,2

can, provided no instruction in between uses the stack pointer register (such
dependencies are tracked anyway for pipelining), be transformed into
   add esp,2
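
This folding is a simple peephole rule. Here is a rough model of just the arithmetic, not of what any actual CPU does internally (the types and the function fold_sp_adjusts are hypothetical). Runs of adjacent stack-pointer adjustments with nothing in between collapse into one net adjustment, and any other instruction acts as a barrier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction model: either an adjustment of the stack
   pointer by 'delta' bytes ("add esp,N" is +N, "sub esp,N" is -N), or
   some other instruction that may use the stack pointer and therefore
   acts as a barrier for folding. */
enum kind { SP_ADJUST, OTHER };

struct insn {
    enum kind kind;
    int delta; /* only meaningful for SP_ADJUST */
};

/* Merges each run of adjacent SP_ADJUST instructions into one net
   adjustment, dropping it entirely if the net change is zero.
   Works in place (writes never overtake reads); returns the new count. */
size_t fold_sp_adjusts(struct insn *code, size_t n)
{
    size_t out = 0;
    size_t i = 0;

    while (i < n) {
        if (code[i].kind != SP_ADJUST) {
            code[out++] = code[i++]; /* barrier: copy unchanged */
            continue;
        }
        int net = 0;
        while (i < n && code[i].kind == SP_ADJUST)
            net += code[i++].delta; /* sum the whole run */
        if (net != 0) {
            code[out].kind = SP_ADJUST;
            code[out].delta = net;
            ++out;
        }
    }
    assert(out <= n);
    return out;
}
```

With this model, the pair above (+4, then -2) becomes a single +2 adjustment, i.e. add esp,2. An OTHER instruction between them would prevent the merge.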

Furthermore, there are numerous situations where stack reuse isn't applicable and caller
cleanup involves more code at the call site, so I believe __stdcall has its place.

The only rather useless calling convention I see on x86 is __fastcall, which
attempts to pass argument values in CPU registers. It might have been a good idea in
principle, but this CPU has too few registers, and not all of them general-purpose, for
__fastcall to make much sense (except, perhaps, for very tiny functions).

Thanks,

Tobias


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk