Subject: Re: [boost] [context/coroutine] split into two libs in trunk?!
From: Oliver Kowalke (oliver.kowalke_at_[hidden])
Date: 2012-04-14 01:12:06
On 14.04.2012 00:37, Giovanni Piero Deretta wrote:
> Not saving the SSE and x87 control word was a conscious decision on my
> part. The control words are unlike other callee/caller saved registers
> as they define a process mode and are explicitly under the control of
> the user. In my tests the instructions used to load/save these states
> had a considerable cost on my old netburst CPU.
> The compiler may temporarily change the control state (for example in
> legacy x87 mode to implement some non-standard rounding), but it has
> to reset them to the original value before calling any externally
> defined function (like the ASM context switching functions) as these
> will expect the control words to be in the default state (whatever
> this is).
> The only time called functions will see the control words in a non
> default state is if the user explicitly changed the state, for example
> via a C99 compiler pragma or builtin function. The boost.coroutine
> documentation did explicitly warn about risky changes to process state
> across coroutine calls, including the signal mask (which boost.context
> also does not preserve), locks, TLS and of course the FPU state.
But what about code you don't have under your control (legacy libs etc.)?
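One way to protect against such code is to save and restore the floating-point environment around the call yourself. A minimal sketch, assuming the standard <cfenv> functions cover the x87 control word and MXCSR (which is what common x86 implementations do, but check yours); fenv_guard and legacy_call are hypothetical names, not part of boost.context or boost.coroutine:

```cpp
#include <cfenv>

// Hypothetical RAII guard: captures the floating-point environment on
// construction and restores it on destruction.
class fenv_guard {
public:
    fenv_guard() { std::fegetenv(&saved_); }
    ~fenv_guard() { std::fesetenv(&saved_); }
    fenv_guard(const fenv_guard&) = delete;
    fenv_guard& operator=(const fenv_guard&) = delete;

private:
    std::fenv_t saved_;
};

// Stand-in for legacy code that changes the rounding mode and never resets it.
void legacy_call() { std::fesetround(FE_TOWARDZERO); }

// Returns the rounding mode seen by the caller after a guarded legacy call.
int rounding_after_guarded_call() {
    std::fesetround(FE_TONEAREST);
    {
        fenv_guard guard;  // saves the current environment
        legacy_call();     // modifies the rounding mode
    }                      // guard restores the saved environment here
    return std::fegetround();
}
```

The cost is one environment save and restore per guarded call, paid only where it is needed instead of on every context switch.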
> Having said that, I doubt that on a modern CPU this extra state
> save/change would cost more than an extra 50% on a context call
> (which in the grand order of things isn't really that much). Any
> claimed scalability differences between boost.context and my old
> library must come from somewhere else and not from the low level
> context switching routines. The only thing that comes to my mind is
> that boost.coroutine did save all registers on the stack (which is
> very likely to be cache hot) instead of a separate structure as for
> boost.context (which, IIRC, was heap allocated in the higher level
If you are referring to my performance tests: I never compared boost.context
with boost.coroutine.
I measured the cycle costs of fcontext and ucontext.
> FWIW, while it is hard to compare my results on an old 32 bit machine
> with yours on an undoubtedly newer CPU and OS, I distinctly remember
> from my tests that a coroutine-to-coroutine switch (using the high
> level API) was about 100 times faster using the custom backend than
> using ucontext (mainly because of the high cost of the function call).
That matches what I found (see fcontext vs. ucontext above
and the performance test app in boost.context).
I assumed that your lib used ucontext as its back-end, and therefore I
doubted that it would
be much faster than boost.context (as I said in another post).
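For reference, the ucontext side of such a measurement can be sketched roughly as follows. This is not the performance test app from boost.context, only an illustration under some assumptions: a POSIX system that provides <ucontext.h>, wall-clock time from std::chrono instead of cycle counts, and hypothetical names (worker, average_round_trip_ns). On glibc, swapcontext also saves and restores the signal mask, which as far as I know involves a system call and explains much of its cost.

```cpp
#include <ucontext.h>
#include <chrono>
#include <cstddef>

static ucontext_t main_ctx;
static ucontext_t worker_ctx;
static std::size_t switches_seen = 0;
alignas(16) static char worker_stack[64 * 1024];

// Runs on its own stack and immediately yields back to the main context.
static void worker() {
    for (;;) {
        ++switches_seen;
        swapcontext(&worker_ctx, &main_ctx);
    }
}

// Performs `rounds` round trips (main -> worker -> main) and returns the
// average time per round trip in nanoseconds. `rounds` must be non-zero.
double average_round_trip_ns(std::size_t rounds) {
    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof(worker_stack);
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    switches_seen = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < rounds; ++i) {
        swapcontext(&main_ctx, &worker_ctx);
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(rounds);
}
```

Each round trip contains two swapcontext calls, so halve the result for a per-switch figure.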
btw, the file swapcontext64.cpp (from your lib) might contain a bug:
it preserves the registers rbx, rbp, rax and rdx.
I think it should be rbx, rbp, r12-r15 (plus the SSE2 and x87 control words), as described in the
'SysV ABI AMD64 Architecture Processor Supplement - Draft Version 0.99.4'.
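To make the list concrete, here is a sketch of the state a context switch would have to preserve under that ABI. The struct name and layout are hypothetical and are not taken from boost.context or from swapcontext64.cpp; rsp is included because the stack pointer has to be switched as well.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout of the callee-saved state on SysV AMD64.
struct sysv_amd64_saved_state {
    // General-purpose registers the callee must preserve.
    std::uint64_t rbx, rbp, r12, r13, r14, r15;
    // Stack pointer of the suspended context.
    std::uint64_t rsp;
    // MXCSR: the control bits are callee-saved (the status bits are not).
    std::uint32_t mxcsr;
    // x87 control word: callee-saved (the status word is not).
    std::uint16_t x87_cw;
};

static_assert(offsetof(sysv_amd64_saved_state, mxcsr) == 56,
              "seven 64-bit registers precede the control words");
```

Note that rax and rdx are caller-saved under this ABI, so a switch routine does not need to preserve them.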