Boost logo

Boost :

Subject: [boost] [context review]
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2011-03-24 23:32:22


First of all, I think the library as some issues and it should not be
accepted as it is, but see below.

- What is your evaluation of the design?

The interface is pretty simple and it seems fine on a first
evaluation, but I think there are some issues.

The first issue is that the destructor of the context class will (at
least by default) happily destroy the stack. This unfortunately will
leak any resource owned by any object on the stack; This is not unlike
destroying an allocator before having destroyed all allocated object.

A stack can only be safely deallocated if either has been completely
unwound or if it doesn't own any resources (note that is not enough to
just destroy any stack allocated object). Context should either: a)
provide an interface to unwind the stack (by automatically trowing a
special exception for example), or b) detect whether the stack has not
been unwound and assert in debug mode in the destructor.

As it is, is not clear the level of abstraction this class wants to
be. It looks like is trying, using RAII, to manage context creation
and destruction safely, giving a false sense of security.

Context is neither copyable nor movable; I can understand the reason
for making it non-copyable, but I see no reason why it shouldn't be
movable. Without either capability it is hard to make context
interoperate with existing code without forcing to dynamic allocation.

Context is templated on both the context implementation and the stack.
 The first problem is that the concept requirements for the Stack
template arguments are not stated. Second, the stack type should not
be part of the context type: the primary reason for specifying
different stacks is for option of having the guard page: enabling this
should not change the type of Context and it should possible to
specify it at runtime. Basically I'm advocating for type erasure; note
that it can (and should) be implemented without additional memory
allocation. The trampoline function can be defined in such a way that
the custom deleter will be invoked after the stack is unwound.
Regarding the CtxT type, instead of this template parameter there
should be a program wide default controllable at compile time by a
macro; I saw elsethread that Oliver is already leaning on this idea.

As it is the context class is IMHO flawed and I'm against its
inclusion in boost right. I'll be very happy to change my vote if I
see a better implementation. A good abstraction layer around the low
level context, among other things, should support clean unwind of the
stack (or detecting an not-unwound stack), generic functors (and not
just function pointers) for type safety and custom deleters.

On the other hand the detail fcontext class seems a good, if maybe not
complete, portable implementation of a low level ucontext-like API. I
would be in favor of a simpler version of context which consists
solely on the fcontext class (with a guarantee that this type is a
POD), plus the make_context and swap_context functions. A user will
have lower expectations from such a low level API and at the same time
will give more freedom for other people to implement on top of it. A
safe context class would just be one of the client of the API, other
library authors could either use Context of the underlying fcontex
directly depending on wether they need the speed or safety.

I think this basic API could still be tweaked [but my vote won't be
conditional on that] to add extra functionality:

 * Instead of supporting a 'fc_link' a-la ucontext, have the callback
function return a new context (it is void now). 'fc_link' is useful
only on some limited cases (usually when you have an external
scheduler managing your contextes), while the having the callback
return the 'next continuation' has more applications (note that
fc_link can be implemented easily in term of the continuation
returning variant, but the reverse is harder). This is very useful,
for example, to implement some kind of 'exit_to' functionality (unwind
the stack and restore another context), a key piece for safe context
destruction.

 * Give to fcontext_swap the ability to exchange a parameter with the
other context:

   fcontext_parm_t swap_fcontext( fcontext_t * ofc, fcontext_t const*
nfc, fcontext_parm_t parm)

 where fcontext_parm_t is either

    union fcontext_parm_t {
      void * ptr;
      int int_;
    };

 or simply intptr_t;

 Swap_fcontext saves the current context in ocf and restores the
context ncf as usual; additionally, if ofc was 'captured' on a call to
swap_fcontext, this call will return the value of parm on the original
context. This extra parameter is equivalent to the second parameter of
longjmp.

 If ofc was created with make_fcontext, 'parm' will be an extra
parameter to the function.

 Note that because you can't mix and match parameterful and
parameterless swap_fcontext, you should only have the former as it can
be potentially implemented with little or no overhead.

 * Add a variant of swap_context that allows executing a function on
the destination context (in this case 'parm' would be passed parameter
to this function. This is mostly useful for injecting an exception on
the destination context. This can be potentially implemented on top of
the existing functionality, but would slow down every context switch;
a possible efficient implementation would manipulate the destination
stack to add a call to the function on top of it, so that the cost is
paid only when required.

All the previous improvements are implementable unobtrusively on top
of the exiting API, but such implementations will be much less
efficient. I'm suggesting that they should be part of the basic API
becausecan be implemented with zero or near zero overhead when not
used.

- What is your evaluation of the implementation?

The implementation seems reasonably efficient, but I have seen
elsethread suggestions to make the constructor more expensive (read
multiple memory allocations) in exchange for flexibility; this is IMHO
wrong, the constructor should not be assumed to be a non critical
operation. In fact I think that most of the proposals can be
efficiently implemented without additional memory allocation by
reusing the already available stack (which can be reused).

Most of the 'throw' statements in the context class should actually be asserts.

Make_fcontext requires a call to get_fcontex first, a-la ucontext:
there is no reason for this, just merge get_context with make_context
(and drop the unused set_fcontext).

The posix ucontext wapper and the windows fiber wrapper should be made
API compatible with fcontext.

Regarding the assembler implementations of fcontext, I think they
could be simplified if the fcontext itself was represented as a single
pointer to the top of the stack. This would save some instructions to
move the source address from the stack return address to the fcontext
struct and back. The i386 implementation could use register calling
convention to simplify the implementation and speed it up.

Both the i386 and amd64 asm implementation use the 'ret' instruction
to restore instruction pointer of the destination context. This will
always be mispredicted (costing 20 cycles or more, easily more than
the rest of the swapping code). Just jmping to the the destination
will save a push and will give the CPU a chance to predict the jmp
(modern CPUs have some dynamic prediction capability for indirect
jumps).

Bot the i386 and amd64 asm implementation save and restore the fpu
control word (the amd64 both x87 and SSE2 control words). Last time I
timed it, this was a relatively an expensive operation, not sure about
current architectures; I do not think this should be saved and
restored: the compiler will treat it as a caller clobbered variable
and will take care of it in case of implicit manipulations. The user
should be prohibited from changing it across a swap_fcontext call. As
a minimum, there should be a compile time option to disable the
saving.

Note that many posix implementation of the ucontext API need to
restore it (along with the full register set) because on traditional
unices a context could be captured asynchronously by a signal, but it
is not the case with boost.context.

BTW, the current code is already inconsistent because it is saving the
SSE2 control world on amd64 but not on i386.

For posix systems lacking ucontext, fcontext can be implemented using
the sigaltstack trick (this can actually be faster than ucontext if
the non signal restoring _longjmp is used). For reference see:
http://www.gnu.org/software/pth/rse-pmt.ps

Finally, an implementation on top of boost threads would be nice. It
will be very slow, but useful for portability and when performance is
not an issue; also won't probably be much slower than ucontext.

- What is your evaluation of the documentation?

The documentation is very minimalist and should be expanded. It
assumes the user is already familiar with the context swapping
concept. The posix ucontext APIs man page are a good start. The
fcontext class should be documented.

The first time I read the documentation I understood that the
constructor captured the current context a-la setjmp (a very bad idea,
IMHO). I had to read the implementation to convince myself it was not
the case.

- What is your evaluation of the potential usefulness of the library?

This is definitely not going to be one of the most used boost
libraries, but it is a required piece to implement other libraries
like coroutines, user lever threading, tasks, etc.

- Did you try to use the library? With what compiler? Did you have any
 problems?

I didn't try to use the library.

- How much effort did you put into your evaluation? A glance? A quick
  reading? In-depth study?

I did an in-depth study of the implementation.

- Are you knowledgeable about the problem domain?

I consider myself fairly experienced in this domain: As part of the
GSoC project Boost.Coroutine I did some pretty in depth research on
the topic and I implemented a very similar functionality;

BTW, I see that Context borrows many ideas from Boost.Coroutine
implementation (some of the tricks in the fiber wrapper look quite
familiar); a small acknowledgement would be appreciated :).

-- 
gpd

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk