Boost logo

Boost :

From: Andreas Huber (ahd6974-spamgroupstrap_at_[hidden])
Date: 2005-03-20 10:51:52


Jonathan Turkanis wrote:
> Andreas Huber wrote:
>> Jonathan Turkanis wrote:
>
>>> 2. If I object to part of your library I should offer a concrete
>>> proposal to fix it
>>
>> Either that or prototypes of techniques of which I said I can't see
>> how they will work. I said that because I have invested considerable
>> time in discussing things that might come to mind when one wants to
>> improve performance. I can't think of much else I can say in support
>> of the current design performance-wise. So I thought it would be more
>> productive if you or anyone else could make concrete proposals. Such
>> a proposal would either send me back to the drawing board or I would
>> need to argue convincingly why it doesn't work.
>
> I've read the documentation and most if not all of the review
> discussion, and still I do not find your explanations satisfying.

It helps if you explain exactly what you don't find satisfying, see
below.

> It's very tempting to try to design an FSM library with better
> performance characteristics, but I don't have anywhere near enough
> time, esp. because I'd have to start out by becoming an FSM expert. I
> don't think I should have to do this before writing a review.

Certainly not. However, I do expect that people questioning the
performance (and along with it the implementation) are able to ask
concrete questions why some things are the way they are. You partially
did that in your previous posts but only partially. I think it is
difficult to respond to statements like:

<quote>
Possibly the joints between parts of machines defined in separate TUs
must be weaker (i.e., less known at compile time) than the connections
between collections of states implemented in a single TU.
</quote>

Suggestions like this aren't of much help as long as you don't get more
concrete. In the worst case you get an unconvincing "I don't see how",
in the best case I can respond with a list of ideas I tried and lengthy
explanations why those ideas don't work. While the latter might be more
convincing, it is rather inefficient because I likely end up explaining
things that you already know or have figured out yourself.
As I said before, I think it is close to impossible to prove that a
certain design is the best performance-wise. Also, I don't claim that
there isn't any room in the current implementation for improvements,
rather I think it has reached a stage where substantial (>= 2x faster)
optimizations are non-trivial.
I can only describe why common optimizations don't work or why they
would violate one ore more of the requirements. Things that I
*consciously* avoided or unsuccessfully tried are described in the
rationale. Moreover, during any programmer career a bag of knowledge is
acquired that is unconsciously applied to every programming job.
Naturally, this bag of knowledge is not the same for everyone which is
probably the main reason why the rationale seems to be sufficient for
some people while others find it unsatisfying. Ergo, the only thing I
can do is listen to the questions people ask and either add explanations
to the rationale or change the implementation.

>>> 3. If I want to understand where the performance bottlenecks are, I
>>> should examine the code.
>>
>> No, I didn't mean that in the generality your summary suggests. In
>> *one* *specific* case I thought it is better if *you* (Jonathan) look
>> at the code before I explain all the details in text. I did that
>> because I *assumed* (maybe wrongly) that you want a very detailed
>> description for which I really don't have the time at the moment.
>
> But this one specific case -- the expense of implementing state local
> storage -- forms the basis for your argument that dispatch time is
> not so important, right?

Right. Here's a description of the relevant parts of the implementation:
a) The currently active states form a tree where the number of branches
is equal to the number of currently active orthogonal regions. In flat
FSMs this tree consists of only the root object. In hierarchical but
non-orthogonal FSMs the tree consists of exactly one branch, with the
depth being equal to the nesting depth
b) The state_machine subtype object maintains pointers to the currently
active outermost state and all currently active innermost states
c) Each object in the tree maintains pointers to its outer context and
all its currently active direct inner states. Innermost states contain a
std::list iterator that points to the position where the state is stored
in the state_machine subtypes' innermost states list

As a result of this structure, the following needs to be done during
transitions:
1) State entry: For every state that is entered an object is
constructed. During construction a vtable pointer is assigned. The
(intrusive) reference count is set to 1. If BOOST_FSM_USE_NATIVE_RTTI is
not defined another pointer member is assigned. The state is registered
with its outer context. An internal pointer to the outer context is
assigned, as a result the reference count of the possibly present outer
state is incremented.
2) State exit: For every state that is exited an object needs to be
destructed. Destruction involves deregistering the state with its outer
state and decrementing the reference count of the outer state.

In a FSM without state-local storage all in 1 & 2 could be avoided.
Instead, a transition is simply a series of function calls to entry/exit
functions. Functions with none or a small implementation can easily
be inlined.

Regards,

-- 
Andreas Huber
When replying by private email, please remove the words spam and trap
from the address shown in the header.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk