From: Andreas Huber (ahd6974-spamgroupstrap_at_[hidden])
Date: 2005-03-11 08:05:13


> * Introduction. The nine requirements are well-stated, and I accept
> them.

Good. That's a start.

[snip]
> * Speed versus scalability tradeoffs.
>
> Here there is a big problem. I accept that there must be a speed
> versus scalability tradeoff; in fact, I'd be slightly surprised if
> there weren't. However, I would normally expect a slow-down of at
> most a few percentage points when using a framework; I find it
> stunning that the difference in performance between hand-written
> machines and framework-generated machines is so significant
> (10ns/10ns vs. 130ns/220ns), even in a "somewhat unrealistic" test
> case.

See below.

> In order to accept this performance gap I would need a very convincing
> explanation. Instead, the explanation seems to be of this form: "The
> requirements are X; I tried hard to satisfy X without compromising
> performance, but I couldn't; therefore, it can't be done."

Do I really say this? I don't see any place where I say or imply that
something can't be done because *I* couldn't think of a solution.

> The explanation of why the performance difference shouldn't matter is
> also unconvincing. I'm particularly disturbed by the implication that
> FMSs should not be used for parsing. Why not?

See my upcoming answer in the branch of this thread.

> * Double dispatch. This seems like it should be a subsection of
> "Speed versus scalability tradeoffs"

Noted.

> - GOF Visitor: It's natural to ask why a complex state machine can't
> be broken up into several parts which don't know much about each
> other, each of which uses a more efficient form of dispatch. Less
> efficient dispatch would be needed only at
> junctions between the separate parts. Dave Gomboc raised a similar
> point, and I don't think it has been adequately addressed.

I think the answer lies in the way an FSM can usefully be split up; see
below. Also, dispatch is typically not the main performance problem.

> - Two-dimensional array of function pointers. This explanation makes
> no sense to me, even as supplemented by Andreas's reply to Iain
> Hanson this morning. It's common for a composite component to make
> use of several components implemented in different TUs. For instance,
> the composite component may "know about" the other components by
> derivation, by containment, typelists, etc. When it needs to assemble
> the array of pointers, it can simply ask each of them for their
> contribution.

No, in a nutshell, it cannot, because it doesn't know how many or which
components exist. Of course, there must be links between the components,
but C++ does not allow you to follow these links in a useful way, and
there is no way to force the components in the TUs to register
themselves. The only way to guarantee the registration is to require the
programmer to register all the parts explicitly, which is contrary to
the scalability requirements. I can only show this with an example:

What follows is stripped-down StopWatch code, with one state (Running)
implemented in a separate TU. I'm currently too tired to push this
through a compiler, so I hope it is correct enough to make sense:

*** Main.hpp ***

#include <boost/fsm/state_machine.hpp>
#include <boost/fsm/event.hpp>
#include <boost/fsm/simple_state.hpp>
// Assumed to follow the naming of the headers above:
#include <boost/fsm/custom_reaction.hpp>

namespace fsm = boost::fsm;

struct EvStartStop : fsm::event< EvStartStop > {};

struct Active;
struct StopWatch : fsm::state_machine< StopWatch, Active > {};

struct Stopped;
struct Active : fsm::simple_state<
  Active, StopWatch, fsm::no_reactions, Stopped > {};

// Stopped lives in the header because Running.cpp, which defines its
// react member, needs to see the complete type.
struct Stopped : fsm::simple_state<
  Stopped, Active, fsm::custom_reaction< EvStartStop > >
{
  // Here we only say that we might do something with EvStartStop
  fsm::result react( const EvStartStop & );
};

*** Main.cpp ***

#include "Main.hpp"

int main()
{
  StopWatch stopWatch;
  stopWatch.process_event( EvStartStop() );

  return 0;
}

*** Running.cpp ***

#include <boost/fsm/simple_state.hpp>
#include "Main.hpp"

struct Running : fsm::simple_state< Running, Active > {};

fsm::result Stopped::react( const EvStartStop & )
{
  // Here we connect the so far unrelated parts of the FSM together.
  return transit< Running >();
}

Looking at the code above we notice several things:

1. Visibility: In the proposed FSM lib, an outer state "knows" only its
inner initial state (which can be an incomplete type) but none of its
other inner states (Active does not know that Running exists, and
neither does StopWatch). An inner state knows all its outer states. That
it is done this way and not the other way round (outer states knowing
all their inner states, inner states not knowing their outer states) is
no coincidence: the other way round would not leave any room for
spreading the parts over multiple TUs. I also think this is pretty much
the only sensible way to spread an FSM over multiple TUs. I can't prove
this (to me it is just obvious), so if you have objections, please
attach a proposal for how you would select the visibility to enable
spreading over multiple TUs.
It is worth noting that there is a strong parallel to base and derived
classes: a base class corresponds to an outer state and a derived class
to an inner state, with the same direction of visibility, order of
definition and inheritance of properties.

2. Connections: The proposed FSM lib connects the parts at link time,
through a call to the react function. This works fine but does leave you
with the "problem" that there is no way to have a global view of your
FSM, *not* *even* *at* *runtime*. While the FSM still resides inside
Stopped you don't have *any* way of knowing that Running even exists.
Likewise, once the transition to Running has been made, you cannot
possibly know whether there are other states, because Running might
itself define similar custom_reactions that lead to yet other states
when they are triggered. Note that the custom reaction cannot have any
knowledge about the target state, as you could then no longer hide
Running in a separate TU.
Now, let us consider other possibilities for how Running could be
connected to Stopped (I'm assuming neither a particular lib interface
nor a particular lib implementation; the only thing I assume is an FSM
employing a hypothetical lib, with the same states Active, Stopped and
Running spread over the TUs Running.cpp and Main.cpp):
a) At compile time only: I don't see how that could work. Running.cpp
does not have any knowledge about Main.cpp and vice versa.
b) At runtime only: Each state definition could contain a static
instance of a class which, in its constructor, registers the state with
its outer state. The outermost states would register with the FSM. This
way the FSM would know all its states and also their place in the state
hierarchy (a minimal sketch of this idea follows right after this
list). The problem is that C++ does not guarantee when exactly the
ctors of such static objects are called (see 9.4.2/7, 3.6.2/3). They
might be called only "before the first use of any function or object
defined in the same translation unit as the [static] object". So,
registration with static instances is pretty much useless in our case.
It's a chicken-and-egg problem: the states don't register themselves
until we call a function in the TU they are defined in, but we don't
know which functions we need to call until they are registered. I might
be reading the standard wrong, but I know of at least one platform that
implements it exactly this way (MSVC7.1). That is enough for me to
steer well clear of automatic runtime registration. There's another
problem: runtime-only registration pretty much rules out some forms of
static checking (e.g. for invalid transitions).
c) Any combination of the above: I don't see how that could improve
things in any way.
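
To make the chicken-and-egg problem of b) a bit more concrete, here is
a minimal sketch of what runtime-only registration might look like.
This is not the proposed lib's interface; Registry, Registrar and the
string keys are made up purely for illustration:

#include <iostream>
#include <map>
#include <string>

// A hypothetical global registry mapping a state name to the name of
// its outer state.
struct Registry
{
  static std::map< std::string, std::string > & states()
  {
    static std::map< std::string, std::string > instance;
    return instance;
  }
};

// A namespace-scope static instance of this class is meant to register
// a state as a side effect of its construction.
struct Registrar
{
  Registrar( const std::string & state, const std::string & outer )
  {
    Registry::states()[ state ] = outer;
  }
};

// In Running.cpp one would then write something like:
//   namespace { Registrar runningRegistrar( "Running", "Active" ); }
// The catch (3.6.2/3): this constructor is only guaranteed to have run
// before the first use of a function or object defined in *that* TU.
// If main() never explicitly calls into Running.cpp, the registration
// may not have happened yet when we ask the registry below.

int main()
{
  std::cout << "registered states: "
            << Registry::states().size() << '\n';
  return 0;
}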

3. State-local storage: Consider what we need to do if we want to give
Running a new auxiliary state variable, e.g. the startTime_ mentioned
in the tutorial. Because Running has its own storage, we can simply add
a member to its class and are done; we don't need to change Active or
the StopWatch (see the sketch below).
If the proposed FSM lib didn't provide storage for each state, it would
force people to change the state_machine each time they need to
introduce such a variable (or at least once, when they introduce the
first variable). This is contrary to the scalability requirements. I
can't think of other possibilities to ensure scalability in this area.
To improve the situation one could surely toy with not destroying the
objects that hold the variables on exit (calling entry() and exit()
member functions instead), but I don't think this would buy you much
performance-wise, because it would make the bookkeeping inside the FSM
harder (somewhere you still need to track which states are active and
which are inactive). Also, in FSMs with many states this would increase
the memory footprint considerably.
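
For illustration, adding such a variable only touches the definition of
Running in Running.cpp (which already includes Main.hpp); std::time_t
is used here merely as an example of what startTime_ might hold, and
nothing in Main.hpp needs to change:

#include <ctime>

struct Running : fsm::simple_state< Running, Active >
{
  public:
    // The auxiliary variable lives in the state itself; it is created
    // when Running is entered and destroyed when Running is exited,
    // together with the state object.
    Running() : startTime_( std::time( 0 ) ) {}

  private:
    std::time_t startTime_;
};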

Points 1 & 2 limit the maximum possible dispatch performance; more on
that later.
Point 3 limits the maximum possible transition performance, which will
typically be the main performance bottleneck.

> If there are order of initialization problems, the arrays can be
> allocated on first use.

No, that is not an option. Unless the allocation and the filling of the
array always take considerably less time than the reaction itself, this
would prohibit the use of such an FSM lib in real-time applications. In
real-time applications you mostly cannot afford such lazy-init tricks,
because they would raise the upper bound of the reaction time to often
unacceptable levels.
In other words: in real-time applications you much prefer to have the
same performance characteristics for every event, even if that means
the average performance is worse than the average that could be
achieved by lazily initializing for the first event.
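
To illustrate the point, here is a minimal sketch of the lazy-init
pattern under discussion. DispatchTable and buildTable are made-up
names, not anything from the proposed lib:

#include <map>
#include <string>

typedef std::map< int, std::string > DispatchTable;

DispatchTable buildTable()
{
  // Imagine this walks all registered states and fills the table;
  // potentially expensive and of hard-to-bound duration.
  DispatchTable table;
  table[ 0 ] = "Stopped";
  table[ 1 ] = "Running";
  return table;
}

const DispatchTable & table()
{
  // Allocated and filled on first use: the first event pays the whole
  // cost, all later events are cheap. For real-time use it is exactly
  // this worst-case spike on the first event that is unacceptable.
  static const DispatchTable instance = buildTable();
  return instance;
}

int main()
{
  table(); // first call: builds the table
  table(); // subsequent calls: cheap
  return 0;
}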

> In my experience, C++ provides extremely fine-grained control over
> which parts of a computation are performed at compile time and which
> at runtime. I have a strong suspicion that a FSM library could be
> written which satisfies the nine requirements and also provides
> ruthless efficiencies, at least in many common cases.

I believe a library satisfying the 9 requirements can at best provide
O(n) dispatch for an FSM using all the features, where n is the number
of currently active orthogonal regions. So, if you don't use orthogonal
states you get constant-time dispatch. The reason lies in the fact
that, as I hope I have shown above, you cannot know what your whole FSM
looks like. Dispatch must always begin with a currently active
innermost state and consider its reactions first. Only if there is no
suitable reaction for the innermost state can the next outer state be
considered, and so on until an outermost state is reached. Now, since
an innermost state knows all its outer states at compile time, it could
collapse all the reactions of its outer states into a single STT,
yielding constant-time lookup for each innermost state *separately*
(see the sketch below). Hence, if there are n innermost states active
at the same time you have O(n) lookup.
In any case I don't expect dispatch to be the main performance
bottleneck in a typical FSM, see above.
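
The following sketch shows the kind of dispatch loop I have in mind.
EventId, Reaction, InnermostState and dispatch are made-up names, not
the proposed lib's interface; the point is merely that each active
innermost state does one constant-time lookup in its own collapsed
table:

#include <cstddef>
#include <iostream>
#include <vector>

typedef std::size_t EventId;
typedef void ( *Reaction )();

struct InnermostState
{
  // Reactions of this state *and* of all its outer states, collapsed
  // into one table indexed by event id: O(1) lookup per innermost
  // state.
  std::vector< Reaction > collapsedTable;
};

void dispatch( const std::vector< InnermostState * > & activeInnermost,
               EventId event )
{
  // One constant-time lookup per active orthogonal region => O(n).
  for ( std::size_t i = 0; i < activeInnermost.size(); ++i )
  {
    const std::vector< Reaction > & table =
      activeInnermost[ i ]->collapsedTable;

    if ( event < table.size() && table[ event ] != 0 )
    {
      table[ event ]();
    }
  }
}

void reactToStartStop() { std::cout << "EvStartStop handled\n"; }

int main()
{
  InnermostState stopped;
  stopped.collapsedTable.resize( 1 );
  stopped.collapsedTable[ 0 ] = &reactToStartStop;

  std::vector< InnermostState * > active( 1, &stopped );
  dispatch( active, 0 );
  return 0;
}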

> It may turn out
> to be necessary to sacrifice performance for some FSMs, but I believe
> a hybrid approach may be possible which uses fast dispatch in the
> majority of cases. I'm willing to be convinced that it is impossible,
> but I haven't seen a satisfying explanation yet.

A hybrid approach could indeed be possible, but I don't think it can be
done without requiring help from the user (i.e. complicating the
interface) or constraining the functionality. For example, you could
require the user to somehow (at CT or RT) register all the possible
states of a machine, or you could prohibit custom_reactions (i.e.
require the implementation of the whole FSM in one TU). Both would give
you O(1) lookup. But let's not forget that lookup is typically not the
main performance bottleneck! Neither of these measures would speed up
transition execution.
Transition execution can only be sped up by prohibiting state-local
storage, i.e. by further constraining the functionality. "Why not?" you
might ask. Because it seems this can only be done by complicating the
interface for everyone, including the users who need the full
functionality: a user requiring state-local storage would be forced to
somehow associate a state identifier with a type, which she can then
use as a container to hold the auxiliary variables (see the sketch
below).
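
The following sketch shows roughly what such a user would end up
writing. RunningData and stateData are made-up names; this is not the
proposed lib's interface, just an illustration of keeping the auxiliary
variables outside the state classes:

#include <ctime>
#include <map>
#include <string>

// The auxiliary variables that would otherwise simply be members of
// the Running state class.
struct RunningData
{
  std::time_t startTime_;
};

// The user has to associate a state identifier with its data container
// and manage that container's lifetime by hand.
std::map< std::string, RunningData > stateData;

int main()
{
  // On entering Running...
  stateData[ "Running" ].startTime_ = std::time( 0 );

  // ...and the user must also remember to clean up on exit.
  stateData.erase( "Running" );
  return 0;
}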

I guess this is where our main disagreement lies: for me, the users
requiring the full functionality have priority. Why? Because I think
that in a certain class of applications most non-trivial FSMs will
hugely benefit from state-local storage (at least some users have
confirmed that).

To conclude the above:
1. Users confirm that the current interface and feature set are useful
for a certain (IMO sufficiently large) class of applications. I realize
that there is another sufficiently large class of applications which
doesn't need many of the performance-degrading features present in the
proposed FSM library.
2. I agree that it is theoretically possible to implement a hybrid
library that supports both classes, but *only* at the cost of a much
more complex interface *and* a much more complex implementation. I
would even go as far as saying that two separate libraries optimized
for each class would have very little in common (interface or
implementation). I don't see any benefit in a hybrid approach other
than being able to say: "This is *the* solution for all your FSM
problems". And it could very well turn out that the hybrid interface is
so complex that users would not consider using it.

> - The documentation warns that executing code after calling
> simple_state::transit (and several other functions) can lead to
> undefined behavior. I believe the library should prevent this from
> occurring. One approach would be to have the function specify a
> result which is stored until react has finished executing, and then
> acted upon. Another possibility would be to give result move
> semantics, so result instances can't be stored in local variables and
> returned later.
>
> - The restriction on event deferral -- one must pass an event pointed
> to by an instance of intrusive_ptr -- is extremely inelegant:
>
> int main()
> {
>   myMachine.process_event(
>     *boost::intrusive_ptr<Event>( new Event() )
>   );
> }
>
> I'm sure there is a good reason for this restriction, but there must
> be a cleaner way to accomplish the same thing.

Have you read the rationale for this?

http://tinyurl.com/6xbo2 (Limitations, Deferring and posting events)

> - The procedure for posting events from inside entry actions looks
> messy; this is noted in the docs, and it's not nearly as bad as the
> syntax for event deferral, but I'd like to see it improved.

I don't see how. Suggestions welcome.

> - I don't understand why it is necessary to pass a type as a
> length-one mpl::list simply because it is a class template
> instantiation. The docs say it is "due to some template-related
> implementation details", but this sounds fishy to me.

This is the example in the docs:

struct NotShooting : fsm::simple_state< NotShooting, Camera,
  /* ... */, mpl::list< fsm::deep_history< Idle > >,
  fsm::has_deep_history >
{
  // ...
};

Internally, I use mpl::is_sequence to find out whether we already have
a list of inner initial states; if there is none, I need to make such a
list. At the point of definition of NotShooting, Idle is an incomplete
type. The problem is that mpl::is_sequence triggers an instantiation of
the deep_history template, and I have not found a way to prevent access
to Idle inside that template. When you wrap it in a list, the
deep_history template is not instantiated and Idle is not accessed. I'm
sure there is a solution, but the problem simply never made it to the
top of my to-do list.
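
The normalization is roughly of the following form (simplified, not the
actual implementation; make_inner_initial_list is a made-up name):

#include <boost/mpl/if.hpp>
#include <boost/mpl/is_sequence.hpp>
#include <boost/mpl/list.hpp>
#include <boost/static_assert.hpp>
#include <boost/type_traits/is_same.hpp>

namespace mpl = boost::mpl;

// If the user already passed an mpl sequence, use it as-is; otherwise
// wrap the single inner initial state into a one-element list. It is
// the is_sequence check on e.g. deep_history< Idle > that instantiates
// that template and thereby touches the still-incomplete Idle.
template< class InnerInitial >
struct make_inner_initial_list
{
  typedef typename mpl::if_<
    mpl::is_sequence< InnerInitial >,
    InnerInitial,
    mpl::list< InnerInitial >
  >::type type;
};

int main()
{
  // Already a sequence: left alone.
  BOOST_STATIC_ASSERT(( boost::is_same<
    make_inner_initial_list< mpl::list< int > >::type,
    mpl::list< int > >::value ));

  // Not a sequence: wrapped.
  BOOST_STATIC_ASSERT(( boost::is_same<
    make_inner_initial_list< int >::type,
    mpl::list< int > >::value ));

  return 0;
}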

[snip]
> I ran the regression tests on seven compilers, with the following
> results:
>
> 1. VC7.1 -- passed all tests
> 2. VC8.0 -- passed all tests
> 3. VC7.0 --
> ...failed updating 36 targets..
> ...skipped 144 targets...
> ...updated 506 targets...
> 4. GCC 3.4.1 (Cygwin) -- passed all tests
> 4. GCC 3.2 (MinGW) -- passed all tests
> 5. Intel 8.0
> ...failed updating 4 targets...
> ...skipped 4 targets...
> ...updated 678 targets...
> 6. CW 8.3
> ...failed updating 28 targets...
> ...skipped 112 targets...
> ...updated 546 targets...
> 7. Borland 5.6.4
> ...failed updating 36 targets...
> ...skipped 144 targets...
> ...updated 506 targets...
>
> I'd be happy to send the full error messages.

Please do, thanks for the testing!

Regards,

-- 
Andreas Huber
When replying by private email, please remove the words spam and trap
from the address shown in the header.
