
From: Jonathan Turkanis (technews_at_[hidden])
Date: 2005-03-12 01:07:38


Andreas Huber wrote:
>> * Introduction. The nine requirements are well-stated, and I accept
>> them.
>
> Good. That's a start.
>
> [snip]
>> * Speed versus scalability tradeoffs.

>> In order to accept this performance gap I would need a very
>> convincing explanation. Instead, the explanation seems to be of this
>> form: "The requirements are X; I tried hard to satisfy X without
>> compromising performance, but I couldn't; therefore, it can't be
>> done."
>
> Do I really say this? I don't see any place where I say or imply that
> something can't be done because *I* couldn't think of a solution.

Of course not. Otherwise I wouldn't have had to make up the obviously fake
quote. Consider: "Quite a bit of effort has gone into making the library fast
for small simple machines and scaleable at the same time ... the scalability
does not come for free." I naturally assumed it was your effort you were talking
about.

I'm just arguing that the Rationale doesn't adequately explain why better
performance is not possible. I'm willing to be convinced that it isn't possible,
and that you've made the best possible effort to find more efficient
alternatives.

>> - Two-dimensional array of function pointers. This explanation makes
>> no sense to me, even as supplemented by Andreas's reply to Iain
>> Hanson this morning. It's common for a composite component to make
>> use of several components implemented in different TUs. For instance,
the composite component may "know about" the other components by
>> derivation, by containment, typelists, etc. When it needs to assemble
>> the array of pointers, it can simply ask each of them for their
>> contribution.
>
> No, in a nutshell it cannot because it doesn't know how many or what
> components exist.

I'm not talking about the proposed library here. I accept that it's not possible
in your implementation.
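
To sketch what I had in mind in general terms -- this is purely hypothetical and
says nothing about your interface or implementation -- a composite that names its
parts can ask each of them for its slice of the dispatch table when the machine
is constructed:

    // Hypothetical sketch, not the proposed library: each part hands the
    // composite its slice of the dispatch table at construction time.
    #include <map>

    struct machine;
    struct event { int id; };
    typedef void (*reaction_fn)(machine&, event const&);

    struct running   // implemented in its own TU, e.g. running.cpp
    {
        static void contribute(std::map<int, reaction_fn>& table);
    };

    struct stopped   // implemented in another TU, e.g. stopped.cpp
    {
        static void contribute(std::map<int, reaction_fn>& table);
    };

    struct machine
    {
        machine()
        {
            // The composite knows its parts (here simply by naming them)
            // and collects their contributions up front -- no lazy
            // initialization, no global self-registration machinery.
            running::contribute(table_);
            stopped::contribute(table_);
        }
        void process_event(event const& e)
        {
            std::map<int, reaction_fn>::const_iterator i = table_.find(e.id);
            if (i != table_.end()) i->second(*this, e);
        }
        std::map<int, reaction_fn> table_;
    };

Of course this assumes the composite knows its parts by name; whether anything
similar fits under your visibility constraints is exactly the kind of thing I'd
like the Rationale to discuss.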

> Of course, there must be links between the
> components
> but C++ does not allow you to follow these links in a useful way and
> there is no way to force the components in the TUs to register
> themselves. The only way you can guarantee the registration is by
> requiring the programmer to register all the parts explicitly, which
> is contrary to the scalability requirements.

You frequently mention "manually registering" components. It would be helpful if
you would explain what you mean by this. For example, a specialization of
std::pair has two components, first_type and second_type. When I write

      typedef std::pair<int, float> pair_type;

does this count as "manually registering" the components of pair_type, since I
have to know what they are and write them down? Lots of C++ libraries allow
users to construct composite components out of components defined in different
TUs. I'm not trying to be stubborn; I really don't understand why FSM is special
in this regard.

> I can only show this
> with an example:

<snip detailed example>

> Looking at the code above we notice several things:
>
> 1. Visibility: In the proposed FSM lib, an outer state "knows" only
> its inner initial state (which can be an incomplete type) but none of its
> other inner states (Active does not know that Running exists, neither
> does StopWatch). An inner state knows all its outer states. Now, that
> it is done this way and not the other way round (outer states know all
> their inner states but inner states don't know their outer states) is
> not a coincidence for the simple reason that the other way wouldn't
> leave any room for parts in multiple TUs. I also think this is pretty
> much the only sensible way how you can spread an FSM over multiple
> TUs. I can't prove this (for me this is just obvious).

Here you've come quite close to saying: "The requirements are X; I tried hard to
satisfy X ...but I couldn't; therefore, it can't be done."

> So, if you have any
> objections be sure to attach a proposal on how you would select the
> visibility to enable spreading over multiple TUs.

Please -- I acknowledged at the beginning of my review that I hadn't had time to
thoroughly examine the implementation and suggest improvements. But I do have a
little experience in C++, and I wanted to point out what I saw as gaps in your
rationale.

> 2. Connections: The proposed FSM lib connects the parts at link time,
> through a call to the react function. This works fine but does leave
> you
> with the "problem" that there is no way to have a global view at your
> FSM, *not* *even* *at* *runtime*. When the FSM still resides inside
> Stopped you don't have *any* way of knowing that Running even exists.
> Likewise, when the transition to Running has been made, you cannot
> possibly know whether there are other states because Running might
> itself define similar custom_reactions that lead you to other states
> when the reactions are triggered. Note that the custom reaction cannot
> possibly have any knowledge about the target state as you then can no
> longer hide Running in a separate TU.

Here you're just describing your particular implementation.

> Now, let us consider other possibilities how Running could be
> connected
> to Stopped (I'm neither assuming a particular lib interface nor a
> particular lib implementation, the only thing that I assume is that we
> have an FSM employing a hypothetical lib with the same states Active,
> Stopped, Running spread over the TUs Running.cpp and Main.cpp):

Good, now you're going to consider alternatives.

> a) At compile time only: I don't see how that could work: Running.cpp
> does not have any knowledge about Main.cpp and vice versa.

As far as I'm concerned, you've just dismissed this out of hand because you
can't see how to do it.

Possibly the joints between parts of machines defined in separate TUs must be
weaker (i.e., less known at compile time) than the connections between
collections of states implemented in a single TU.

> b) At runtime only: State definition could contain a static instance
> of a class, which, in its constructor, registers the state with its outer
> state.

This is well-known not to work, for the reasons you state.
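
For readers following along, the idiom in question looks roughly like this (all
names invented for illustration):

    // Hypothetical sketch of the static-registration idiom under discussion.
    struct active
    {
        static void register_inner_state(char const* name);   // hypothetical
    };

    struct running
    {
        struct registrar
        {
            registrar() { active::register_inner_state("Running"); }
        };
        static registrar registrar_;
    };

    // In Running.cpp:
    running::registrar running::registrar_;

The familiar pitfalls apply: if Running.cpp sits in a static library and nothing
else references it, the translation unit is never pulled in and the registrar is
never constructed, and the order of dynamic initialization across translation
units is unspecified.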

> c) Any combination of the above: I don't see how that could improve
> things in any way

If I were working on the library, I'd try a combination of your current technique
and a).

> 3. State-local storage: Consider what we need to do if we want to give
> Running a new auxiliary state variable, e.g. the startTime_ mentioned
> in the tutorial. Because Running has its own storage, we can simply add a
> member to its class and are done. We don't need to change Active or
> the StopWatch.
> If the proposed FSM lib wouldn't provide storage for each state, it
> would force people to change the state_machine each time they need to
> introduce such a variable (or at least once when they introduce the
> first variable).

I never said you should omit state-local storage.

> Point 1 & 2 limit the maximum possible dispatch performance, more on
> that later.
> Point 3 limits the maximum possible transition performance, which will
> typically be the main performance bottleneck.

>> If there are order of initialization problems, the arrays can be
>> allocated on first use.

> No, that is not an option. Unless the allocation and the filling of
> the array always take considerably less time than the reaction itself,
> this would prohibit the use of such an FSM lib in real-time applications.
> In real-time applications you mostly cannot afford to do such lazy
> init stuff, because that would raise the upper limit of the reaction time
> to often unacceptable levels.

You snipped the context.

Jonathan Turkanis wrote:
> When it needs to assemble
> the array of pointers, it can simply ask each of them for their
> contribution. If there are order of initialization problems, the arrays can
> be allocated on first use.

The first use would be when the machine assembles the table of pointers, which
would presumably happen when the machine is constructed or immediately after it
is started.

>> In my experience, C++ provides extremely fine-grained control over
>> which parts of a computation are performed at compile time and which
>> at runtime. I have a strong suspicion that a FSM library could be
>> written which satisfies the nine requirements and also provides
>> ruthless efficiencies, at least in many common cases.
>
> I believe a library satisfying the 9 requirements can at best provide
> O(n) dispatch for an FSM using all the features, where n is the
> number of currently active orthogonal regions. So, if you don't use
> orthogonal states you get constant time dispatch. The reason lies in
> the fact that, as I hope I have shown above,

Sorry, but no.

> you cannot know how your
> whole FSM looks. Dispatch must always begin with a currently active
> innermost state and consider its reactions first. Only if there is no
> suitable reaction for the innermost state, the next outer state can
> be considered and so on until an outermost state is reached. Now,
> since the innermost state knows all its outer states at compile time,
> it could collapse all the reactions of the outer states into a single
> STT, yielding constant time lookup for each innermost state
> *separately*. Hence, if there are n innermost states active at the
> same time you have O(n) lookup. In any case I don't expect dispatch
> to be the main performance bottleneck in a typical FSM, see above.

>> It may turn out
>> to be necessary to sacrifice performance for some FSMs, but I believe
>> a hybrid approach may be possible which uses fast dispatch in the
>> majority of cases. I'm willing to be convinced that it is impossible,
>> but I haven't seen a satisfying explanation yet.

> A hybrid approach could indeed be possible, but I don't think it can
> be done without requiring help from the user (i.e. complicating the
> interface) or constraining the functionality.

Have you tried? My instinct says it can be done.

> For example you could
> require the user to somehow (at CT or RT) register all the possible
> states of a machine or you could prohibit custom_reactions (i.e.
> require the implementation of the whole FSM in one TU).

Again, I don't know what you mean by "registering all possible states". If it
just means listing them in a typelist, for instance, I don't see why this can't
be done separately for each TU that contains part of the state machine
implementation. The definition of the whole machine doesn't necessarily have to
enumerate all the states.
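
To make that concrete -- a sketch only, with invented names, not a proposal for
your interface -- each part's header could list just the states it implements,
and the machine definition could splice the lists together:

    // Hypothetical sketch: per-TU state lists combined by the machine's
    // definition; no single place enumerates every state by hand.
    #include <boost/mpl/vector.hpp>
    #include <boost/mpl/joint_view.hpp>

    struct stopped;  struct running;       // declared in part1_states.hpp
    struct paused;   struct lap_counting;  // declared in part2_states.hpp

    typedef boost::mpl::vector<stopped, running>       part1_states;
    typedef boost::mpl::vector<paused, lap_counting>   part2_states;

    // The machine definition only concatenates the per-part lists.
    typedef boost::mpl::joint_view<part1_states, part2_states> all_states;

Whether this can be reconciled with your visibility rules is a separate question,
but it's not obvious to me that it can't.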

And I'm not suggesting you prohibit custom reactions.

> Both would
> give you O(1) lookup. But let's not forget that lookup is typically
> not the main performance bottleneck! Neither of these measures would
> speed up transition execution.

As far as I can see, you have not explained why state-local storage is expensive,
even in your implementation. Is it dynamic allocation? Why not construct small
states into a properly aligned buffer?
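
Something along these lines, say -- a rough sketch under the assumption that heap
allocation during transitions is the cost, not a description of your
implementation:

    // Hypothetical sketch: small states constructed in place, in aligned
    // storage owned by the machine, so transitions need no heap traffic.
    #include <new>

    struct running
    {
        running() : startTime_(0) {}
        long startTime_;   // state-local storage
    };

    struct machine
    {
        machine() : current_(0) {}

        void enter_running()
        {
            current_ = new (buffer_) running();            // placement new
        }
        void exit_running()
        {
            static_cast<running*>(current_)->~running();   // explicit destroy
            current_ = 0;
        }

        // Raw storage big enough (and suitably aligned) for the largest
        // small state; larger states could still fall back to the heap.
        union { char buffer_[64]; long double aligner_; };
        void* current_;
    };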

> Transition execution can only be sped up by prohibiting state-local
> storage, i.e. by further constraining the functionality.
> "Why not?"
> you might ask.

Of course.

> It seems that this can only be done if you complicate
> the interface for everyone, including the users who need the full
> functionality. A user requiring state-local storage would be forced
> to somehow associate a state identifier with a type, which she can
> then use as a container to hold the auxiliary variables.

I can't see why this would be necessary. Honestly, I really don't understand
what you're talking about here. As I said at the beginning, I accept the
requirements, including state-local storage.

> I guess here probably lies our main disagreement: For me the users
> requiring the full functionality have the priority. Why? Because I
> think that in a certain class of applications most non-trivial FSMs
> will hugely benefit from state-local storage (at least some users
> have confirmed that).

Nothing I said in my review implied that you should sacrifice features for
performance; all I want is a detailed explanation why this would be necessary.

Others have suggested that the library might be accepted but renamed to indicate
that it did not pretend to be the definitive C++ implementation of FSMs. For
instance, the proposed library would be "Boost.UMLStatecharts," while a separate
library might cover faster state machines with fewer features.

I'm against this unless it can be shown that the better performance can only be
obtained by sacrificing functionality. I want to emphasize again that I phrased my
review as an objection to the Rationale section of the documentation.

> To conclude the above:

> 2. I agree that it is theoretically possible to implement a hybrid
> library that supports both classes, but *only* at the cost of a much
> more complex interface *and* a much more complex implementation. I
> would even go as far as saying that two separate libraries optimized
> for each class would have very little in common (interface or
> implementation). I don't see any benefits in a hybrid approach other
> than being able to say: "This is *the* solution for all your FSM
> problems". And it could very well turn out that the hybrid interface
> is so complex that users would not consider using it.

My guess is that a smooth interface might be possible.

>> - The restriction on event deferral -- one must pass an event pointed
>> to by an instance of intrusive_ptr -- is extremely inelegant:
>>
>> int main()
>> {
>> myMachine.process_event(
>> *boost::intrusive_ptr<Event>( new Event() )
>> );
>> }
>>
>> I'm sure there is a good reason for this restriction, but there must
>> be a cleaner way to accomplish the same thing.
>
> Have you read the rationale for this?
>
> http://tinyurl.com/6xbo2 (Limitations, Deferring and posting events)

You don't explain why a pointer to a dynamically allocated event can't be
passed. Even if there is a reason this can't be done, there's got to be a better
way than having users write

       *boost::intrusive_ptr<Event>( new Event() )

Was this just a toy example? What would a real-world use look like?
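
Even if the event really must be heap-allocated and intrusive_ptr-managed, a
small forwarding helper -- hypothetical, not something the library currently
provides -- would at least hide the noise:

    #include <boost/intrusive_ptr.hpp>

    // Hypothetical convenience wrapper: allocates the event, wraps it in an
    // intrusive_ptr (assuming the event type supports one, as the library
    // requires), and posts it.
    template<class Event, class Machine>
    void post_new_event(Machine& machine)
    {
        boost::intrusive_ptr<Event> event(new Event());
        machine.process_event(*event);
    }

    // Usage:
    //     post_new_event<Event>(myMachine);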

>> - The procedure for posting events from inside entry actions looks
>> messy; this is noted in the docs, and it's not nearly as bad as the
>> syntax for event deferral, but I'd like to see it improved.
>
> I don't see how. Suggestions welcome.

Okay.

>> - I don't understand why it is necessary to pass a type as a
>> length-one mpl::list simply because it is a class template
>> instantiation. The docs say it is "due to some template-related
>> implementation details", but this sounds fishy to me.
>
> This is the example in the docs:
>
> struct NotShooting : fsm::simple_state< NotShooting, Camera,
> /* ... */, mpl::list< fsm::deep_history< Idle > >,
> fsm::has_deep_history >
> {
> // ...
> };
>
> Internally, I use mpl::is_sequence to find out whether we already
> have a list of inner initial states. If there's none I need to make
> such a list. At the point of definition of NotShooting, Idle is an
> incomplete type. Now the problem is that mpl::is_sequence triggers an
> instantiation of the deep_history template and I have not found a way
> how I can prevent access to Idle inside that template. When you wrap
> that in a list the deep_history template is not instantiated and Idle
> is not accessed. I'm sure there is a solution but the problem simply
> never made it to the top of my to-do list.

Perhaps you could automatically wrap the template argument at that position in a
sequence-wrapper which turns most types into length-one sequences but is
specialized for mpl::list to pass through the intrinsic operations.
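
A rough sketch of what I mean, using a hand-rolled type list to keep it short
(the MPL version would need the usual preprocessor machinery; all names are
invented):

    // Hypothetical sketch of the sequence-wrapper idea.
    template<class T0 = void, class T1 = void, class T2 = void>
    struct state_list {};

    // By default, a lone inner initial state is wrapped in a one-element list.
    template<class T>
    struct as_state_list { typedef state_list<T> type; };

    // A list that was passed explicitly is handed through untouched. The
    // match is purely syntactic, so the element types -- which may be
    // incomplete, like deep_history< Idle > -- are never instantiated.
    template<class T0, class T1, class T2>
    struct as_state_list< state_list<T0, T1, T2> >
    {
        typedef state_list<T0, T1, T2> type;
    };

The library could then apply such a wrapper to that template parameter and drop
the requirement to spell out the length-one mpl::list by hand.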

>> I ran the regression tests on seven compilers, with the following
>> results:
>>
>> 1. VC7.1 -- passed all tests
>> 2. VC8.0 -- passed all tests
>> 3. VC7.0 --
>> ...failed updating 36 targets..
>> ...skipped 144 targets...
>> ...updated 506 targets...
>> 4. GCC 3.4.1 (Cygwin) -- passed all tests
>> 4. GCC 3.2 (MinGW) -- passed all tests
>> 5. Intel 8.0
>> ...failed updating 4 targets...
>> ...skipped 4 targets...
>> ...updated 678 targets...
>> 6. CW 8.3
>> ...failed updating 28 targets...
>> ...skipped 112 targets...
>> ...updated 546 targets...
>> 7. Borland 5.6.4
>> ...failed updating 36 targets...
>> ...skipped 144 targets...
>> ...updated 506 targets...
>>
>> I'd be happy to send the full error messages.
>
> Please do, thanks for the testing!

Okay, I will.

Jonathan

