
Boost : 
From: Paul Mensonides (pmenso57_at_[hidden])
Date: 20060129 10:00:55
> Original Message
> From: boostbounces_at_[hidden]
> [mailto:boostbounces_at_[hidden]] On Behalf Of Tobias Schwinger
> >>Note:
> >> BOOST_PP_IF does not get disabled
> >
> > IF getting disabled doesn't matter here. The entire ENUM_PARAMS
> > invocation occurs on entry to IFwhere IF is not get disabled.
>
> Right, otherwise "BOOST_PP_CAT(f,BOOST_PP_CAT(o,o))" wouldn't work...
>
> #define BOOST_PP_LOCAL_MACRO(i) I must not post rubbish
> when in a hurry.
> // ... ;)
>
>
> Thanks for correcting,
Not rubbish. Your solution was the correct onemove the invocation of
ENUM_PARAMS out of the input to IF. I was merely elucidating exactly why it
wasn't working. It is a perfect example of pathological input messing with the
internals of a macro.
In preprocessor metaprogramming, one must always remember that commas and
parentheses are functional operators and that the syntax is dynamic rather than
static. The latter is what makes it difficult for editors to do formatting or
syntactic analysis without also preprocessing, but it is also one of the most
powerful features of macro expansion. Not only does the preprocessor write the
syntax of underlying language before parsing, it also writes the syntax of macro
invocations before they are parsed as invocations. This differs from macro
systems in other languages (e.g. Lisp and Scheme) because it can be done
partially. E.g.
#define LPAREN (
#define A(x) B x )
#define B() 123
A(LPAREN) // 123
That is a very powerful ability in certain situations, but it is always
something that you must keep in mind because there are many situations where you
don't want that effect. One way to look at the whole thing is to recognize that
macro expansion is a codewriter. Any given invocation effectively writes new
code into the location of the invocation, and that newly written code is written
prior to any "evaluation" of that code.

Apologies for yet another (extended) reference to Chaos, but Chaos uses this
behavior for significant effect. For example, without doing anything really
fancy, the library can provide about 512 algorithmic steps. Those steps are
generic and are shared by all "normal" algorithms. In any case, those 512
algorithmic steps imply 512 entry points into an algorithm. BOOST_PP_REPEAT,
for example, has three entry pointseach capable of doing around 256
repetitions. CHAOS_PP_REPEAT, OTOH, has around 512 entry points (meaning that
you could do about 512 nested sets of repetitions)each capable of doing a
number of repetitions that is relative to the entry point used. I.e. if the
algorithm is entered at the first entrypoint (ultimately that means that it
isn't invoked by another algorithm) it is capable of around 512 repetitions. If
the macro invoked by CHAOS_PP_REPEAT invokes CHAOS_PP_REPEAT again, that
repetition can generate around 511 repetitions, and so on. Thus, the total
number of repetitions possible is related to the factorial of 512 (it isn't
exactly that). To achieve this, the CHAOS_PP_REPEAT algorithm uses the
algorithmic steps to generate trampolined (notyetinvoked) invocations of the
macro passed to it. After it has generated all of them, it causes all of them
to expand at one time (instead of expanding them as it goes along). Thus, the
algorithm "recurses" down available algorithmic steps to generate the
repetitions, but trampolines all of those repeated invocations back up to the
algorithmic step one greater than the entrypoint into CHAOS_PP_REPEAT. E.g.
#define M(s, n, _) s
CHAOS_PP_EXPR(CHAOS_PP_REPEAT(100, M, ~))
Each invocation of M expands to the current entrypoint (equivalent to the
current algorithmic step). In this example, this will result in 100 straight
2's. So, even though a simple implementation of REPEAT requires 100 steps to
generate 100 repetitions, the net loss of algorithmic steps in the transfer from
REPEAT to M is only 1.

Of course, _with_ doing anything really fancy, those 512 algorithmic steps can
be used in ways that drastically increase the actual number of algorithmic
steps. For example, if an algorithm starts at algorithmic step 1, it can do 10
steps, and then trampoline *itself* back to step 2, do 10 more steps, and
trampoline itself back to step 3, and so on. This gives is a multiplicative
effect on the total number of steps that the algorithm can perform. Similarly,
an algorithm can just go straight down until it reaches the last algorithmic
step available and trampoline itself all the way back to the step that it
started on (i.e. the exact one that it started on, not one greater). If it does
this, it requires another EXPR(...) around the invocation to cause the algorithm
to resume. That extra EXPR doubles the number of steps that the algorithm can
make. Thus, for such an algorithm, if EXPR(ALGO()) allows ALGO to take 500
steps, then EXPR(EXPR(ALGO())) allows it to take 1000 steps. This latter
effectively removes the limit altogether as more steps can be generated at will,
whereas the former simply produces a new limit that is a multiplication of the
original limit. In a oneversustheother scenario, the former yields more
bangforthebuck. Of course, an algorithm can do both, but it usually isn't
necessary.
Beyond either of these techniques, it is also possible to use the available
steps exponentiallywhich increases the available steps far beyond realistic
use even with a relatively small number of available steps to begin with. In
order to "grok" this kind of algorithm is doing, I visualize a tree that
represents the path of the algorithm through the available steps. An efficient
implementation of such an algorithm will use a relatively small depth before
trampolining itself (e.g. 510). As the algorithm descends, it creates "stubs"
for it to trampoline itself to, and maintains a stack of "jumpback" points as
well as a stack of depths. When the current depth reaches (e.g.) 5, the
algorithm pops the stacks, resets the depth to the depth that it just popped,
and trampolines itself to the point which it popped.
1
// \
2
/ \
3
/ \
4
/ \
5
The tree moreorless looks like this when it reaches the depth of 5 for the
first time. All of the "hanging branches" represent stubs for the algorithm to
trampoline itself to. The first time, it will jump back to stub descending from
4, do one step, and trampoline back to the stub descending from 3. After
trampolining to this stub, it will generate more stubs as it descends...
1
// \
2
/ \
3
/
4
/ \
5
...until it gets to a depth of 5 again. Again, it will trampoline itself back
to the stub descending from 4, do one step, and trampoline back to the stub
descending from 2. Again, it will generate more stubs as it descend back down
to 5.
1
// \
2
/
3
/ \
4
/ \
5
When the algorithm finally reaches the 5 on the far left of the tree (after
quite a few steps) the stack of jumpback points will be empty. When this
happens, the algorithm resets everything and trampolines itself to the extra
stub hanging off of 1 and the entire process begins again beginning with 2 and
going down to 6 instead of five:
2
// \
3
/ \
4
/ \
5
/ \
6
Thus, the algorithm is getting about 2^5 (32) steps for each available step. If
the depth used is 10 instead of 5, the algorithm would get around 2^10 (1024)
steps for each available step. Given 512 steps, the algorithm would be capable
of performing around 502 * 2^10 steps (which is a little more than a half
million steps). If each step generated two stubs instead of one, the
exponential base would be three instead of two. IOW, it isn't all that hard to
make the number of available steps explode.
The interesting thing (to me) is that this whole process is still fairly
efficient. It isn't hard to generate an exponentional number of steps. E.g.
#define A(x) B(B(x))
#define B(x) C(C(x))
#define C(x) D(D(x))
#define D(x) E(E(x))
#define E(x) F(F(x))
// etc.
Here, an argument to A is scanned for macro expansion a base2 exponential
number of times. Each one of those scans can do an algorithmic step. The
problem is that the argument to A *will* get scanned an exponential number of
times every timeeven long after the algorithm is complete and each additional
scan is a noop. It is expensive to scan even the simplest sequence of tokens
millions of times.
The interesting thing about the exponential algorithm described above is that it
only generates stubs instead of generating the entire tree at once (which is
what the latter example does). Each unused stub represents an extra scan
applied to the output of the algorithm. So, if the algorithm generates output
(or terminates) at some point, whatever the output is will get scanned a few
extra times, but it is at most (e.g.) 5+1 extra scans (the +1 comes from the
extra stub at the top of the tree). Such a low number of extra scans is
insignificant even in the worst case scenario where the algorithm generates
output at the bottom of the tree.

I realize that the above is supercomplex to those not comfortable with far more
basic idioms in Chaos, and I don't really expect anybody to understand it (but
maybe get the general idea). Also, these are implementation details of Chaos.
Most users would never need to touch anything like the above. I'm merely
elucidating what can be doneand what I actually have donebecause of the
dynamic syntax of macro expansion. It is a pretty powerful thing for an
algorithm defined with only a few macros specific to the algorithm to take a
half million steps in an environment that doesn't allow recursion. It is also a
pretty powerful thing to get rid of all of the "you can't use this macro inside
this other macro" limitations that the pplib contains. We had a recent example
of this with the typeof library's registration macros (i.e. restricting the
macro to using only autorecursive library interfaces). Chaos gets rid of those
kinds of [vertical] implementation dependencies. The pplib is chock full of
them. They don't come up all that often only because what most people are using
the pplib for is relatively trivial, and when they do come up, it is only
because there wasn't a simple alternative that avoided the problem. When they
do come up, it makes it seem like the failure makes no sense, and it is almost
impossible to debug for users that aren't familiar with the innards of the
library (because that is where the failure ultimately occurs). When I rewrote
the pplib (some years ago now), I was well aware of these limitations as well
as the problems associated with the plethora of different "recursion states"
(such as 'd' for BOOST_PP_WHILE, 'r' for BOOST_PP_FOR, 'z' for BOOST_PP_REPEAT,
etc.). In fact, there are actually more distinct states than the library
presents. In some cases, it merges more than one physical state into a single
one by deducing the state on multiple physical sets of macros that implement
algorithms at the same time. E.g. WHILE is implemented with (at least) 256
macros that are near identical to each other. FOR is implemented with (at
least) 256 macros that are near identical to each other. The recursion state
for each is the next unused macro in each of those sets of macros. To merge the
two, the deduction process would have to find the nth macro that is available
in both sets at the same timewhich can leave gaps of unused macros in each of
those sets of macros (which isn't a big deal).
I have never liked that hack to keep the number of distinct "recursion states"
down, nor have I liked the lack of extensibility in general. Designing a
nonhigherorder algorithm using a higherorder algorithm is not a great
problem, but the second you try to define a higherorder algorithm using a
higherorder algorithm, you're going to run into this problem. You have two
choices, bubble the vertical dependency (not a good choice) or replicate macros
to match each entrypoint into the higherorder algorithm used (not a good
choice). In the latter case, you end up with yet another recursion state for
this new algorithm. You can either live with another state (not a good choice)
or "merge" the state with the state of the used higherorder algorithmwhich
requires modifying the algorithm (not a good choice). IOW, its always about
choosing the better of several bad alternatives. Basically, the extensiblity
model of the pplib is woefully insufficient, which is why Chaos uses a
radically different tack for this sort of thing.
Regards,
Paul Mensonides
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk