Boost logo

Boost :

Subject: Re: [boost] [preprocessor] Sequences vs. All Other Data Structures
From: Paul Mensonides (pmenso57_at_[hidden])
Date: 2012-04-23 12:50:41


On Mon, 23 Apr 2012 07:01:41 -0700, lcaminiti wrote:

> Paul Mensonides wrote
>>
>> Hello all.
>>
>> In the past, I've said several times that sequences are a superior to
>> every other data structure when dealing with a collection of elements
>> (tuples are better in some very specific contexts).
>>
>>
> Something that I found missing from Boost.PP docs is computational
> complexity (in terms of number of required macro expansions?) of the
> different algorithms for the different data structures. I take it that
> sequences have the best performances (i.e., smallest pp time because
> smallest number of macro expansions -- is that true?) but is that true
> for _all_ sequence algorithms when compared with other data structures?
> or are some tuple/array algorithms faster than the pp-seq equivalents?

Assuming amortized O(1) name-lookup, computational complexity with macro
expansion is really about the number of tokens (and whitespace
separations) scanned (a particular sequence of tokens is sometimes scanned
more than once) and less about number of macro replacements per se. In
terms of raw performance, nesting of macro arguments (i.e. M(M(M(x))))
also plays a significant role because each argument (provided it is used
in the replacement list without being an operand of # or ##) causes a
recursive invocation of the scanner (along with whatever memory and cycle
requirements that implies), but also because those recursions are not tail
recursive. The output of those scans must be cached for later
substitution. Generally speaking,

#define A(x) B(x)
#define B(x) C(x)
#define C(x) x

A(123)

is more efficient than something like

C(C(C(123)))

because the former doesn't build up as much cache (in the typical case)
and has fewer caches alive at any one point. For an extremely small case
like the above, however, the latter might actually be faster because the
former builds up more hideset bookkeeping (virtual tokens or whatever
other mechanism is used).

Boost.Preprocessor doesn't provide that information is because it is
extremely difficult to derive and because it is wildly different for
different compilers.

Chaos doesn't attempt to do it either. Chaos does usually indicate
exactly how many recursion steps are required given some input, but that's
a measure relative to a different constraint (not performance, per se, but
a type of resource). (The Chaos docs actually abuse big-O because I
couldn't decide whether I wanted exact forms or big-O, so the docs current
have (e.g.) O(2n) instead of just O(n). I actually need to change those
formulas to be auto-generated from LaTeX or use MathML.)

> For example, I would have found such an analysis of Boost.PP
> computational complexity useful when programming complex pp macros
> (i.e., Boost.Contract). I've ended up using pp-lists everywhere and that
> was probably a very bad choice (and in fact Boost.Contract compilation
> times are huge :( ). I've used pp-list just because they handle nil "for
> free" but I could have used a pp-sequence (nil)(elem1)(elem2)... -- with
> a bit more implementation work to deal with the leading (nil) -- had I
> saw evidence that the pp-sequence implementation would have been truly
> faster.

Lists in Boost.Preprocessor are the only data structure type that can have
a nil state. However, it is worse in nearly every (practical) way that
comes to mind. As you mention, in client code, one can simulate nil for
other data types in a variety of ways. Perhaps the most efficient would
be something along the lines of:

bseq: (0)(~) or (1)(seq)

// GET bseq
#define GET(b) GET_ ## b
#define GET_0(x)
#define GET_1(seq) seq

// IF_CONS bseq (t, f...)
#define IF_CONS(b) IF_CONS_A_ ## b
#define IF_CONS_A_0(x) IF_CONS_B_0
#define IF_CONS_A_1(x) IF_CONS_B_1
#define IF_CONS_B_0(t, ...) __VA_ARGS__
#define IF_CONS_B_1(t, ...) t

#define EAT(...)

#define A(bseq) IF_CONS bseq (B, EAT) (GET bseq)
#define B(seq) seq !

? A( (0)(~) )
? A( (1)((a)(b)(c)) )

> One of those reasons is that, if a library is built for it, non-unary
> (or even variadic) elements are natural because the structure of the
> sequence itself encapsulates each element.
>
> (int)(std::pair<int, int>)(double)
>
>
> This would be very convenient. I programmed a case-by-case version of
> these in Boost.Contract but not a general variadic extension of
> pp-sequence.

A lot of stuff would have to change for the boost-pp to deal with these.

>> Another reason is that, if a library is built for it and if
>> placemarkers ala C99/C++11 are available, there is a natural
>> representation of the nil sequence:
>>
>> /**/
>>
>>
> This would also be very convenient. But also automatically handling
> `(nil)` for nil sequences would be very convenient (and still faster
> than pp-lists?).

Yes, but it is probably better to predicate it as above or similarly.
(Reminds me a bit of the (bare-bones) lambda calculus.)

> As for adding Chaos to Boost.PP, I asked "why not" before and the issues
> with MVSC pp came about. Is this an opportunity to fix MSVC's pp? ;)
> (Some bugs have already been reported but unfortunately resolved as
> "won't fix".)

That opportunity has been around for years...

Regards,
Paul Mensonides


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk