Boost Users :

From: Paul Mensonides (yg-boost-users_at_[hidden])
Date: 2002-12-11 03:43:17


----- Original Message -----
From: "Thomas Wenisch" <twenisch_at_[hidden]>

> On Mon, 9 Dec 2002, Paul Mensonides wrote:
>
> [ snip: lengthy example of 3 different ways to generate repeated code
> over a cartesian product of substitutions ]
>
> These examples were very enlightening, thanks for going to the trouble of
> posting them to the list for all of us to see. I have a few questions:

No problem. Any questions about the Preprocessor Library from anyone here
I'd be happy to answer. I am also happy to help anyone that needs it if I
can--they don't call me the "PP Guru" for nothing, and if I can't (i.e. no
time (or patience) ;)) I can still point you in the right direction.

> 1) Is there any reason to prefer lists over sequences on any compilers? It
> seems that the syntax for sequences is much nicer, since you don't have
> to spend time counting parentheses.

Yes. A list has direct support for the nil state. I.e. 'BOOST_PP_NIL' is a
nil list. Sequences, on the other hand, cannot be empty. In many
application areas this doesn't matter, but in others the necessary "hacking
around" that you have to do to deal with faked empty sequences makes it more
worthwhile to use lists. Incidentally, when/if C++ gets the preprocessor
upgrades from C, a nil state will be directly supported. (C99 allows empty
arguments.) Dealing with sequences is typically faster than lists and can
easily replace tuples (e.g. (x, y, z)) for random data caches. I.e.
element-wise sequence access is a *very* fast operation, even if the
sequence is huge (supported up to 256, I believe). Lists cannot be
effectively used this way--you'd bring Comeau C++ (for example) to a
grinding halt. Also, appending to either the front or back of a sequence is
as fast as you can get:

#define SEQ (a)(b)(c)

BOOST_PP_SEQ_ELEM(4, SEQ (x)(y)(z) SEQ) // y
BOOST_PP_SEQ_ELEM(6, SEQ (x)(y)(z) SEQ) // a

Sequences have many advantages over lists, but there is no such thing as an
"empty" sequence--and that can be a major disadvantage. Therefore, both are
supported (plus there is a lot of "legacy" code that uses lists) in the
current CVS. Note that the 1.29 release does not contain the sequence
implementation.

> 2) What is the data structure you use in the file iteration example (the
> structure which requires prepended lengths). Are the hardcoded lengths
> necessary for file iteration to work, or could one of the other data
> structures (i.e. sequences) be used instead? I really like the simple
> syntax of the sequence type. Nothing to screw up :)

I assume that you mean this one --> (3, (a, b, c)). That data structure is
called an "array," which is just an arbitrary name choice. It is, in
effect, a high-level tuple that encodes its own size. The actual file
iteration parameters are *required* to be arrays. The reason is that arrays
and file-iteration existed before sequences, and I needed a way to pass
variable sized datasets into the iteration mechanism. I wanted the sample
implementation earlier in this thread to automatically adjust if the size of
the datasets increased. This meant I had to use a data type whose size I
could calculate: array, sequence, list, but *not* tuple. Lists would be a
*really* bad choice here because they are a terrible choice for random access.
Furthermore, sequences are not part of the 1.29 release so I was keeping the
options open.

Altogether, there are four data types that the library currently supports:

tuple: (a, b, c)

The strength of tuples is that element access is as fast as you can get. The
downsides are that the size must be known (since it can't be detected
without variadic macros) to access it, the maximum size of a tuple is
limited to around 25, and tuples are no good for anything that requires
"resizing" the tuple.

array: (3, (a, b, c))

The strengths of arrays are that element access is nearly (but not quite) as
fast as tuples, the size is built into the structure itself (so it is not
necessary for element access), and resizing is directly supported by the
library with various primitives. The downsides are that arrays have the
same size limitations as tuples and they must have their size specified when
they are created. Ultimately, the difference between tuples and arrays is
that tuples require the size on access while arrays require the size on
construction. Either way, conversion back and forth is fairly trivial.

list: (a, (b, (c, BOOST_PP_NIL)))

Lists are typical, well-understood, singly-linked lists. They directly
support an empty state and can be very good for algorithm-like
manipulation. As with a runtime list, random access is terrible relative to
the other structures, but random access is not the typical use of lists.
They are good for folding (a.k.a. accumulation) and other pursuits that
often need to deal with a nil state.

sequence: (a)(b)(c) // also known as "seq" by me ;)

This is kind of a universal type. It is good for everything that lists are
good for--except the nil state issue--and is still very fast for random
access so it can replace tuples and arrays in many situations. The size is
unnecessary (it can be computed easily and efficiently) for access or for
construction. This structure is a great "general purpose" type. Also, you
can't beat it for appending efficiency since appending seq2 to seq1 requires
only: seq1 seq2.

> I must admit that I haven't read the docs of Boost.PP since before 1.28
> was released, so I know very little about file iteration. My apologies if
> these questions are already answered in the docs.

The docs (and nearly the entire library) have been rewritten with the
release of 1.29--it is mostly backward compatible but not quite (there are
docs that discuss incompatibilities). I highly urge people to get the
latest CVS sources of the PP lib (and the PP lib docs) because it is better.
In particular, I added the sequence support and full-fledged support for the
array types mentioned above.

As for file-iteration, it is conceptually very simple, but it requires a
slightly different thinking about how preprocessing works. I personally
like to think of it as an "execution path" with file iteration representing
a for-loop that iterates over files (or parts of files). It is a very
powerful tool--just ask the Python guys or the MPL guy (Aleksey).

In any case, there is a topic devoted to file-iteration in the docs, so it
might interest you to read that. Basically, the differences between 1.28
and 1.29+ versions of the library are phenomenal. There are *massive* speed
increases on EDG-based preprocessors as well as significantly more
functionality. Specifically, preprocessor metaprogramming has moved out of
"just macros" into other interesting areas (file-iteration is an example of
this).

> Regards,
> -Tom Wenisch
> Computer Architecture Lab
> Carnegie Mellon University

If anyone here has any questions about the PP lib, I'll answer as best
as I can.

Regards,
Paul Mensonides

