From: Oliver Kullmann (O.Kullmann_at_[hidden])
Date: 2006-02-15 16:39:10


On Tue, Feb 14, 2006 at 04:58:20AM -0800, Paul Mensonides wrote:
> > -----Original Message-----
> > From: boost-bounces_at_[hidden]
> > [mailto:boost-bounces_at_[hidden]] On Behalf Of Oliver Kullmann
>
> > First of all I think here the documentation of the library
> > doesn't reach its full potential yet: I tried first to
> > read through "Topics", but found all examples contrived
> > and complicated (couldn't find a single plain example),
>
> What is a "plain example"?
>

An example which shows one thing, without any additional fuss,
so that it is immediately clear what the example is doing and
how it is achieved.

If I enter the current documentation, then first I get to the
short introduction page with a link to the chapter from the MPL
book; but that chapter builds on earlier chapters of the book,
and thus cannot be used as an introduction.

So then I continue to Topics (the next item on the list), where
the natural choice seems to be "Motivation". But here again
there is no plain example: instead we need to know at this
point what the "is_function<> metafunction in Boost" is (no
attempt is made to explain it). So the solution presented
further down the page doesn't motivate anything for anybody who
is not familiar with all of Boost (it is not even said where
this function can be found in Boost, i.e., to which sub-library
it belongs).

Such examples I call "non-plain". A plain example would consider,
for instance, the problem of how to create an enumeration out
of a variable list of identifiers. The point here is that the
example is *simple*, that is, atomic --- hardly any reference to
anything else is needed.
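
Something like the following is what I have in mind (my own sketch,
using BOOST_PP_SEQ_ENUM; the identifiers are of course made up):

#include <boost/preprocessor/seq/enum.hpp>

#define COLOURS (red)(green)(blue)

enum Colour { BOOST_PP_SEQ_ENUM(COLOURS) };
// the macro call expands to: red, green, blue

One thing shown, nothing else needed.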

I believe this style of non-plain examples (which is quite common)
creates completely unnecessary hurdles.

> > and thus I skipped that and went to Reference, where
> > I found BOOST_PP_SEQ_FOR_EACH, making some still mysterious
> > remarks about "The next available BOOST_PP_FOR repetition."
> > etc. But at various places it somehow says that "reentry problems"
> > whatever this means are solved now.
>
> There is an article in the topics section about this (titled "Reentrancy").
>

How does the reader of the documentation get there? After I found that the Motivation
does not motivate much for me, I went on to "Known problems" --- alright, I understand
that, but it doesn't give me much. So, going on to Techniques: again the very first
example doesn't say what it is supposed to do, and so it continues; all examples are part
of some discussion within an expert circle of users (who don't even bother to spell out the
problems they are discussing).

So let's skip Techniques. Next comes "Incompatibilities". It speaks about the previous
Boost release 1.28, so it seems no longer relevant (especially since I'm a new user).
So let's skip this, going to "Reentrancy". From the headline it's not clear what could be
meant by it (if you don't already know). Alright, the first few paragraphs there name the problem,
but not explicitly: it doesn't say that C or C++ in version so-and-so has this "feature",
but it speaks of "the preprocessor", which might mean a specific compiler, some special features
or this very library itself. It also speaks of recursion, which in the usual sense of the word
(that is, in mathematics) is not what is really meant.
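
For other readers, the rule that is actually meant, as I eventually pieced it together
(6.10.3.4/2 resp. 16.3.4/2): during rescanning, a macro name that is currently being
replaced is not replaced again. A tiny illustration (my own):

#define WRAP(x) { WRAP(x) }

WRAP(a)   // expands to { WRAP(a) } --- the inner WRAP(a) is left alone;
          // it is not expanded again, and this is not an error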

I want to make the point here that as a documentation author one has responsibilities
similar to those of a car driver: one must anticipate problems. Readers always view the text
from a certain special angle, and one must add many hints and redundancies to help them become
aware of the incongruence between their point of view and the author's point of view. My main
assumption was that C and C++ respect the programmer, and thus do not disallow potentially
"dangerous constructions". So in my mind I glanced over the problem with "recursion", and basically
thought that this is no problem for C and C++ (the whole process might go wrong, but we are in control).
Coming from this point of view, I then see "recursion", guess they mean something different, and
moreover I have already seen somewhere that "reentrancy" (whatever this means) is solved by
the new version of the library anyway, so I don't need to bother (not to forget that by the time I come
to this point in the documentation, I have already "learned" that apparently most parts of the
documentation have to be skipped).

For the reader there are always many unknowns to be solved (what is the language definition here?
what is the role of this library? what is the role of the version?), and usually the resulting
system of equations doesn't allow a solution (so one needs an iterative approach, reading through
it again and again, until it finally seems to converge). But the documentation writer can offer
great help here by saying as much as possible as precisely as possible (the problem here is
a fundamental C/C++ problem, given by the language definition; the library offers specific ways
around it; and the version number needs to be brought up to the current version --- otherwise
there is always this cloud of uncertainty about how much of this is really relevant).
 
> > Only when looking it up
> > in the rationale of C99 I found a clear statement about the issue
> > (while the C++ standard somehow supposes you know already the
> > issue).
>
> It is pretty clear in both standards (6.10.3.4/2 in C and 16.3.4/2 in C++).
>

I'm not aware of *any* book on C or C++ discussing this issue. I have quite a collection of them,
and read (or have read) CUJ, CVu and Overload, but the preprocessor seems to be a taboo.
This, I think, should be kept in mind for the documentation.

Alright, that leaves the standard(s). I worked through all the examples there, but none of them shows
the "recursion problem" (at least in the C++ standard). Sure, finally one gets to the point where one
understands what they are talking about, but it takes (too) long. You start reading this chapter:

A preprocessing directive consists of a sequence of preprocessing tokens. The first token in
the sequence is a # preprocessing token that is either the first character in the source file
(optionally after white space containing no newline characters) or that follows white space
containing at least one newline character. The last token in the sequence is the first newline
character that follows the first token in the sequence.

And so on. Among all of that you have to find the right few sentences.
  
> > So a clear statement about the limitations of the looping
> > constructions
> > *at each place* would help I would guess quite a few people trying to
> > use this library.
>
> It is unnecessary clutter. It is not the documentation's responsibility to
> constantly tell you the way that the language works. Unless the documentation
> says otherwise, it should be assumed that a macro cannot expand inside itself.
> That is simply the way that the language works.

As I said: where in the literature on C or C++ (books and articles) do you find anything
on this?

> The library makes it *appear*
> that a few macros are recursive, but then it explicitly says that the macro can
> be used as if it was recursive.
>
> > (I believe documentation must be written so that
> > it can be entered in many ways, and not assuming you are reading it
> > from the start to the end --- redundancy is needed.)
>
> I disagree on all counts. It is impractical to make every page of the
> documentation a complete discourse on both how the preprocessor works and how
> preprocessor metaprogramming works. Beyond that, redundancy in both code and
> documentation is a maintenance nightmare.
>

First, text read by humans is different from text read by computers. Due to the
inherent imprecision of natural language, and the necessity for the reader to anticipate
meaning (it is a well-known psychological fact that without anticipation there is no understanding),
the reader constantly goes in wrong directions, and only redundancies (saying important
things in many places, with different words) can help him.

And second, if the documentation were organised like software, this might actually help:
every file(!) (not just every translation unit) is supposed to include, for example, <algorithm>
again and again --- accordingly one would at least expect, at every reference for some construction
(corresponding to a source code file), a *link* to the relevant general sources ("including"
what needs to be known to understand the reference).
 
> > Second of all it seems the current facilities in the library should
> > be enhanced (though I don't understand how FOR and FOR_r is supposed
> > to be used): BOOST_PP_SEQ_FOR_EACH is such a natural and easy thing,
> > the use of it should be promoted, while FOR and its derivates look
> > ugly.
>
> Hmm. FOR is far more general. SEQ_FOR_EACH can be implemented with FOR, but
> not vice versa.
>

sure --- in other words, FOR_EACH is easier to use!
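
To illustrate what I mean by "easier to use" --- a sketch of my own (the names are invented),
just generating a few declarations:

#include <boost/preprocessor/seq/for_each.hpp>

#define DECLARE_COUNTER(r, type, name) type name = 0;
#define COUNTERS (hits)(misses)(errors)

BOOST_PP_SEQ_FOR_EACH(DECLARE_COUNTER, int, COUNTERS)
// int hits = 0; int misses = 0; int errors = 0;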
 
> > But it seems that the library favours FOR, and seems to envisage
> > a main loop using FOR, and only at the second level using FOR_EACH.
>
> The library doesn't favor it, it simply is more general purpose. In order to
> allow the kind of reentrance that FOR allows, hundreds of macros are required
> for *each* higher-order algorithm. SEQ_FOR_EACH is only one of many.
>
> > My solution was to convert the second sequence B into a list, and
> > use BOOST_PP_LIST_FOR_EACH for the inner loop. But it seems
> > to me that this works is
> > pure chance, and with any change in the implementation of
> > BOOST_PP_SEQ_FOR_EACH
> > or BOOST_PP_LIST_FOR_EACH (introducing a common macro) this solution
> > will break.
>
> It will only break if SEQ_FOR_EACH or LIST_FOR_EACH arbitrarily decides to use
> the other--which won't happen. I design the algorithms in the library, and I'm
> well aware of vertical dependencies and their implications.
>

I was referring to the documentation, which seems not to speak about this issue.
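
For the record, the workaround I mentioned looks roughly like this (names invented; the point
is only that the outer loop runs over a sequence and the inner one over a list, so the two
loops do not share implementation macros):

#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/list/for_each.hpp>

#define INNER(r, a, b) f(a, b);
#define OUTER(r, list, a) BOOST_PP_LIST_FOR_EACH(INNER, a, list)

#define SEQ_A  (1)(2)
#define LIST_B (x, (y, BOOST_PP_NIL))

BOOST_PP_SEQ_FOR_EACH(OUTER, LIST_B, SEQ_A)
// f(1, x); f(1, y); f(2, x); f(2, y);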
 
> > According to what I understand from the documentation, the
> > natural solution,
> > offering for example BOOST_PP_SEQ_FOR_EACH_2, which uses
> > macros guaranteed
> > not to occur in BOOST_PP_SEQ_FOR_EACH, has been deprecated in
> > favour of
> > a more complicated solution.
>
> You don't understand the implications of what you're saying.

sure

> In order to do
> that, I'd need to define hundreds of macros to implement SEQ_FOR_EACH alone. It
> isn't as simple as just replicating the interface macro a few times. Instead,
> I'd have to reimplement the entire algorithm from scratch--intentionally
> avoiding the use of any other higher-order algorithm.
>

The whole thing looks quite complicated; why not add a FAQ page to the documentation
(I was looking for one, but didn't find one), answering for example my above request in
this way? I think it would have helped me.
 
> > But it seems this more
> > complicated solution
> > works only for one kind of loop, and not for others, and the
> > documentation
>
> I'm not sure what you're referring to by "more complicated solution". If you
> specify what kind of thing your referring to, I'll expound.
>

The documentation says

In particular, the library must provide reentrance for BOOST_PP_FOR, BOOST_PP_REPEAT, and BOOST_PP_WHILE. There are two mechanisms that are used to accomplish this: state parameters (like the above concatenation example) and automatic recursion.

Thus it seems that only for these loops do you have a special mechanism for reentrancy, and this special, "generic" mechanism
is what I mean by "more complicated": for the user it's easy to use BOOST_PP_SEQ_FOR_EACH and BOOST_PP_SEQ_FOR_EACH_2,
but it's complicated to use the generic method.
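
To check my understanding of the "state parameter" mechanism, here is a minimal sketch of my own
(computing 3+2+1 = 6 as the second component of the state): the d value handed to the predicate
and the operation is passed on to the _D form of the arithmetic macro, so that it can reenter
BOOST_PP_WHILE at the right point.

#include <boost/preprocessor/control/while.hpp>
#include <boost/preprocessor/arithmetic/dec.hpp>
#include <boost/preprocessor/arithmetic/add.hpp>
#include <boost/preprocessor/tuple/elem.hpp>

// the state is a 2-tuple (counter, accumulator); stop when the counter is 0
#define PRED(d, state) BOOST_PP_TUPLE_ELEM(2, 0, state)
#define OP(d, state) \
   (BOOST_PP_DEC(BOOST_PP_TUPLE_ELEM(2, 0, state)), \
    BOOST_PP_ADD_D(d, BOOST_PP_TUPLE_ELEM(2, 0, state), \
                      BOOST_PP_TUPLE_ELEM(2, 1, state)))

BOOST_PP_WHILE(PRED, OP, (3, 0))   // the final state: (0, 6)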
 
> > seems so arcane that I don't have a clue how it could work
> > (the typical
> > phenomenon, that the documentation wants to show off by
> > presenting only
> > complicated examples, which show all sorts of templates,
> > etc., and this
> > without saying what actually should by achieved --- good code
> > solves one
> > problem at a time, and good documentation discusses one
> > problem at a time).
>
> The documentation has many examples, but you call them either complicated or
> contrived.

Let's go to the example section:

First we have "array_arithmetic.c": it says "This example implements over 2200 functions".
Sounds impressive, but it is not really an example (it sounds more like an extension of the library).
Among those six examples only "catch_builtin.cpp" seems to come close to an example (which
is something you look at and understand).

In the reference section there are actually simple examples, but as far as I can see always
only one example per entry. I believe a systematic progression through the elementary cases would be
very helpful (always staying simple, just handling simple sequences), where, say, example
5 for BOOST_PP_SEQ_FOR_EACH would show a nested loop ("without added meaning",
that is, no template stuff etc.).
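
Something in this style, say for BOOST_PP_SEQ_FOR_EACH (the macro M is of course made up;
the expansions are shown as comments):

#include <boost/preprocessor/seq/for_each.hpp>

#define M(r, data, elem) [data: elem]

BOOST_PP_SEQ_FOR_EACH(M, d, (a))          // [d: a]
BOOST_PP_SEQ_FOR_EACH(M, d, (a)(b))       // [d: a] [d: b]
BOOST_PP_SEQ_FOR_EACH(M, d, (a)(b)(c))    // [d: a] [d: b] [d: c]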

> What exactly to you want? This kind of stuff doesn't boil down to
> _simple_ use cases.

My understanding is that use cases are not very helpful. What we need is *understanding*.
In the best of all worlds, the library would present a complete mathematical definition,
and the reader would understand it --- that's it. Now in reality nothing about C or C++ is really
defined (the standard is not without reason often referred to as "the bible"), and if it were,
then (given the current state of specification) it would take us years to digest it.

So one has to use a mixture of general explanations and educating examples. Let me give an example.

Say we want to introduce addition of integers. The general theory is too complicated, so we are
using examples. As a little parody of the preprocessor-documentation style, the
example there would read as follows:

----------------------------
Consider a graph G (finite, no parallel edges, no loops; generalisation to include these things
as an exercise). We have sum_{v in V(G)} deg(v) = 2 |E(G)|.
----------------------------

??? What does this want to tell us? What are "graphs" etc. ??? But shouldn't one know all these
things!! And it's a very nice fundamental property of graphs, just *using* addition in two ways.
And here come further examples: what about projective geometries ... (all showing nice examples
of addition, and it will enlighten the reader enormously to think through all of them).

Alright, here now comes what I would consider as good examples:

0 + 0 = 0
0 + 1 = 1
0 + 2 = 2
0 + 3 = 3
understood?

1 + 0 = 1
1 + 1 = 2
1 + 2 = 3
understood?

2 + 0 = 2
2 + 2 = 4
2 + 5 = 7
understood?

5 + 7 = 12
100 * 2 = 200
sum_{i=1}^n i = n (n+1) /2.

Hope you get what I mean.

> Rather, typical use cases are parts of overarching designs
> that are complex--which is precisely the reason that preprocessor
> metaprogramming enters the picture at all.

As I said, I don't want use cases, I just want to use the preprocessing library
(for my own purposes).

> Furthermore, I have a responsibility
> (as does any library author) not to promote naive or poor uses of the
> library--which is almost always the case with examples that implement something
> simple.

So the documentation is the guard, meant as a kind of torture, and only
those worthy souls who get through shall use the library?

> So, I can do two things--I can provide simple contrived examples that
> demonstrate what various things do, and I can provide complex examples that
> implement real world scenarios. The documentation does both already.
>

Speaking for myself, I never found any interest in those "real world scenarios"
(for this library and others), since their "real world" is not mine.
Of course, others might disagree, and having a healthy mix of simple and more
complicated examples sounds like a good thing.
 
> > So if the library offers a natural way of nesting
> > BOOST_PP_SEQ_FOR_EACH,
> > then documenting this (directly, without solving other
> > problems) would be good,
> > or otherwise adding BOOST_PP_SEQ_FOR_EACH_2 would help some
> > users (actually,
> > just the presence of this second form would announce the
> > underlying problem).
>
> The library 1) shouldn't have to announce this problem

I meant the documentation

> and 2) already does
> announce this problem--it just doesn't do it hundreds of times in hundreds of
> different places. If you are writing code in the language defined by the
> preprocessor, you should *at least* know the basics of that language. The
> library is not the language, it is merely a systematic set of reusable constructs
> written in that language. The documentation's only responsibility is to
> document itself.
>

Yes, but neither are you an angel (so that you could give a precise definition of the whole
thing) nor am I (so that I could read it); instead we are human beings
nd mst fnd r wy thrgh ntcptn (there are nicer examples, but hopefully this works a bit:
just looking at it I hope you thought "huh", but knowing that I left out the vowels it
shouldn't be too hard; I have seen a very nice series of examples in this style, where you
were guided to read more and more strange stuff, and at the end you were presented with
a plain English sentence --- and you couldn't read it anymore!).

 
> -----
>
> I'm trying very hard not be harsh, but what you suggest has far-reaching
> implications that you don't understand.

sure (you mean not fully)

> Regarding examples in the
> documentation, I don't think there is a way to please you

Perhaps there are ways: I actually believe in small steps, massaging something
reasonable here and there so that it gets better. If some of the above thoughts
lead to some additions here and there (a few remarks, hints) in the documentation,
that would mean progress.

> --see posts by me in this thread:
>
> http://thread.gmane.org/gmane.comp.lib.boost.user/16132
>

In there you have this thing about "use cases". So in the above example
about addition I forgot the "use cases" (to which the user "can relate").
Perhaps it goes something like the following (recall that the purpose is to explain addition):

Assume you have apples and bananas (if you don't like them, then please use
strawberries and cherries; if you don't like those either, then I can't help you).
Imagine a big bowl of apples, and a smaller pot of bananas (that's how I organise
them). They are placed in a tidy kitchen (to enter virtual worlds)
etc etc etc
Finally: 5 apples plus 7 bananas gives 12 something.

So I'm not such a great fan of use cases.

> -----
>
> Regarding the technical implications:
>
> (Please make the effort to follow this.)
>
> As it stands now, the library emulates recursion by replicating macros. For
> example, a WHILE loop can be implemented such as:
>
> #define WHILE_1(pred, op, state) \
> IF(pred(2, state), WHILE_2, state TUPLE_EAT(3))(pred, op, op(2, state)) \
> /**/
> #define WHILE_2(pred, op, state) \
> IF(pred(3, state), WHILE_3, state TUPLE_EAT(3))(pred, op, op(3, state)) \
> /**/
> // etc. (i.e. hundreds...)
>
> A predicate is passed that determines when the algorithm should stop, and an
> operation is used to iterate the state on each step of the algorithm.
>
> There are a couple of things here must be understood. First, each WHILE_xyz
> represents both an entry point into the algorithm (i.e. an interface to the
> algorithm) *and* a single algorithmic step taken by the algorithm. So, WHILE_2
> does not mean "separate nested WHILE loop" or "second looping dimension".
> Instead, it means "the step after WHILE_1".
>
> Now, when the algorithm (meaning the entire collection of macros) invokes the
> predicate and the operation, it passes along the number of the next available
> WHILE_xyz macro. This value is called the "state" of the algorithm. (Note that
> this is not the 'state' parameter. That is the (arbitrary) state the a
> particular *use* of the algorithm is iterating.) The predicate and the
> operation can use this algorithm state to reenter the algorithm, but before the
> algorithm itself actually continues into that macro. As the algorithm
> continues, the algorithm state values passed to the predicate and the operation
> change (i.e. they increase). This is important. They do not remain fixed,
> which means that the predicate, for example, cannot just arbitrarily use
> WHILE_2. It has to use the WHILE_xyz indicated by the algorithm state value
> passed to it.
>
> Obviously it is possible to provide several totally distinct implementations of
> the entire loop, but that implies several hundred macros per loop, instead of
> sharing a single set of several hundred macros for all loop uses. Say you did
> this anyway, and you get WHILE_A, WHILE_B, WHILE_C (and so on, each with their
> own WHILE_A_1, WHILE_A_2, etc.). This doesn't really buy you anything, because
> you'll still ultimately end up passing A, B, C (etc.) as a reentry point.
>
> E.g. suppose you write some macro that uses WHILE_A, and then you change another
> macro so that it uses this macro, but this other macro was already using
> WHILE_A. Fine, you just change this other macro to use WHILE_B. But then,
> there is another macro that uses this second macro, and this third macro is
> already using WHILE_B, so you have to change that to use WHILE_C. This is a
> significant bubbling of implementation detail, and it is *way* worse when all
> three macros are written by different people. Ultimately, you arrive at the
> conclusion that it doesn't matter *which* WHILE the first macro uses, so you
> parametize the first macro with the WHILE to be used, Similarly with the second
> (and when the second calls the first, it uses the next WHILE from the one it is
> currently using). Similarly again with the third. So, instead of having the
> most nested use of WHILE be WHILE_A, you have the outermost use be WHILE_A, and
> each inner use adjusts, so that the first nested use uses WHILE_B, the next
> nested use uses WHILE_C, and so on. This is the only way to do it that scales,
> and this is ultimately no different than just having one WHILE with one set of
> implementation macros.
>
> This is exactly (minus a lot of workarounds) the way that the library emulates
> recursion (particularly: in FOR). The problem comes when you write a
> higher-order algorithm on top of another higher-order algorithm. For example,
> SEQ_FOR_EACH on top of FOR. (Higher-order algorithms are fundamentally
> different because the implementation of the algorithm cannot completely control
> what macros are called inside of them.) It doesn't take hundreds of macros to
> implement SEQ_FOR_EACH because it can reuse the more general purpose algorithmic
> steps provided by FOR. In order to make SEQ_FOR_EACH reentrant, you need, at
> minimum, a distinct interface macro (i.e. SEQ_FOR_EACH_1, SEQ_FOR_EACH_2, etc.)
> and distinct macro to be repeated by FOR for each possible entry point into
> SEQ_FOR_EACH.
>
> That isn't too bad in terms of number of macros, but you introduce yet another
> algorithm state by doing so. If each higher-order algorithm did so, the user
> would be inundated by the states of various algorithms. E.g. just to invoke
> SEQ_FOR_EACH, you'd need to supply both the state of SEQ_FOR_EACH and the state
> of FOR. Worse, if another algorithm (with multiple entry points) uses
> SEQ_FOR_EACH, you'd need the state of that algorithm, the state of SEQ_FOR_EACH,
> and the state of FOR to use it. You resolve this in one of two ways: 1) you
> simply don't provide reentrance in higher-order algorithms that are implemented
> using other higher-order algorithms, or 2) you reimplement the algorithm from
> scratch to avoid using other higher-order algorithms. The former is what the
> library does, the latter requires hundreds of macros to implement each
> higher-order algorithm.
>

Thanks for the explanation! (By the way, to the reader: There are always dragons
out there requesting to minimise the use of quotes --- but in the above case it
would destroy readability.)

I can understand the general point of view. But this general point of view might not
necessarily be the point of view of a user (who in general will only occasionally use the library).

Perhaps it would be a good idea to add the above explanation somewhere to the documentation?

So if I understand it correctly, you currently have three independent looping constructs
BOOST_PP_FOR, BOOST_PP_REPEAT, BOOST_PP_WHILE,
while all other loops are implemented in terms of these?

Then it would make sense to concentrate on the documentation of these three macros,
giving more examples (also showing how to simulate nesting), and emphasising, for
e.g. FOR_EACH, its special character.
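
For instance, even a minimal FOR example would have helped me --- something like the following
sketch (my own, so take it with a grain of salt), with the signature
BOOST_PP_FOR(state, pred, op, macro) from the reference:

#include <boost/preprocessor/repetition/for.hpp>
#include <boost/preprocessor/arithmetic/dec.hpp>

#define PRED(r, state)  state                // continue while state is non-zero
#define OP(r, state)    BOOST_PP_DEC(state)  // the state for the next step
#define MACRO(r, state) [state]              // what each step emits

BOOST_PP_FOR(3, PRED, OP, MACRO)   // expands to [3] [2] [1]

// and, as I understand it, inside MACRO one would reenter via
// BOOST_PP_FOR_ ## r to nest another loop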
 
As far as I can see from the material provided on more powerful pp-libraries, the
Boost pp-library won't be developed further, but will at some point be replaced?
I hope nevertheless that some additions to the existing documentation might
be possible.

Keep up the good work!

Oliver

