Boost logo

Boost-Build :

From: David Abrahams (gclbb-jamboost_at_[hidden])
Date: 2003-05-12 01:03:27


Ali Azarbayejani <ali_at_[hidden]> writes:

> We are pretty sure about this high-level structural analysis, but
> uncertain on the exact names. Also, this whole scheme interacts
> intimately with the "generate" process, which is under review
> between David and Volodya. More details from David. Pending the
> outcome of that, we will have a clearer picture of how these classes
> should be structured.

I have been working all evening and am fading fast, but I promised to
post something about this before morning so Volodya would have a
chance to look at it if he's back from holiday... so I'll be brief:

Currently, there is a "multiple" parameter passed through the
generation process designed to handle cases like this:

exe bar : foo.wd ;

Construction chain (arrows from sources to targets):

--->foo.whl-->foo_parser.cpp-->foo_parser.o---
/ \
foo.wd--< >-->bar.exe
\ /
--->foo.dlp-->foo_lexer.cpp--->foo_lexer.o----

The way it works is as follows: search proceeds from exe, the target
being generated, to the source types.

Search chain:

---%.whl ; %.whl<--%_parser.cpp ; %.cpp<--%.o ; %.o,*%.a*<--%.exe
/
%.wd<--<
\
---%.dlp

"multiple" probably ought to be called "greedy". What happens, once
the source type is located, is that the extra .dlp target gets passed
up the chain. As we unwind to the previous step, if "multiple" was
set, it means it's OK to return "multiple" .cpp objects from that
step. The system attempts to convert the .dlp target to the source
type (.cpp) immediately. Since there is such a transformation, it
succeeds:

---%.whl ; %.whl<----%_parser.cpp ; %.cpp<--%.o ; %.o*,%.a*<--%.exe
/ --%_lexer.cpp ;
%.wd<--< /
\ /
---%.dlp ; %.dlp<-

The pair of .cpp objects are passed up the single chain together.

The original formula for handling these cases was: the extra target
(.dlp) gets carried up the chain with the requested target(s), and
is finally added to the main target's (bar's) set of sources to
consume. Basically, the plan was to "come back for it later":

The reason that didn't work out for this case was that when the
search reaches this stage:

... %.cpp<--%.o ; %.o,*%.a*<--%.exe
%.wd<-- ...

The next step is to find transformation sequences from .wd files to
.cpp files, and *all* possibilities are explored and compared to find
the "best" (shortest) one. If we don't do the conversion "greedily"
as shown above, we'll end up exploring the %.whl --> %_parser.cpp and
the %.dlp --> %_lexer.cpp paths, and ending up finding two equally
successful paths: one producing %_parser.cpp and an "extra" .dlp file,
and the other producing %_lexer.cpp and an "extra" .whl file. It
looks ambiguous.**

The ability to choose "non-greedy" behavior is still needed, though,
for many reasons. The rule is that non-greedy behavior is chosen
when the invoking (rightmost) generator has multiple sources but no
*'s in its signature [in the current sources, the * corresponds to
calling the generator "composing", but I think signatures are a
more-general and accurate way of expressing this]. That was done to
handle cases like:

%.dlp %.o -> %.exe (built from exactly one .o and one .dlp)
%.cpp -> %.o
%_parser.whl -> %.cpp
%.dlp -> %.cpp
%.wd -> %.whl %.dlp

In this case the desired transformation is:

--->foo.whl-->foo_parser.cpp-->foo_parser.o---
/ \
foo.wd--< >-->bar.exe
\ /
--->foo.dlp-----------------------------------

If the .dlp is converted "greedily" to a .cpp, a suboptimal
transformation will be found:

--->foo.whl----->foo_parser.cpp-->foo_parser.o-->bar.exe
/ -->foo_lexer.cpp--->foo_lexer.o--->
foo.wd--< /
\ /
--->foo.dlp-

My first contention is that the criterion for choosing greediness
doesn't do what it is supposed to do. For example, we could go back
to a "composing" signature for .exe above:

%.dlp*,%.o* -> %.exe (built from any number of .o and .dlp files)

Although that clearly wouldn't change the optimal transformation
sequence, it would turn greediness back on, and we'd get the
suboptimal transformation.

My second contention is that greedy conversion is not really the
answer it appears to be. I'm going to base this assertion on the
idea that there's no good criterion for applying it... so I don't feel
I need to justify it further ;-)

Instead, I'm going to propose what I think is the "correct" solution.
The whole point of introducing this greedy behavior was to eliminate
ambiguity [as shown above**]. Instead, I propose we figure out how to
detect when two transformation sequences are equivalent. What we want
to be able to do, when confronted with two sequences that return
"extra" targets and appear to be ambiguous, is to detect when
processing the extra targets of each one would result in the same
overall set of build actions. Then we can discard either one of the
two sequences, arbitrarily, without loss of information.

Here's my formula for detecting equivalence.:

1. Observe that each generator search path unwinding from successfully
finding a source type or types from the target types is a linear
chain (which may include an arbitrary of extra targets to be
processed as sources).

2. Observe that each search path (P1,P2) is a sequence of generator
choices, beginning with sources and ending with targets.

3. Algorithm:

q1 = P1
q2 = P2
while q1 is not empty and q2 is not empty:
g1 = head(q1)
g2 = head(q2)
if g1 != g2:
if source_signature(g1) intersects source_signature(g2):
P1 is not equivalent to P2
else:
P1 is equivalent to P2

otherwise, P1 and P2 are the same path

Basically, this says that if the two paths ever decide to do
different things to the same sources, they are not equivalent.
Otherwise, further exploration of the extra sources produced by
either one will lead to all the transformations of the other.

Hm. Not as brief as I'd hoped. Time for bed ;-)

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com
 

Boost-Build list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk