Boost logo

Boost-Build :

From: David Abrahams (gclbb-jamboost_at_[hidden])
Date: 2003-05-13 08:43:26

Vladimir Prus <ghost_at_[hidden]> writes:

> David Abrahams wrote:
>> I have been working all evening and am fading fast, but I promised to
>> post something about this before morning so Volodya would have a
>> chance to look at it if he's back from holiday... so I'll be brief:
> Dave, I'm really sorry that I did not reply yesterday. It turned out I had to
> waste all the day on some unpleasant business --- checking exam papers, so I
> was offline most of the time.

That's OK. Your students are important, too. Sorry it was painful
for you ;-)

>> Currently, there is a "multiple" parameter passed through the
>> generation process designed to handle cases like this:
> [...]
>> The ability to choose "non-greedy" behavior is still needed, though,
>> for many reasons. The rule is that non-greedy behavior is chosen
>> when the invoking (rightmost) generator has multiple sources but no
>> *'s in its signature [in the current sources, the * corresponds to
>> calling the generator "composing", but I think signatures are a
>> more-general and accurate way of expressing this]. That was done to
>> handle cases like:
>> %.dlp %.o -> %.exe (built from exactly one .o and one .dlp)
>> %.cpp -> %.o
>> %_parser.whl -> %.cpp
>> %.dlp -> %.cpp
>> %.wd -> %.whl %.dlp
>> In this case the desired transformation is:
>> --->foo.whl-->foo_parser.cpp-->foo_parser.o---
>> / \
>> foo.wd--< >-->bar.exe
>> \ /
>> --->foo.dlp-----------------------------------
>> If the .dlp is converted "greedily" to a .cpp, a suboptimal
>> transformation will be found:
>> --->foo.whl----->foo_parser.cpp-->foo_parser.o-->bar.exe
>> / -->foo_lexer.cpp--->foo_lexer.o--->
>> foo.wd--< /
>> \ /
>> --->foo.dlp-
>> My first contention is that the criterion for choosing greediness
>> doesn't do what it is supposed to do. For example, we could go back
>> to a "composing" signature for .exe above:
>> %.dlp*,%.o* -> %.exe (built from any number of .o and .dlp files)
>> Although that clearly wouldn't change the optimal transformation
>> sequence, it would turn greediness back on, and we'd get the
>> suboptimal transformation.
> The question is what's 'optimal transformation'. When generator needs one dlp
> and one cpp file

That's not the case I was looking at above, but...

> there's no 'optimal one' --- there's single one that could
> work.

%.dlp %.cpp -> %.exe (built from exactly one .cpp and one .dlp)
%_parser.whl -> %.cpp
%.dlp -> %.cpp (*** did you mean to have this case? ***)
%.wd -> %.whl %.dlp

In this case the desired transformation is:

/ \
foo.wd--< >-->bar.exe
\ /

The search proceeds as follows in my proposal.

Step 1:

---%.whl ; %.whl<----%_parser.cpp ; %.cpp<--%.exe
/ --%.dlp
%.wd<--< /
\ /
---%.dlp ; %.dlp<-

We have found one way to create a .cpp from our sources...

Step 2, we must examine the other way to create a .cpp, to decide
which one is better:

---%.dlp ; %.dlp <--- %.cpp ; %.cpp<--%.exe
/ --%.whl
%.wd<--< /
\ /
---%.whl ; %.whl<-

The two transformation paths are equivalent according to my criterion,
but IIUC you're saying that they are not; the 2nd one will fail to
satisfy the requirements for building a .exe file.

Your point is taken. True path equivalence requires that the products
of the transformation sequences are also identical. It's clear that
you have to defer pruning either one until you get all the way back to
the invoking generator.

> For the second case, there's ambiguity, but is there 'optimal
> transformation'. The situation is that exe can consume both .dlp and .o,
> where .o can be created from .dlp. That's messy, and at least deserves a
> warning. And it's also a corner case, so optimal behaviour here is not so
> imporant.

I think there is a generalized algorithm that will handle all of these
cases correctly. I believe that I just got the equivalence criterion
wrong, but that this is a kind of dijkstra (branch-and-bound) search
to which we can apply dynamic-programming.

>> My second contention is that greedy conversion is not really the
>> answer it appears to be. I'm going to base this assertion on the
>> idea that there's no good criterion for applying it... so I don't feel
>> I need to justify it further ;-)
> I think the criteria is simple. Generator passes 'multiple' to recursive
> 'generators.construct' call if it can do something sensible to multiple
> returned targets. Generator with single input can do something. Generator
> with two input types cannot.

Suppose some generator further up the chain has no use for the
multiple returned targets? I guess that's why "multiple" is passed
to the left during the generator search process?

What if you have a generator that requires exactly two .cpp files and
one file of some other kind?

>> Instead, I'm going to propose what I think is the "correct" solution.
>> The whole point of introducing this greedy behavior was to eliminate
>> ambiguity [as shown above**]. Instead, I propose we figure out how to
>> detect when two transformation sequences are equivalent. What we want
>> to be able to do, when confronted with two sequences that return
>> "extra" targets and appear to be ambiguous, is to detect when
>> processing the extra targets of each one would result in the same
>> overall set of build actions. Then we can discard either one of the
>> two sequences, arbitrarily, without loss of information.
> (very very side note: it would be dreadfully cool if we can avoid searching
> for one of sequences: i.e. after trying .dlp->.cpp generator to detect that
> trying .whl->.cpp would not bring anything new).

I think lots of this coolness could be available using the appropriate
search algorithm. I just need to figure out what that is. Actually,
I think A* or something like it may be possible... I need to work on
the problem a little more, but I think the search actually has to
proceed from sources.

> Provided we can disambiguate, we don't need greediness. But... do we
> get anything by not converting extra targets greedily and passing
> them up to top-level generator? I think that if transformation found
> for that extra target is the same in both cases, we call well
> convert greedily, since it's simpler (IMO).

#1: as demonstrated, it fails to find the optimal conversion sequence
in some cases. I'd be a little surprised if it didn't fail
altogether in some cases, but I can't prove it yet.

#2: more importantly, it _complicates_ the search algorithm by adding
special cases. It is very difficult ot understand.

> If tranformations can differ --- it's a big question. [Here I was going to
> present example where passing extra target up will fail, but found an example
> where both approaches will fail]. Consider
> .whl -> .cpp, .h
> .cpp -> .o
> .o* -> .exe
> .h -> moc_%.cpp # This is QT transformation.
> See the gotcha? The extra .h target produced from .whl is considered as if it
> were a source, and processed with moc, while it should not:
> .whl -----> .cpp ----------------> .o ----- .exe
> \ /
> \--> .h ---> moc_%.cpp -> .o ---/

Yes, in this case I think it's important to be able to label the .h
file as "not requiring consumption".

> Let's put this case aside for a while. The real difference between
> greedy and pass-up approached is that in first case extra targets
> can only be converted to types along the chain where they were
> generated. In example above, .h can be converted only to .cpp, or
> .o, but any other transormations you might have are ignored. With
> passing-up, you'll examine transformation from extra target to all
> the type top-level generator can consume. This *sounds* better, but
> only sounds.

I realize it's not perfect. I'm reformulating my ideas now.

>> Here's my formula for detecting equivalence.:
>> 1. Observe that each generator search path unwinding from successfully
>> finding a source type or types from the target types is a linear
>> chain (which may include an arbitrary of extra targets to be
>> processed as sources).
> I'm afraid it's not so. This is again QT example:
> .ui -------->.cpp ----->.o
> \ /
> \----->.h (has type UIC_H)
> Here, '.h' is not actually 'extra target'. It's explicitly created by
> UIC,UIC_H -> CPP generator and is needed to create the .cpp. There are some
> subtle points --- i.e. the .h files in reallity is not used by the tool which
> generates cpp, but I hope you get the point.

I think that's the same problem as you showed above: the .h file must
be allowed to not be consumed. The rightmost "/" character in your
diagram above doesn't show up during unwinding from finding
.ui-->.cpp, so at that stage, the .h is an "extra target not
requiring consumption".

>> 2. Observe that each search path (P1,P2) is a sequence of generator
>> choices, beginning with sources and ending with targets.
>> 3. Algorithm:
>> q1 = P1
>> q2 = P2
>> while q1 is not empty and q2 is not empty:
>> g1 = head(q1)
>> g2 = head(q2)
>> if g1 != g2:
>> if source_signature(g1) intersects source_signature(g2):
>> P1 is not equivalent to P2
>> else:
>> P1 is equivalent to P2
>> otherwise, P1 and P2 are the same path
>> Basically, this says that if the two paths ever decide to do
>> different things to the same sources, they are not equivalent.
>> Otherwise, further exploration of the extra sources produced by
>> either one will lead to all the transformations of the other.
> So, the idea is that if we have
> t1-------- (generator 1)
> /
> source --(last common generator)-<
> \
> t2-------- (generator 2)
> we declare that there's no abmiguity between selecting generator 1 and
> generator 2, because if we select the first, then t2 will be produced anyway,
> and the transformation starting with generator 2 will be executed as
> well.

Yes, that was the idea.

> But how do you prove this? We need to make sure this works at least
> as good as current scheme, if we're to move.

There's always, "passing all the existing tests" ;-)

Actually, I want to explore a search-from-sources approach also.

Dave Abrahams
Boost Consulting

Boost-Build list run by bdawes at, david.abrahams at, gregod at, cpdaniel at, john at