Boost logo

Boost :

From: Larry Evans (cppljevans_at_[hidden])
Date: 2007-06-30 12:55:12

On 06/27/2007 06:40 PM, Eric Niebler wrote:
> I'm terrible at algebra, but I'll do my best. (Extensive quoting
> follows, because I'll have trouble following this discussion
> otherwise.)

> Larry Evans wrote:
>> The problem involves my attempt to model the lookahead problem
>> as an algebra problem whose solution involved the following steps.


> I get this. Is this part purely compile time, or are some of these

Purely compile time. In an earlier version of
(which I still have in a file, morphisms.cpp, if you're interested) I
even had an recursive template compile time solution (just for empty
attribute) corresponding to the runtime iterative solution implemented

       < typename ProdList
       ( ProdList& a_productions
       , unsigned iter=our_size+1

in cfg_lookahead_extends.cpp on line 2147. If you look in, you'll see several:

                       < class OutEmptySeq

The OutEmptySeq corresponds to the assumed values of lhs attributes,
just like the superclass of the:

   < lookahead::numerals_attr AttrNum
   , class NumeralsInp
   , class NumeralsOut

in cfg_lookahead.cpp corresponds to the runtime assumed value of the

> reductions dependent on runtime information? If it's purely compile
> time, this example is fairly close to what you've just described:

Very good! That's a great help.

>> Now, IIUC, a homomorphism maps functions with arity, n, in the
>> source to functions with arity n in the target.

>> and this is what lead me to look for something like the hierarchical
>> algebra, which, I guess, is similar to the "hierarchical abstract
>> type" in (6) here:
>> The type, P, in the above tinyurl corresponds to the multisorted set
>> algebra mentioned above, and insert(empty_set,inp_0) would be an
>> example of the "primitive term" mentioned in the tinyurl.

> Ooof, I followed that link and I think I hurt myself. But I perhaps
> get the gist, which is that "first" for a bunch of alternates is a
> set union of the "first" sets of each alternate.

Yes. The in vault may be easier to follow. Lewi's the
1st author of the book on which is based.

> "Input" here is what? A character? A string? A pattern?

Just a terminal symbol. Lewi's book called terminals "inputs" and
nonterminals "outputs". This is because, I think, he used the term,
"vocabulary" as a generalization of both concepts, terminal and
nonterminal symbols.

> I suppose in the abstract it could be anything.

Well, just as a terminal symbol represents something abstract. The
lexer could represent, e.g. the terminal symbol, mulop, as the
character, '*' or something else, like '&' or '&&' or even '>>' or
maybe even the successful parse of a whole other language. For
example, a terminal symbol, 'stmt' could just be parsed as a simple
identifier by a lexer or it could be parsed a a program statement by a
statement cfg grammar.

> I don't know what a "mustisorted set algebra" is,

Sort means type. A boolean algebra just involves the single boolean
sort. See:

> but it sounds like complication.

Maybe, even probably. However, any abstraction could be considered a
complication because something concrete is easier to understand that
something abstract.

> Seems fairly simple .. you have a transform that builds a set by
> folding together the "first" sets of the alternates, and you
> recurse. Terminals yield a "first" set with one
> element. proto::transform::fold_tree would be handy for this, and
> you pass around the set you're building as a visitor parameter to
> the transform.

I'll try that. Thanks!

>> >> In, the morphisms cannot proceed down the
>> >> expression tree after a node with arity==0 is reached.

>> > Huh, there is nothing "down" after a node with arity 0. That's
>> > all you get. :-)

>> The above definition of homomorphism mapped constants (functions
>> with arith=0) to constants. However, in the case of gram2first,
>> the mapping was from constants to some term in the primitive
>> subalgebra of the target algebra (I don't even know if that's been
>> defined somewhere, but I'd be very surprised if there were'nt some
>> reference somewhere defining the idea). So instead of mapping
>> constants to constants, in proto I'd be mapping arity-0 terms to
>> arity-0 terms; however, the contents of those target arity-0 terms
>> would be the primitive subalgebra terms; hence, there would be a
>> "down" to get at the subalgebra term (in this case, probably
>> implemented as mpl::set<GrammarTerminalNumeral,TermNum>).

> I think now you lost me. I was thinking the "first" set would be a
> runtime data structure.

Yes, for cfg_lookahead_extends, but no for gram_lookahead. In
gram_lookahead, these sets (one for each node in the expression tree),
are calculated at compile time. Yeah, I know, it sounds way more
complicated than any compiler can handle. However, as mentioned above
(see morphisms.cpp), for very simple grammars, at least the empty
attribute can be compile-time calculated.

> But I guess I'm wrong. So let's back up. What does the "first" set
> represent?

The set of terminal symbols that can start a string in the language
(or "sublanguage") described by a grammar expression. That's
  essentially repeated in:


>> Your probably wondering why I'm so fixated on using the algebra
>> concepts and I'm beginning to wonder myself ;) . However, my
>> original intent was to try and be more formal and possibly gain
>> some insight about how to do it other ways.

> I encourage you to formalize what you're doing with proto. If proto
> obeys some algebra, or can be used to implement a useful one, I
> think that would be very interesting. I might even want to write a
> paper about it with you. :-)

I've got some other ideas. For example, just as a grammar can be
thought of as a set of equations expressible in proto ( as done by
cfg_lookahead_extends) the equations describing the connections in a
graph can be expressed if operator * can be somehow restricted to just
two elements (i.e. a*b allowed but a*b*c is not), and where the 1st
element in the factor pair is some literal and the second element
represents an unknown. This could then be used to solve any number of
path problems like the shortest path problem, and even DFA generation

This is where compile-time solution might also be important. After
all, solving the following at compile-time:

      X = A * X + b

where X and b are vectors and A is a matrix could be useful ;)

I'm also wondering what's the relation between a proto grammar whose
top level connective is bitwise_or and boost variant's recursive
variant types:


>> > Well, that's one approach. In fact, you can approximate that in
>> > proto today by wrapping a proto expression in a template,
>> > effectively hiding its proto-ness from proto, which would then
>> > treat it like any other terminal. You would be responsible for
>> > writing the transforms and/or contexts that knew about these
>> > special terminals and recursed into them.

OOPS. After thinking about it some more and writing the above:

   (one for each node in the expression tree),

I realized restricting the lookahead attributes to the "wrappee's " of
"wrapped" proto terminals won't work because the proto composite
expressions (e.g. those with proto::tag::bitwise_or) also must have
lookahead attributes.

Back to drawing board :(

Boost list run by bdawes at, gregod at, cpdaniel at, john at