Boost logo

Boost :

From: Eric Niebler (eric_at_[hidden])
Date: 2007-06-27 19:40:44


I'm terrible at algebra, but I'll do my best. (Extensive quoting
follows, because I'll have trouble following this discussion otherwise.)

Larry Evans wrote:
> The problem involves my attempt to model the lookahead problem
> as an algebra problem whose solution involved the following steps.
>
> 1) For a grammar expression, e_gram, (a "source" algebra expression)
> perform a homomorphism, gram2X. to a corresponding target
> algebra, X, expression, e_X, where X is one of {empty,first,follow}.
>
> For example,
>
> gram2empty(inp | epsilon) =
> not_empty & yes_empty
>
> where inp is any element in the terminals of the grammar and
> epsilon is, of course, the epsilon expression representing the
> empty string.
>
> 2) Perform a sequence of reductions, based on "laws" of the target
> algebra, until a "constant" in the target algebra is reached.
>
> For example, some of the "laws" of the empty algebra are:
>
> not_empty & e_empty -reduce-> not_empty
> e_empty & not_empty -reduce-> not_empty
>
> where e_empty means any expression in the empty algebra. Based
> on this, and the example expression in 1):
>
> not_empty & yes_empty -reduce-> not_empty
>
> And not_empty is a "constant" in the empty algebra; do, the
> sequence of reductions is complete.

I get this. Is this part purely compile time, or are some of these
reductions dependent on runtime information? If it's purely compile
time, this example is fairly close to what you've just described:

http://tinyurl.com/253esm
(http://boost-sandbox.sourceforge.net/libs/proto/doc/html/boost_proto/users_guide/examples/rgb.html)

> Now, IIUC, a homomorphism maps functions with arity, n, in the source
> to functions with arity n in the target. Thus | in the gram algebra
> was mapped to & in empty algebra. If n = 0, this means constants in
> the source are mapped to cconstants in the target. For example, in
> the above example, inp_0, a constant in gram algebra was mapped to
> not_empty, a constant in the empty algebra. However, for gram2first,
> things aren't that simple. The constants in the first algebra are
> *sets* of constants in the gram algebra. More concretely:
>
> 1) homomorphism: gram2first
>
> 1.1) constants:
>
> gram2first(inp_0) = {inp_0}
> gram2first(inp_1) = {inp_1}
> ...
> gram2first(inp_n) = {inp_n}
>
> where n is the number of terminal symbols in the grammar.
>
> 1.2) binary |:
>
> gram2first(|) = \/
> where \/ is set union operator or function.
>
> gram2first(e_gram_0 | e_gram_1)
> = gram2first(e_gram_0) \/ gram2first(e_gram_0)
>
> However, {inp_0} and {inp_1} and ... {inp_n} are not really a
> constants in the first grammar. Instead they are really
> composite terms or expressions in a multisorted set algebra with the
> single constant, empty_set, and binary function, insert:
>
> insert(empty_set,inp_0)
> insert(empty_set,inp_1)
> ...
> insert(empty_set,inp_n)
>
> and this is what lead me to look for something like the hierarchical
> algebra, which, I guess, is similar to the "hierarchical abstract
> type" in (6) here:
>
> http://tinyurl.com/ypekyd
>
> The type, P, in the above tinyurl corresponds to the multisorted set
> algebra mentioned above, and insert(empty_set,inp_0) would be an
> example of the "primitive term" mentioned in the tinyurl.

Ooof, I followed that link and I think I hurt myself. But I perhaps get
the gist, which is that "first" for a bunch of alternates is a set union
of the "first" sets of each alternate. "Input" here is what? A
character? A string? A pattern? I suppose in the abstract it could be
anything. I don't know what a "mustisorted set algebra" is, but it
sounds like complication. Seems fairly simple .. you have a transform
that builds a set by folding together the "first" sets of the
alternates, and you recurse. Terminals yield a "first" set with one
element. proto::transform::fold_tree would be handy for this, and you
pass around the set you're building as a visitor parameter to the transform.

> > In gram_lookahead.zip, the morphisms
> >> cannot proceed down the expression tree after a node with arity==0 is
> >> reached.
> >
> >
> > Huh, there is nothing "down" after a node with arity 0. That's all you
> > get. :-)
> >
>
> The above definition of homomorphism mapped constants (functions with
> arith=0) to constants. However, in the case of gram2first, the
> mapping was from constants to some term in the primitive subalgebra of
> the target algebra (I don't even know if that's been defined
> somewhere, but I'd be very surprised if there were'nt some reference
> somewhere defining the idea). So instead of mapping constants to
> constants, in proto I'd be mapping arity-0 terms to arity-0 terms;
> however, the contents of those target arity-0 terms would be the
> primitive subalgebra terms; hence, there would be a "down" to get at
> the subalgebra term (in this case, probably implemented as
> mpl::set<GrammarTerminalNumeral,TermNum>).

I think now you lost me. I was thinking the "first" set would be a
runtime data structure. But I guess I'm wrong. So let's back up. What
does the "first" set represent?

> > However, with the first and follow algebra, the terminal
> >> values are not single values like true or false, but instead multiple
> >> values like {ident,value,lpar} (e.g. the first attribute of factor
> >> nonterminal in arith_expression grammar).
> >
> >
> > OK, so your terminal is a 3-tuple. It's still a terminal.
> >
>
> Yes, but also a composite term in the primitive subalgebra. Your
> probably wondering why I'm so fixated on using the algebra concepts
> and I'm beginning to wonder myself ;) . However, my original intent
> was to try and be more formal and possibly gain some insight about how
> to do it other ways.

I encourage you to formalize what you're doing with proto. If proto
obeys some algebra, or can be used to implement a useful one, I think
that would be very interesting. I might even want to write a paper about
it with you. :-)

> > So, the reduction of values
> >> in this grammar must continue through the value contained in the proto
> >> terminal (which is an instance of mpl::set on grammar terminals, like
> >> ident, value, or lpar).
> >
> >
> > I don't think you need proto to "recurse" into your terminals. I just
> > think you need to write a custom transform that does something
> > appropriate with your terminals.
> >
>
> OK, you're probably right and the attempt at formality is not worth
> the effort.

See above.

> > This lead me to search for something in
> >> algebra literature to handle this, and that lead to something called
> >> "heirarchical albebras" where the arity-0 expressions are actually
> >> values in some other algebra (if I'm interpresting things correctly).
> >> This sounds like what I was looking for. I would be interesting if
> >> the transforms in proto actually were more like heirarchical algebra
> >> morphisms. Just something to think about (probably in the distant
> >> future ;) ).
> >
> >
> > Well, that's one approach. In fact, you can approximate that in proto
> > today by wrapping a proto expression in a template, effectively hiding
> > its proto-ness from proto, which would then treat it like any other
> > terminal. You would be responsible for writing the transforms and/or
> > contexts that knew about these special terminals and recursed into them.
> >
> >
>
> OK, now you've inspired me to continue trying.

Please do.

> Thanks for the feedback!

My pleasure.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk