Boost logo

Boost :

From: Larry Evans (cppljevans_at_[hidden])
Date: 2004-12-26 18:04:28


On 12/25/2004 11:43 PM, Joel de Guzman wrote:
[snip]
> Long story. Anyway, the basic problem is that Spirit parsers
> are dynamic. The computation of the first set, for instance,
> cannot be done at compile time. Say for example:
>
> r = str_p("apple") | "banana" | "orange";
>
> the first set of r is: ['a', 'b', 'o']. This can be obtained
> through a call to first(r) which calls first("apple"),
> first("banana") and first("orange") recursively.
>
> The problem is that each alternative can potentially have
> different types. A production such as:

Could you explain what this means. I'm guessing it means
each alternate would have a different synthesized attribut
calculated for it.

>
> a | b | c;

That is, a's attribute, A, could be a float,
b's attribute, B, could be a string, and
c's attribute, C, could be something else.
Is that right?

>
> can be thought of as a
>
> composite<alternative_operation, tuple<A, B, C> >

Wouldn't the most "natural" attribute for this be a
"named" variant which map contain the same type for
two or more different "names"? IOW, instead of the
variant currently in boost, the sum type in:

   sandbox/boost/indexed_types/sum.hpp

would represent the result of parsing the alternative expression:

    a | b | c

and the names could be 1,2,3 (all unsigned) or possibly names,
such as elements of an enumeration:

   enum alt_abc{ alt_a, alt_b, alt_c};

>
> where A, B and C are the types of a, b and c. So, now, when
> creating the parser, you'll have to make a table of alternates
> tuple<A, B, C> which dispatches properly based on information
> known only at runtime: 1) the current input character and

Of course, I understand how this must be only known at runtime.

> 2) the first follow sets of A, B and C.
>
Now this could be calculated at compile time if there were no
cycles in the productions. For example, the simplest case would
be:

     A = 'a' A | ''

i.e. a sequence of a's. The cycle, or recursion, is caused by
the A on the rhs. Of course, we know the solution in this
simple case, but other's would, I guess, couldn't be solved or
would be too difficult to solve. (I'm really guessing just based
on my inability to figure out how). The point is, some could be
while other's might not be; hence, a more accurate statement
would be:

   2) some of the first and follow sets of A, B, and C could not be
   calculated at compile time or would be "too difficult" to
   calculate at compile time.

Regards,
Larry


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk