Boost logo

Boost :

From: Vladimir Prus (ghost_at_[hidden])
Date: 2001-05-29 05:41:31


Douglas Gregor wrote:

>> I thinks that the more fundamential classes should be created first: BNF,
>> automaton, probably NFA/DFA.
>> [snip]
>> making
>> Acceptor interface for existing DFA is better than making Acceptor
>> interface primary.

>This depends on what one considers the primary purpose of an automaton. If
>the primary purpose is to take a string an either "accept" or "reject" it,
>then the primary interface we want to define is based on accept/reject. If
we
>consider the primary purpose to be something else, then the primary
interface
>is built based on that idea.

Another possible purpose of an automaton is to perform transitions from state
to state (and possibly output something in the process). Actually, the word
"primary" that I have used should be dropped -- it's reasonable to have both
Acceptor interface and transition interface available to the world.

>> Now to the point! I have implementation for LALR(1) and SLR(1), which at
>> least works, but prefer to discuss interface first. Here's first attempt:

>We probably don't want the user to have to subclass Terminal or Nonterminal
>merely for classification purposes - it would get in the way of allowing
>arbitrary types (i.e., char or std::string) as terminals. A trait and/or
>policy class would likely suffice.

Do we really need arbitrary types for terminals here? I indend
Grammar::Symbol to be used only for purpose of describing grammar. Can you
explain your opinion in more detail?

>I don't see that the BNF class even needs to exist: the Rule class is
enough.
>When you want to generate an automaton for a given grammar, give the
>generator the Rule instance for the start variable.

Your observation is correct. However, there are some issues:
1. Grammar seems to be concept deserving a class of its own: BNF and EBNF
closely correspond what people think of a grammar, in my opinion.
2. Elimitating BNF would require that each nonterminal know all the rules
for it. This will result in a net of pointers, and no single entity
responsible for deleting all that (or responsible for all that in general).
3. Efficient parsing requires terminals to be numbered. If BNF is gone, where
will that information go?

On the other hand, ability to specify arbitrary nonterminal as start one is
desirable (e.g. parser for C++ expressions only). I think I should add it.

>LR_parser class: by taking a scanner, the scope of the LR parser is limited
>only to that type of scanner. However, an LR parser is more abstract than
>that: it can take any sequence of symbols and parse it. I think the
>mechanism
>for getting that sequence of symbols should be as abstract as possible (and
>not bound too early!) so that composing several transformations doesn't
>require the parser to change type. The parser need not know where its input
>comes from.

Do you mean that Parser should take instance of some abstract scanner? You
might be right --- I actually haven't thought a lot about it. Anyway, since
the changes will be trivial, let's return to this question later.

Joel de Guzman wrote:

>Just a quick comment. Can we do this using compile time
>polymorphism vs. runtime polymorphism? This way we
>can have more aggressive optimisations.

If I understood Douglas correctly, he proposes just the opposite :-).
I really don't think that runtime polymorphism will cost much -- it is not
used in the most critical parts.

>PS>A quick and dirty draft implementation might be useful.

The quick and dirty implementation is written. Take a look at
http://chronos.cs.msu.su/~ghost/MyDocuments/projects/lr_library/index.html
Some criticism will be usefull.

-- 
Regards,
Vladimir

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk