Boost logo

Boost :

From: Douglas Gregor (gregod_at_[hidden])
Date: 2001-05-29 17:54:18


On Tuesday 29 May 2001 06:41 am, you wrote:
> Douglas Gregor wrote:
> >> I thinks that the more fundamential classes should be created first:
> >> BNF, automaton, probably NFA/DFA.
> >> [snip]
> >> making
> >> Acceptor interface for existing DFA is better than making Acceptor
> >> interface primary.
> >
> >This depends on what one considers the primary purpose of an automaton. If
> >the primary purpose is to take a string an either "accept" or "reject" it,
> >then the primary interface we want to define is based on accept/reject. If
>
> we
>
> >consider the primary purpose to be something else, then the primary
>
> interface
>
> >is built based on that idea.
>
> Another possible purpose of an automaton is to perform transitions from
> state to state (and possibly output something in the process). Actually,
> the word "primary" that I have used should be dropped -- it's reasonable to
> have both Acceptor interface and transition interface available to the
> world.
>
> >> Now to the point! I have implementation for LALR(1) and SLR(1), which at
> >> least works, but prefer to discuss interface first. Here's first
> >> attempt:
> >
> >We probably don't want the user to have to subclass Terminal or
> > Nonterminal merely for classification purposes - it would get in the way
> > of allowing arbitrary types (i.e., char or std::string) as terminals. A
> > trait and/or policy class would likely suffice.
>
> Do we really need arbitrary types for terminals here? I indend
> Grammar::Symbol to be used only for purpose of describing grammar. Can you
> explain your opinion in more detail?

I should be more careful when I say 'arbitrary'. I'm not saying that within
any given grammar we must allow more than one terminal type. I'm saying that
we can't require much of the terminal type: the user gives us a terminal type
when we create the rule, and we use it without requiring it to be derived
from any special class. So we can have:

Rule<char> A = 'a' | 'b';
Rule<std::string> B = 'foo' | 'bar';

> >I don't see that the BNF class even needs to exist: the Rule class is
>
> enough.
>
> >When you want to generate an automaton for a given grammar, give the
> >generator the Rule instance for the start variable.
>
> Your observation is correct. However, there are some issues:
> 1. Grammar seems to be concept deserving a class of its own: BNF and EBNF
> closely correspond what people think of a grammar, in my opinion.

It is possible to consider a grammar as simply a grammar induced by a given
start symbol. I am most interested in this definition (for the reason you
mention below - reusing parts of parsers) because it seems most pure.
I think the same thing of a tree. A 'tree' structure makes little sense to me
in that a 'tree' is merely the tree induced by a given root node. Any
recursive data structure can be thought of in this way, and in my opinion it
is more clean to think of a nonterminal/tree node/list node/etc in isolation
instead of bringing in the larger context of a grammar.

> 2. Elimitating BNF would require that each nonterminal know all the rules
> for it. This will result in a net of pointers, and no single entity
> responsible for deleting all that (or responsible for all that in
> general).

This net of pointers will have to be somewhere regardless. Why not in the
place most local to its usage?

>3. Efficient parsing requires terminals to be numbered. If BNF is
> gone, where will that information go?

Well, yes, there's that whole efficiency thing. We could allocate a few extra
bytes of space in each nonterminal and fill it with an index number when we
need it. The preprocessing step to do this would likely not cost much
compared to the cost of generating a parser.

> On the other hand, ability to specify arbitrary nonterminal as start one is
> desirable (e.g. parser for C++ expressions only). I think I should add it.
> >LR_parser class: by taking a scanner, the scope of the LR parser is
> > limited only to that type of scanner. However, an LR parser is more
> > abstract than that: it can take any sequence of symbols and parse it. I
> > think the mechanism
> >for getting that sequence of symbols should be as abstract as possible
> > (and not bound too early!) so that composing several transformations
> > doesn't require the parser to change type. The parser need not know where
> > its input comes from.
>
> Do you mean that Parser should take instance of some abstract scanner? You
> might be right --- I actually haven't thought a lot about it. Anyway, since
> the changes will be trivial, let's return to this question later.

I mean that the parser should never know about the scanner. Instead, tokens
should be pushed into the parser from some source(s). Again, I return to the
Acceptor interface where the parser knows nothing about where its input comes
from, but merely takes tokens via an acceptor.

> Joel de Guzman wrote:
> >Just a quick comment. Can we do this using compile time
> >polymorphism vs. runtime polymorphism? This way we
> >can have more aggressive optimisations.
>
> If I understood Douglas correctly, he proposes just the opposite :-).
> I really don't think that runtime polymorphism will cost much -- it is not
> used in the most critical parts.

Actually, I'm agreeing with Joel on this :). We'd be better off with static
(compile time) polymorphism as far as we can take it. However, it would be
nice if the user could turn it into runtime polymorphism. For instance, an
interactive grammar-construction program that would allow, for instance, a
user to redefine a terminal in an existing grammar as a nonterminal to slowly
grow a grammar. I haven't thought about this enough yet to formalize how it
would work (sorry).

        Doug


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk