Boost logo

Boost :

From: Douglas Gregor (gregod_at_[hidden])
Date: 2001-05-24 09:20:14


On Thursday 24 May 2001 08:32, you wrote:
> Vladimir Prus wrote:
> > Hello.
> > Discussion about Spirit touched on possibility of creating automata
> > building library. I view such library as providing very simple
> > functionality: given a grammar (=BNF), build automaton for recognition of
> > the grammar.
> >
> > Futher direction may include EBNF->BNF conversion and probably NFA & DFA
> > building and DFA minimization facilities.
> >
> > I myslef can implement LALR(1) and SLR(1) methods, probably LR(1)
> > (if needed).
> >
> > As for LL(?) methods, I'm not sure if these have much in common with
> > LR(?).
> >
> > Is there any interest for such a library?
>
> I'd be interested, especially in one that allowed inheritance of grammars,
> as in antlr, and allowing the lex part of the parser being another parser.
> I've currently got some smalltalk around somewhere that did analysis of
> grammers to rm left recursion an some other ops on grammars. So, yes I'm
> interested.

One interface I had considered for automata was an "Acceptor" interface. The
Acceptor concept refines the Output Iterator concept. You output symbols
until you are finished, and then output an end-of-stream symbol. Then there
is an "accepted" function that specifies whether or not the input string was
in the language.

Sample usage:

DFA::acceptor a = myDFA.start();
copy(istream_iterator(cin), istream_iterator(), a);

if (accepted(a)) { cout << "accepted" << endl; }

Automata that transform a string of tokens into another string of tokens
(i.e., a lexer) could be created as an output iterator adaptor:

HighParser::acceptor h = myHighParser.start();
TokenParser::acceptor t = tokenParser.start();
Lexer::acceptor l = lexer.start();

copy(istream_iterator(cin), istream_iterator(), l(t(h)));
finalize(l); // Send the end-of-stream token through

if (h) { cout << "accepted" << endl; }

The idea is that any of these transforming acceptors are a function object
that takes another acceptor and returns an acceptor that emits tokens to the
acceptor it contains. So whenever "l" matches a token it outputs that token
to "t" and, likewise, when "t" matches a token it outputs it to "h".

        Doug


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk