Boost logo

Boost :

From: Vladimir Prus (ghost_at_[hidden])
Date: 2001-05-25 06:30:29


Since there seem to be interest in automata building library, I suggest some
more detailed discussion.

I thinks that the more fundamential classes should be created first: BNF,
automaton, probably NFA/DFA. Those have foundation in computer science, and
hopefully will not depend on personal preferences as much as high level
classes. All the other suggestions can be evaluated later -- e.g. making
Acceptor interface for existing DFA is better than making Acceptor interface
primary.

Now to the point! I have implementation for LALR(1) and SLR(1), which at
least works, but prefer to discuss interface first. Here's first attempt:

namespace Grammar {
        // This represent symbol from the high point of view -- essentially the name
        class Symbol {
        public:
                std::string name;
                virtual bool is_terminal() const = 0;
                bool is_nontermonal() const;

                bool operator==(const Symbol&) const;
                bool operator!=(const Symbol&) const;
                bool operator<(const Symbol&) const;
        };
        // In real life user will want to add some more information to Symbol,
        // Terminal and Nonterminal -- subclassing will suffice, I think.
        class Terminal : public Symbol {};
        class Nonterminal : public Symbol {};

        class BNF {
        public:
                class Rule {
                public:
                        // Q which ctors needed?
                        Symbol* left;
                        std::vector<Symbol*> right;
                };
                        
                std::vector<Rule> rules;
                std::vector<Symbol*> terminals, nonterminals;
                Symbol* grammar_start;

                // Is it good to expose the variables above?
                // If not, what is the best set of methods
                // probably
                // void add(auto_ptr<Terminal>)
                // void add(auto_ptr<Nonterminal>)
                // void add(const Rule&);
        };
}

namespace LR {
        // Represent parser/automaton view of symbol -- essentially a number
        class Symbol {
                bool is_terminal() const;
                int terminal_number() const;
                bool is_nonterminal() const;
                int nonterminal_number() const;
                bool is_eof() const; // ?
                bool is_error() const; // ?

                // kind of named ctor
                static Symbol for_terminal(int t);
        };
        // Or another kind of named ctor
        // class Terminal : public Symbol {};

        class LR_automaton {
        public:
                
                LR_Action action(int state, const Symbol& s) const;
                int goto(int state, const Symbol& s) const;

                int rule_length(int rule) const;
        };

        void build_slr1(const Grammar::BNF&, LR_automaton&);
        void build_lalr1(const Grammar::BNF&, LR_automaton&);
        // What interface for reporting conflicts, priorities &c. ?
        

        template<class Scanner>
        class LR_parser {
        public:
                LR_parser(const LR_automaton&, Scanner&);
                bool parse();
        
                virtual void reduction(int rule);
        };
        // How to implement semantic values? Make it another template parameter and
        // change reduction to
        // SemanticValue* reduction(int rule, Iterator begin, Iterator end)?
}

Hope for futher disccussion and suggestions.

-- 
Regards,
Vladimir

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk