Boost logo

Boost :

From: Larry Evans (jcampbell3_at_[hidden])
Date: 2001-05-29 23:52:28


Douglas Gregor wrote:

> On Tuesday 29 May 2001 06:41 am, you wrote:
> > Douglas Gregor wrote:
> > >> I thinks that the more fundamential classes should be created first:
> > >> BNF, automaton, probably NFA/DFA.
> > >> [snip]
> > >> making
> > >> Acceptor interface for existing DFA is better than making Acceptor
> > >> interface primary.
> > >
> > >This depends on what one considers the primary purpose of an automaton. If
> > >the primary purpose is to take a string an either "accept" or "reject" it,
> > >then the primary interface we want to define is based on accept/reject. If

http://groups.yahoo.com/group/boost/message/12270 has an attachment
containing a pccts grammar with terminals TOK_SLC and TOK_PRED. As
explained in that post, I'd like the boost automata builder which
allows easy substitution of another language for these terminals, more
specifically, I'd like a "predicate expression" language to be
substituted for TOK_PRED and a "straight-line-code" language to be
substituted for TOK_SLC. This is an example of "substitution" as
explained in theorum 6.2 of _Introduction to Automata Theory,
Languages, and Computation_, by Hopcroft and Ullman, Addison-Wesley
(1979). In the following definitions make the subsequent quote
clearer:

  CFL means the set of context free languages
  Term means some terminal vocabulary
  variable means non-terminal
  <= means is subset of
  G means some grammar
  L(G) means the language defined by grammar G

The following is the proof of theorum 6.2 quoted from the book:

  Let L be a CFL, L <= Term*, and for each a in Term, let L[a] be a
  CFL. Let L be L(G) and for each a in Term let L[a] be L(G[a]).
  Without loss of generality assume that the variables of G and the
  G[a]'s are disjoint. Construct a grammar G' as follows. The
  variables of G' are the variables of G and the G[a]'s; the terminals
  of G' are the terminals of the G[a]'s. The start symbol of G' is
  the start symbol of G. The productions of G' are all the
  productions of the G[a]'s together with those productions formed by
  taking a production A-> alpha of G and substituting S[a], the start
  symbol of G[a] for each instance of an a in Term appearing in alpha.

My first guess on how to do this is implemented is that each grammar
class would look something like:

  template<typename Terminals>
  struct Lgrammar
    : public Terminals
    { enum VariableIndex
      { Start
      , V1
      , V2
      ...
      , End
      };
      Variable vars[End];
      Lgrammar(void)
        { Terminals& terms = Terminals::vars
        ; vars[Start]
          = vars[V1]
          | vars[V3]
          | terms[T1]
        ; vars[V1]
          = vars[V5]
>> vars[V20]
>> terms[T1]
          | terms[T6]
        ;}
    };

Obviously using an index into a vector is more wordy than the Spirit
examples we've seen so far, and it also means that whatever a
Variables is, it must be a pointer or smart pointer.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk