Boost Users :

From: Andy Little (andy_at_[hidden])
Date: 2006-03-12 04:39:18


"Carlo Wood" wrote
> Parsing C++ is only possible (unambiguously) after
> preprocessing it first, and when at any moment a full
> list of every identifier is known: the parser needs
> to know which types have been declared, which variables
> exist etc, in the scope that it is parsing at that
> moment. As a result, any C++ parser has to be almost
> a compiler before it can work.

When I attempted this I didn't find looking up names a major problem. The whole
program was modelled as a tree. At the root was an abstract base class called
Scope (which would be realised as namespaces, classes etc.), each derived class
having separate member symbol tables for each type of entity that it could
contain. The Scope declaration is at the end of this message, and looking back
at it, it seems its main job was looking up names and turning them into
particular types of entities. The h_str type there (for example in the Find
function) is basically a handle to a string. Whenever the lexer found a name it
would immediately request an h_str for it. The idea here was simply to use
integer ids for speed rather than passing strings around directly. It was only
occasionally necessary to turn the id back into a string for the user's
benefit.
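
The name-handle idea above can be sketched as a small interning table. This is
only an illustration under assumptions: the StringTable class, its member
names, and the choice of std::size_t for h_str are hypothetical and are not
taken from the original project, whose h_str implementation is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for h_str: an integer id naming an interned string.
typedef std::size_t h_str;

// Hypothetical interning table (not the original project's code).
class StringTable {
    std::unordered_map<std::string, h_str> ids_;  // spelling -> id
    std::vector<std::string> names_;              // id -> spelling
public:
    // Return the id for `name`, allocating a new id the first time it is seen.
    h_str intern(const std::string& name) {
        std::unordered_map<std::string, h_str>::const_iterator it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        h_str id = names_.size();
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }

    // Turn an id back into its spelling, e.g. for a diagnostic message.
    const std::string& str(h_str id) const {
        assert(id < names_.size());
        return names_[id];
    }
};
```

With a table like this the lexer would call intern() once per identifier, and
every later symbol-table lookup compares integers instead of strings.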

The E_ref class wrapped a pointer to something, together with information as to
what class of entity it actually pointed to. If an attempt was made to convert
it to the wrong type of entity, it would throw an exception.
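
A cut-down sketch of that checked conversion, combined with a lookup that walks
outwards through enclosing scopes, might look like the following. Assumptions
are labelled in the comments: only two entity kinds are modelled, SimpleScope
and its single std::map are hypothetical (the original kept a separate table
per kind of entity), and the body of BadE_ref is a guess, since the original
exception class is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>

typedef std::size_t h_str;  // hypothetical: integer id for an interned name

struct Class   { h_str name; };  // placeholder entity types for the sketch
struct Typedef { h_str name; };

// Assumed definition; the original BadE_ref is not shown.
struct BadE_ref : std::logic_error {
    BadE_ref() : std::logic_error("E_ref converted to the wrong entity type") {}
};

// Tagged pointer: remembers which kind of entity it refers to.
class E_ref {
public:
    enum Entity { NOTFOUND, CLASS, TYPEDEF };
private:
    Entity entity;
    union { void* m_notfound; Class* m_class; Typedef* m_typedef; };
    void Assert(Entity e) const { if (entity != e) throw BadE_ref(); }
public:
    E_ref() : entity(NOTFOUND), m_notfound(0) {}
    E_ref(Class& c) : entity(CLASS), m_class(&c) {}
    E_ref(Typedef& t) : entity(TYPEDEF), m_typedef(&t) {}
    Entity operator()() const { return entity; }
    // Checked conversions: throw BadE_ref if the tag does not match.
    operator Class&() const { Assert(CLASS); return *m_class; }
    operator Typedef&() const { Assert(TYPEDEF); return *m_typedef; }
};

// Hypothetical scope with one symbol table for all entity kinds.
class SimpleScope {
    SimpleScope* parent;
    std::map<h_str, E_ref> members;
public:
    explicit SimpleScope(SimpleScope* p = 0) : parent(p) {}

    void Add(h_str name, E_ref e) {
        assert(e() != E_ref::NOTFOUND);  // never store an empty reference
        members[name] = e;
    }

    // Look in this scope only.
    E_ref FindMember(h_str name) const {
        std::map<h_str, E_ref>::const_iterator it = members.find(name);
        return it == members.end() ? E_ref() : it->second;
    }

    // Look in this scope, then in each enclosing scope in turn.
    E_ref Find(h_str name) const {
        for (const SimpleScope* s = this; s; s = s->parent) {
            E_ref e = s->FindMember(name);
            if (e() != E_ref::NOTFOUND) return e;
        }
        return E_ref();
    }
};
```

A caller that expects a class can write `Class& c = scope.Find(id);` and gets
an exception rather than a bad cast if the name turns out to be a typedef.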

Overall, looking back on the work I did, I think I would take the same approach
again. I have no doubt the result would be very slow, but importantly it was
relatively easy to understand and work on. Maybe I should put the whole work in
the vault, though some of it is a little cringe-making for Boost, and I guess
it has little to do with Spirit ...

As I said before, the main issue with writing a C++ parser is that you need to
know the language really well; otherwise you get into the position I did, of
not only trying to figure out how to write a parser, which is hard enough, but
also learning the higher echelons of C++ at the same time.

I also found it easier to write a recursive descent parser directly by hand than
to try to automate it, because it was easier to wiggle the code that way. IIRC
Bjarne Stroustrup says something similar with regard to Cfront.
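
For readers unfamiliar with the technique, a hand-written recursive descent
parser has one function per grammar rule, so changing a rule means editing one
function. The sketch below is a hypothetical toy for integer arithmetic, not
C++ grammar and not code from the project described here; the ExprParser name
and its grammar are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical toy grammar (not C++):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class ExprParser {
    std::string src_;
    std::size_t pos_;

    bool atDigit() const {
        return pos_ < src_.size() &&
               std::isdigit(static_cast<unsigned char>(src_[pos_]));
    }
    void skipSpaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    // Consume `c` if it is the next non-space character.
    bool accept(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    long factor() {
        if (accept('(')) {
            long v = expr();  // recursion: a parenthesised sub-expression
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        skipSpaces();
        if (!atDigit()) throw std::runtime_error("expected number");
        long v = 0;
        while (atDigit()) v = v * 10 + (src_[pos_++] - '0');
        return v;
    }
    long term() {
        long v = factor();
        for (;;) {
            if (accept('*')) {
                v *= factor();
            } else if (accept('/')) {
                long d = factor();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            } else {
                return v;
            }
        }
    }
    long expr() {
        long v = term();
        for (;;) {
            if (accept('+')) v += term();
            else if (accept('-')) v -= term();
            else return v;
        }
    }
public:
    explicit ExprParser(const std::string& s) : src_(s), pos_(0) {}

    // Parse and evaluate the whole input; throws std::runtime_error on errors.
    long parse() {
        long v = expr();
        skipSpaces();
        assert(pos_ <= src_.size());  // the cursor never runs past the input
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return v;
    }
};
```

For example, `ExprParser("(1 + 2) * 3").parse()` evaluates the expression by
descending expr -> term -> factor and back into expr for the parentheses.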

regards
Andy Little

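// Various entities which can be members of a scope; some are scopes, some aren't.
// Note: h_str, BadE_ref, Token::ClassKey::Type and Input::TokenStream are used
// below but are not declared in this excerpt.

class Class;
class Union;
class Enum;
class Namespace;
class Typedef;
class ClassTemplate;
class FncLst;
class Object;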

//scope abstract base class
class Scope {
  Scope* parent;
 protected:
  virtual ~Scope(){}
  Scope(Scope* Parent):parent(Parent){}
 public:
  Scope* getParent()const{return parent;}

  struct E_ref{
  public:
   enum Entity{
    NOTFOUND,OBJECT,
    CLASS,UNION,
    ENUM,FNC_LST,
    TYPEDEF,
    NAMESPACE,
    CLASS_TEMPLATE
   };
  private:
   Entity entity;
   union{
    void* m_notfound;
    Object* m_object;
    Class* m_class;
    Union* m_union;
    Enum* m_enum;
    Typedef* m_typedef;
    FncLst* m_fncLst;
    Namespace* m_namespace;
    ClassTemplate* m_classTemplate;
   };
   void Assert(Entity e){if( entity != e)throw BadE_ref();}
   void chk_null_ptr(){if (!m_notfound) entity = NOTFOUND;}
  public:
   bool operator==(Entity e)const{return entity == e;}
   bool operator!=(Entity e)const{return entity != e;}
   Entity operator()()const{return entity;}
   operator Class&() {Assert(CLASS);return *m_class;}
   operator Union&() {Assert(UNION);return *m_union;}
   operator Enum&() {Assert(ENUM);return *m_enum;}
   operator Namespace&(){Assert(NAMESPACE);return *m_namespace;}
   operator Object&(){Assert(OBJECT);return *m_object;}
   operator Typedef&(){Assert(TYPEDEF);return *m_typedef;}
   operator FncLst&(){Assert(FNC_LST);return *m_fncLst;}
   operator ClassTemplate&(){Assert(CLASS_TEMPLATE);return *m_classTemplate;}

   E_ref(): entity(NOTFOUND),m_notfound(0){}
   E_ref(Object& ob):entity(OBJECT),m_object(&ob){}
   E_ref(Class& c): entity(CLASS),m_class(&c){}
   E_ref(Union& u): entity(UNION),m_union(&u){}
   E_ref(Enum& e): entity(ENUM),m_enum(&e){}
   E_ref(FncLst& fl):entity(FNC_LST),m_fncLst(&fl){}
   E_ref(Namespace& n): entity(NAMESPACE),m_namespace(&n){}
   E_ref(ClassTemplate& ct): entity(CLASS_TEMPLATE),m_classTemplate(&ct){}
   E_ref(Typedef& t):entity(TYPEDEF),m_typedef(&t){}
  };

  virtual E_ref Find(h_str name)const=0;
  virtual E_ref FindMember(h_str name)const=0;
  virtual E_ref FindType(h_str name)const=0;
  virtual E_ref FindMemberType(h_str name)const=0;

  Namespace& FindNearestEnclosingNamespace()const;
  Scope& FindNearestNonClassNonProto()const;
  virtual bool AddForwardDecl(
   Token::ClassKey::Type t,
   h_str identifier,Input::TokenStream& tstream)=0;
  bool hasParent(const Scope*)const;
 };

