From: Douglas Gregor (doug.gregor_at_[hidden])
Date: 2007-09-01 16:55:32
On Sat, 2007-09-01 at 13:02 -0700, Chris Lattner wrote:
> >> No, you're technically correct. Some semantic analysis is certainly
> >> required to parse C++, so you can't completely drop semantic analysis
> >> and still parse.
> > Isn't "some" a huge understatement? I mean, c'mon, you need to do
> > overload resolution! Just evaluate
> > boost::detail::is_incrementable<X>::value for some X, for example.
> C++ is clearly more complicated than C. The minimal amount of
> semantic processing for C++ will probably include scoping, namespace,
> class and function processing (where in C you just need to track
> typedefs + scoping). However, you don't need to track function
> bodies and a lot of other things if you don't want to.
As Dave noted, it also includes template instantiation and overload
resolution. It's a phenomenal amount of work to write a full C++ parser,
because you need nearly everything that a compiler needs.
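To make that concrete, here is a small sketch (all of the names are made up for illustration; nothing here comes from Boost) in which the same token sequence is a multiplication in one function and a pointer declaration in the other. The only way to tell them apart is to run overload resolution on probe, evaluate sizeof, and look inside the selected specialization of Pick:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper types with different sizes.
struct Small { char pad[1]; };
struct Large { char pad[2]; };

// Declarations only: they are used purely inside sizeof.
Small probe(int);   // chosen for an int argument
Large probe(...);   // fallback for everything else

template <std::size_t N> struct Pick;  // primary template, never defined

// In one specialization `type` names a type, in the other it names a value.
template <> struct Pick<sizeof(Small)> { typedef int type; };
template <> struct Pick<sizeof(Large)> { static const int type = 4; };

int p = 3;

int as_expression() {
    // A void* cannot convert to int, so probe(...) is selected and
    // `type` is a value: this is the multiplication 4 * p.
    return Pick<sizeof(probe(static_cast<void*>(nullptr)))>::type * p;
}

bool as_declaration() {
    // probe(int) is selected, so `type` is int and this line
    // declares a local pointer named p.
    Pick<sizeof(probe(0))>::type * p = nullptr;
    return std::is_same<decltype(p), int*>::value;
}
```

With these definitions as_expression() returns 12 and as_declaration() returns true, even though the two statements differ only in the argument passed to probe.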
Once you have that, "minimal" semantic analysis can still be very
useful. That minimal analysis still includes most of the capabilities of
a compiler (yes, template instantiation and overloading have to be there
to be 100% correct), but it can still avoid instantiations of function
templates, instantiations of class templates without specializations,
code generation, and many of the other semantic analysis tasks. So while
an AST-producing C++ parser won't have much less code than a full C++
compiler, it will execute far less of that code. You need template
instantiation and overload resolution, but only in very limited cases.
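As a rough illustration of skipping function template instantiation (body_never_needed and result_size are hypothetical names), the function template below has a body that could not be instantiated for int, yet the code compiles, because working out the type of the call only needs the declaration:

```cpp
#include <cstddef>

// Hypothetical function template: its body would be ill-formed for T = int,
// because int has no member named no_such_member.
template <typename T>
int body_never_needed(T t) {
    return t.no_such_member();
}

// sizeof is an unevaluated context. Template argument deduction and overload
// resolution run, and the declared return type (int) is read, but the
// definition is not instantiated, so the body is never checked for T = int.
const std::size_t result_size = sizeof(body_never_needed(0));

static_assert(sizeof(body_never_needed(0)) == sizeof(int),
              "only the declaration was needed");
```

A parser that only needs the type of a call expression can stop at the same point.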
> As Doug mentioned, the most important point of the design space we
> are in is to keep the syntax and semantics partitioned from each
> other. This makes it easier to understand either of the two and
> enforces a clear and well-defined interface boundary between the
> two. Having both a minimal semantics implementation and a full AST-
> building semantics analysis module is more useful as verification
> that the interfaces are correct than anything else.
It's also extremely useful for anyone who wants to manipulate the ASTs.
The reason GCC is so darned hard to work with (aside from the crusty C
code and ambiguous data structures) is that there is no separate API for
manipulating the AST. The parsing is intertwined with the semantic
analysis, so if you want to go through and build a new tree *without*
parsing code for that tree, things can get ugly.
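Here is a toy sketch of the kind of separation I mean. Every name in it (Actions, MinimalActions, AstBuilder, parse_statement) is hypothetical, and the "AST" is just a list of strings. The parser only talks to an abstract interface, so the tree-building code can be driven by the parser or called directly:

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical interface between syntax and semantics: the parser asks
// questions and reports events, but holds no semantic state of its own.
struct Actions {
    virtual ~Actions() {}
    virtual bool is_type_name(const std::string& id) const = 0;
    virtual void on_typedef(const std::string& id) = 0;
    virtual void on_pointer_decl(const std::string& type, const std::string& var) = 0;
    virtual void on_multiply(const std::string& lhs, const std::string& rhs) = 0;
};

// Minimal semantics: track type names so the parse stays correct, nothing more.
struct MinimalActions : Actions {
    std::set<std::string> types;
    bool is_type_name(const std::string& id) const override { return types.count(id) != 0; }
    void on_typedef(const std::string& id) override { types.insert(id); }
    void on_pointer_decl(const std::string&, const std::string&) override {}
    void on_multiply(const std::string&, const std::string&) override {}
};

// Fuller semantics: also record a node for each construct (a stand-in for an AST).
struct AstBuilder : MinimalActions {
    std::vector<std::string> nodes;
    void on_pointer_decl(const std::string& type, const std::string& var) override {
        nodes.push_back("decl " + var + " : " + type + "*");
    }
    void on_multiply(const std::string& lhs, const std::string& rhs) override {
        nodes.push_back("mul " + lhs + " " + rhs);
    }
};

// Toy parser for pre-tokenized statements of the form `typedef T` or `A * B`.
// Statements of any other shape are ignored.
void parse_statement(const std::vector<std::string>& toks, Actions& actions) {
    if (toks.size() == 2 && toks[0] == "typedef") {
        actions.on_typedef(toks[1]);
    } else if (toks.size() == 3 && toks[1] == "*") {
        if (actions.is_type_name(toks[0]))
            actions.on_pointer_decl(toks[0], toks[2]);
        else
            actions.on_multiply(toks[0], toks[2]);
    }
}
```

Parsing `a * b` through an AstBuilder records a multiplication before `typedef a` has been seen and a pointer declaration afterwards. Calling on_pointer_decl("a", "b") on a fresh AstBuilder produces that same declaration node with no parser involved, which is exactly the kind of entry point that is missing when parsing and semantic analysis are intertwined.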