Boost logo

Boost :

From: Chris Lattner (clattner_at_[hidden])
Date: 2007-09-01 16:02:59


>> No, you're technically correct. Some semantic analysis is certainly
>> required to parse C++, so you can't completely drop semantic analysis
>> and still parse.
>
> Isn't "some" a huge understatement? I mean, c'mon, you need to do
> overload resolution! Just evaluate
> boost::detail::is_incrementable<X>::value for some X, for example.

C++ is clearly more complicated than C. The minimal amount of
semantic processing for C++ will probably include scoping, namespace,
class and function processing (where in C you just need to track
typedefs + scoping). However, you don't need to track function
bodies and a lot of other things if you don't want to.

>> and the
>> parser will certainly need to call into the semantic analysis module
>> to figure out whether a particular name is a type, a value, a
>> template, etc... just like a C parser needs to consult a symbol
>> table to figure out whether a name is a typedef name or something
>> else.
>
> Yeah, only more so. At one point he said of the parser, "we don't do
> constant folding," but clearly you need to do that to decide whether a
> name is a type or not.
>
> foo<3*5>::x * y;
>
> It seems to me that for C++ with templates, during parsing you have to
> all the semantic analysis that isn't code generation -- and that's a
> lot.

No one is debating that you have to track the right things. For
integer constant expressions, you can clearly track the value as you
parse, regardless of whether you are building an AST or not. You do
have to do minimal amounts of semantic analysis to do this. In C,
the closest example are things like "case 1+4/(someenumval):". In
the AST, we actually do represent the fully expanded form (which is
useful for some clients of the AST) and compute the i-c-e value on
demand.

As Doug mentioned, the most important point of the design space we
are in is to keep the syntax and semantics partitioned from each
other. This makes it easier to understand either of the two and
enforces a clear and well-defined interface boundary between the
two. Having both a minimal semantics implementation and a full AST-
building semantics analysis module is more useful as verification
that the interfaces are correct than anything else.

>> What this probably means is that the "minimal" semantic analysis for
>> C++ is a whole lot more heavyweight than the minimal semantic
>> analysis for C. But you still get some benefit from separating out
>> the semantics from the parser, because there are many semantic bits
>> that you *can* ignore if you only want an (unchecked) parse tree.
>
> What, other than code generation?
>
> It has often seemed to me that it might make sense to parse C++
> nondeterministically, just to avoid some of these issues. The number
> of real instances of ambiguity is probably pretty small.

Without more context I'm not sure what you mean by non-
deterministic. Obviously (hopefully) all parsers are
deterministic :). There are at least three different ways of parsing
C++ fuzzily:

1. Use a doxygen-style "fuzzy" parser with a set of heuristics. This
is needed if you want to try to parse files without processing
headers, but has obvious significant limitations.
2. Parse and track the "minimal" set of semantic information to parse
correctly. This gives you a correct parse tree, and reduces the
amount of semantic information you need to keep around (memory use is
lower than a full sema implementation), but for C++, you end up doing
a lot of stuff anyway.
3. Parse a superset of the language and either resolve the ambiguity
later or not. The problem with this is that it is significantly less
efficient both in time and space than using semantic information to
direct the parser. However, it can provide a nice separation between
the parser and semantic analyzer.

-Chris


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk