
Subject: Re: [boost] What about a Spirit-powered C++ syntax analysis library in Boost?
From: Doug Gregor (doug.gregor_at_[hidden])
Date: 2010-09-08 17:19:51


On Wed, Sep 8, 2010 at 8:07 AM, Florian Goujeon
<florian.goujeon_at_[hidden]> wrote:
> I've written a C++ syntax analysis library using Boost.Spirit.
> (This 'library' is actually a subset of the Scalpel library. I talked
> about it in the Boost mailing list here:
> http://article.gmane.org/gmane.comp.lib.boost.devel/208217 )
> For the sake of brevity, let's call it Salsa (for Stand-ALone Syntax
> Analysis).
>
> While most C++ compilers need semantic information to perform the
> syntax analysis, Salsa is a standalone syntax analyzer. Its Spirit
> grammar doesn't run any semantic action.
> Consequently, you can use it to parse some C++ code without having to
> analyze a whole translation unit (i.e. without processing #include
> directives).
>
> At this point, you may wonder how syntax ambiguities are managed.
> In most cases, there's always an interpretation which is more obvious
> than the other one(s).
> In all cases, you may reasonably ask the programmer to disambiguate
> their code.
> Whatever the case, Salsa (predictably) chooses one of the
> interpretations.
> Here are some examples:
>
> The following statement…:
>    a * b;
> … may be either a multiplication or a pointer declaration.
> The default interpretation is the pointer declaration. You can
> reasonably ask the programmer to disambiguate the code by adding
> parentheses if he wants the syntax analyzer to interpret it as the
> former:
>    (a * b);
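To spell out the `a * b;` ambiguity, here is a minimal sketch with hypothetical names (not code from Salsa). The statement is spelled identically in both places; only what name lookup finds for `a` decides whether it is a declaration or an expression:

```cpp
// The token sequence `a * b;` in two contexts. The names `a` and `b`
// are hypothetical; only what lookup finds for `a` changes.

namespace a_names_a_type {
    struct a {};

    bool declares_pointer() {
        a * b;        // declaration: `b` is an uninitialized pointer to `a`
        b = nullptr;  // assign before use
        return b == nullptr;
    }
}

namespace a_names_an_int {
    int multiplies() {
        int a = 6, b = 7;
        a * b;        // expression statement: the product is computed and discarded
        return a * b;
    }
}
```

A parser that runs no semantic actions cannot tell these two functions apart, which is why it has to pick one reading by default.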

I really didn't want to get into this, but you asked me to weigh in, so...

You cannot reasonably ask the programmer to disambiguate the code for
you, especially when existing tools handle the code just fine. "Change
your code, then you can try out my tool" is the fastest way to kill off
any chance of large-scale adoption.

> Trickier. In the following declaration…:
>        bool bool_ = a < b || c > (d && e);
> … the right-hand side expression may be either a boolean expression
> (where 'a', 'b', 'c', 'd' and 'e' are variables of type bool) or a
> function template call (whose name is 'a', which takes one bool
> template parameter and where 'b', 'c', 'd' and 'e' are all variables
> of type const bool).
> The default interpretation is the boolean expression. Once again, you
> can reasonably ask the programmer to disambiguate the code by adding
> parentheses if he wants Salsa to interpret it as the latter:
>        bool bool_ = a<(b || c)>(d && e);
> (Actually, I wonder why the standard allows such ambiguities.)
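Both readings of that declaration are valid C++, and a compiler picks between them purely by looking up `a`. A minimal sketch, assuming hypothetical values and a hypothetical function template `a` (neither comes from Salsa), chosen so the two parses produce different results:

```cpp
// One declaration, two parses, chosen by name lookup on `a`.
// Values are hypothetical, picked so the two readings give different results.

namespace a_names_a_variable {
    bool evaluate() {
        bool a = true, b = true, c = false, d = true, e = false;
        // Parsed as (a < b) || (c > (d && e)).
        bool bool_ = a < b || c > (d && e);
        return bool_;
    }
}

namespace a_names_a_function_template {
    // Hypothetical function template taking one bool template parameter.
    template <bool B>
    bool a(bool x) { return B && x; }

    bool evaluate() {
        const bool b = true, c = false, d = true, e = true;
        // Because lookup finds a template, `<` opens a template argument list
        // and the first `>` closes it: a<(b || c)>(d && e).
        bool bool_ = a < b || c > (d && e);
        return bool_;
    }
}
```

The first function yields `false` ((true < true) is false, and false > false is false); the second calls `a<true>(true)` and yields `true`.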

That's not enough, actually: "a" may still be a class template or a
function template. How will you handle that ambiguity?
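A minimal sketch of that point, with hypothetical definitions of `a` (not from Salsa): the parenthesized spelling does commit to a template-id, but whether `a<(b || c)>(d && e)` is a function call or a functional-notation cast that constructs a temporary still depends on what `a` is. `decltype` makes the difference visible:

```cpp
#include <type_traits>

// Same tokens, two different parses. Only name lookup on `a` can decide.
// All names here are hypothetical, chosen to mirror the example above.

namespace with_function_template {
    template <bool B>
    bool a(bool x) { return B && x; }

    const bool b = true, c = false, d = true, e = true;

    // a<(b || c)>(d && e) is a call to the function template a<true>.
    using result_type = decltype(a<(b || c)>(d && e));
}

namespace with_class_template {
    template <bool B>
    struct a {
        bool value;
        explicit a(bool x) : value(B && x) {}
        operator bool() const { return value; }  // lets the temporary initialize a bool
    };

    const bool b = true, c = false, d = true, e = true;

    // The identical tokens are now a functional-notation cast that
    // constructs a temporary a<true> from (d && e).
    using result_type = decltype(a<(b || c)>(d && e));
}

constexpr bool call_yields_bool =
    std::is_same<with_function_template::result_type, bool>::value;
constexpr bool cast_yields_object =
    std::is_same<with_class_template::result_type,
                 with_class_template::a<true>>::value;
```

Because `with_class_template::a` converts to `bool`, `bool bool_ = a<(b || c)>(d && e);` compiles in both namespaces, so the extra parentheses alone do not tell a purely syntactic parser which construct it is looking at.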

More importantly, do you believe that you can handle *every* ambiguity
in the C++ language in this way, by asking the user to insert
parentheses that no other tool requires?

> I'd like to know: is there a reasonable chance that such a library
> will be accepted into Boost?

That's decided by the Boost community, but if I were to review a
library that professes to parse C++ while actually parsing an
arbitrarily-disambiguated subset of the C++ language, or that cannot
parse Boost itself, I would vote against acceptance.

  - Doug

