From: Joel de Guzman (djowel_at_[hidden])
Date: 2002-10-18 20:45:41


----- Original Message -----
From: "Douglas Gregor" <gregod_at_[hidden]>

> My review of the Spirit Parser Framework follows. I have read through all of
> the documentation provided for Spirit and have ported a portion of a YACC
> parser for a specification language I use to Spirit. I did not have the
> chance to peruse the Spirit source code, so my comments will be limited to
> "user" experiences. That said,
>
> Spirit should be accepted into Boost.

Thanks!

> Refactoring
> --------------
> * There are several libraries within Spirit that can (should) be refactored
> into separate Boost libraries. This includes:
> + The multi-pass iterator
> + The file iterator
> + The position iterator
> + Ternary search tree

Might I add the wchar_t-savvy character set implemented using range-runs? :-)
Nathan Myers started a thread about this a few months ago; I'm not sure
what came of it.

> All of these components could be useful outside of Spirit, and IMHO should
> become full-fledged Boost libraries. In fact, I'd actually like to see these
> components reviewed separately for inclusion in Boost.

I'd definitely want these parts factored out into common libraries; that
would also help dispel the fear that Spirit is big.

> * It's hard to tell where Spirit ends and Phoenix begins. The Spirit headers
> #include headers from Phoenix, and it seems that any nontrivial use of
> closures requires Phoenix (though I do understand that Lambda support is
> upcoming). I'd rather not pull all of Phoenix into Boost along with Spirit,
> because Phoenix deserves its own dedicated review (as do the aforementioned
> components in Spirit). Not to mention that the inclusion of the parser I
> ported to Spirit would result in a project containing Boost.Bind,
> Boost.Lambda, and Phoenix code :)

Rest assured, this will be addressed. Full Lambda support is upcoming.
In fact, if you grep the current Lambda code, you'll already see
Phoenixisms in many places :-)

>
> Code Nitpick
> ------------------
> * The match class contains "operator bool()", whereas I would have expected
> the use of the safe_bool idiom. Any particular reason for this?

No particular reason. The match class would indeed benefit from the safe_bool
idiom. Is there a safe_bool utility class somewhere in Boost that I can use?
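For concreteness, here is a bare-bones sketch of the idiom as it could apply
to a stripped-down match-like class (the member names are just illustrative,
not the actual ones):

    class match
    {
        //  an unspecified member-pointer type stands in for bool
        typedef void (match::*safe_bool)() const;
        void safe_bool_helper() const {}

    public:
        explicit match(bool hit = false) : hit_(hit) {}

        //  "if (m)" still works, but "int i = m;" no longer compiles
        operator safe_bool() const
        {
            return hit_ ? &match::safe_bool_helper : 0;
        }

    private:
        bool hit_;
    };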

> User Experiences
> -----------------------
> * I found it very easy to construct parsers with Spirit. I'd already become
> accustomed to the syntax of Spirit from lurking on the list, but it was very
> easy to put that knowledge to code and get a working parser quickly.
>
> * Debugging parsers is quite easy with SPIRIT_DEBUG. Great feature.
>
> * Semantic actions weren't quite as easy to get working. I needed to build
> an abstract syntax tree. ast_parse wouldn't do the job for me, because I'd
> like the tree to come out in terms of my language's internal representation
> (without additional conversion steps), so I settled on using closures....
> Closures feel like the wrong solution for most simple parsing tasks, because
> they require extra assignments to get the values we need into the right
> places; the YACC-like solution of using $1, $2, etc. feels like a better
> match because it makes the result of child nodes immediately available. I
> would like to do something like this:
>
> field_decl = (identifier >> ':' >> type)[bind(&create_field)(arg1, arg3)]
>
> Where "arg1" will be the value of the identifier subtree and "arg3" will be
> the value of the type subtree. The result of the (deferred) call to
> create_field would then be the value of the field_decl subtree.
>
> Currently, I need to use closures to temporarily store the results of the
> identifier and type subtrees so that they can be used in the action for
> field_decl. For instance, the closure might be:
>
> struct field_decl_closure :
>     spirit::closure<field_decl_closure, field_ptr, id_ptr, type_ptr>
> {
>     member1 val;
>     member2 id;
>     member3 type;
> };
>
> Then field_decl_closure::context_t becomes the parser context for the
> field_decl (sub)rule, and the rule assignment becomes:
>
> field_decl =
>     (identifier[field_decl.id = arg1] >> ':'
>         >> type[field_decl.type = arg1])
>     [field_decl.val = bind(&create_field)(field_decl.id, field_decl.type)]
>
> I understand that Hartmut is working on a group parser that achieves something
> like the former syntax by adding a new group_d directive. Perhaps group_d
> should be the default behavior?

This can be achieved through a combination of closures, the grouped
scanner policies, and a specialized parse function. Hartmut's prototype
is a good starting point. For example:

    group_parse(first, last, p);

could be added to the repertoire. The scanner mechanism already in place
is powerful enough for Hartmut's grouping-parser idea to be implemented
non-intrusively, without a directive. The idea is that the scanner will
collect the attributes and place them in a closure; the semantic action
can then access all the attributes in parallel as arg1...argN.
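Purely as a sketch of the intended user-side syntax (group_parse and the
parallel argN binding are still only a proposal, not an existing interface),
Doug's field_decl example might then read:

    //  hypothetical -- group_parse and the parallel argN binding are proposed
    field_decl =
        (identifier >> ':' >> type)
        [
            //  arg1 would carry identifier's attribute, arg3 type's;
            //  the deferred create_field call becomes field_decl's own value
            field_decl.val = bind(&create_field)(arg1, arg3)
        ];

    group_parse(first, last, field_decl);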

As I see it now, Hartmut's grouping parsers have great potential.
I expect more development in this direction. We are currently
discussing adding a "meta" layer that will conceptualize and
compartmentalize the basic infrastructure needed to support
meta parser-transformations.

The closure as we have it now is really an attempt to hit two birds with
one stone. Historically, it was conceived to solve the problem of
backtracking over semantic actions. At the same time, there was also a
real need to get synthesized and inherited attributes working. At the
time, we had only vague notions of meta-spirit.

With Spirit-X (a great departure from the v1.3 lineage), Hartmut, Dan
and I started experimenting with more of a meta-spirit, where static
C++ parse trees are transformed in place. The latest of these efforts
is Hartmut's grouping parsers.

The original behavior of Spirit (from 1.0) was to pass bare iterator
pairs to the semantic actions. Slowly, Spirit evolved toward more
specialized actions where attribute values are passed directly instead;
an example is the real_p parser. I think we will see more of this,
where each node can have a statically typed attribute. This concept
enables elaborate attribute processing.
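For example (a minimal sketch; the exact header path and parse overloads
may differ in your tree):

    #include <boost/spirit.hpp>     //  header path is an assumption
    #include <iostream>
    #include <string>
    using namespace boost::spirit;

    //  a character-level parser calls its action with an iterator pair
    void on_word(char const* first, char const* last)
    {
        std::cout << std::string(first, last) << '\n';
    }

    //  real_p calls its action with the parsed value itself
    void on_number(double d)
    {
        std::cout << d << '\n';
    }

    int main()
    {
        parse("hello", (+alpha_p)[&on_word]);   //  action sees [first, last)
        parse("3.14", real_p[&on_number]);      //  action sees 3.14 as a double
    }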

I am convinced that closures + meta-spirit + Hartmut's advances
in refactoring and transforming parsers will be a mighty powerful
combination. We've seen nothin' yet :-)...

> * The parser/skipper interaction confused me quite a bit. A rule such as
> "lexeme_d[*alpha_p]", when used with a skipper such as space_p to skip
> whitespace, will pass an iterator sequence [first, last) that may contain
> characters at the beginning or end that are skipped by the skipper. That is,
> the semantic action could receive a string "foo " (note the spaces at the
> end), even though lexeme_d[*alpha_p] doesn't actually match any spaces. I
> now understand the reason this occurs, and how to fix it (closures), but I
> don't understand why this behavior is desirable. It's confusing, seems to
> require a bit of infrastructure to overcome (Phoenix w/ closures), and is
> quite different from the traditional view of a lexeme.

Indeed. I'll see how this can be solved; it's certainly doable.
The issue is how to do it while still maintaining efficiency and
genericity.
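To restate the scenario in code (a sketch only, with space_p as the skipper;
the helper function is hypothetical):

    void on_word(char const* first, char const* last);

    //  *alpha_p itself matches only "foo", yet the [first, last) range
    //  handed to on_word can extend over characters the skipper consumed,
    //  e.g. "foo " with a trailing blank included
    parse("foo   3.14", lexeme_d[*alpha_p][&on_word] >> real_p, space_p);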

> * Grammars are great. There was a time I was quite worried about
> rule<concrete_scanner_type>, but grammars calmed that fear and give better
> overall organization to Spirit parsers.
>
> * I found it very easy to extend Spirit by adding a new parser (a special
> symbol table lookup parser). This is probably Spirit's strongest attribute:
> it's very extensible by anyone.
>
> Great library. I know how I'll be writing my future parsers...

Thank you very much.

Regards,
--Joel

