From: Douglas Gregor (gregod_at_[hidden])
Date: 2002-10-18 14:46:52


My review of the Spirit Parser Framework follows. I have read through all of
the documentation provided for Spirit and have ported a portion of a YACC
parser for a specification language I use to Spirit. I did not have the
chance to peruse the Spirit source code, so my comments will be limited to
"user" experiences. That said,

Spirit should be accepted into Boost.

Refactoring
--------------
  * There are several libraries within Spirit that can (should) be refactored
into separate Boost libraries. This includes:
    + The multi-pass iterator
    + The file iterator
    + The position iterator
    + The ternary search tree
    
    All of these components could be useful outside of Spirit, and IMHO should
become full-fledged Boost libraries. In fact, I'd actually like to see these
components reviewed separately for inclusion in Boost.

  * It's hard to tell where Spirit ends and Phoenix begins. The Spirit headers
#include headers from Phoenix, and it seems that any nontrivial use of
closures requires Phoenix (though I do understand that Lambda support is
upcoming). I'd rather not pull all of Phoenix into Boost along with Spirit,
because Phoenix deserves its own dedicated review (as do the aforementioned
components in Spirit). Not to mention that the inclusion of the parser I
ported to Spirit would result in a project containing Boost.Bind,
Boost.Lambda, and Phoenix code :)

Code Nitpick
------------------
  * The match class contains "operator bool()", whereas I would have expected
the use of the safe_bool idiom. Any particular reason for this?
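
For readers unfamiliar with the idiom, here is a minimal sketch of what it looks like. The class match_sketch and the helper succeeded() are hypothetical names made up for illustration; this is not Spirit's match class, and I am only assuming that a negative length is a reasonable way to model a failed match.

```cpp
#include <cstddef>

// Hypothetical stand-in for a parser match result (NOT Spirit's match class).
// Assumption for this sketch: a negative length means "no match".
class match_sketch {
    // Pointer-to-member type used only as a "truthy" token. Unlike bool, it
    // does not convert to int, so "int n = m;" or "m + 1" will not compile.
    typedef void (match_sketch::*safe_bool)() const;
    void true_tag() const {}

    std::ptrdiff_t len_;

public:
    explicit match_sketch(std::ptrdiff_t len = -1) : len_(len) {}

    std::ptrdiff_t length() const { return len_; }

    // Usable in "if (m)" and "!m", but not in arithmetic.
    operator safe_bool() const {
        return len_ >= 0 ? &match_sketch::true_tag : 0;
    }
};

// Hypothetical helper showing the intended use in a boolean context.
inline bool succeeded(const match_sketch& m) {
    if (m) return true;
    return false;
}
```

A fuller version of the idiom also disables comparisons between two such objects, which this sketch leaves out for brevity. With a plain "operator bool()", by contrast, an expression such as "m + 1" compiles silently, which is the kind of accident the idiom is meant to prevent.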

User Experiences
-----------------------
  * I found it very easy to construct parsers with Spirit. I'd already become
accustomed to the syntax of Spirit from lurking on the list, but it was very
easy to put that knowledge to code and get a working parser quickly.

  * Debugging parsers is quite easy with SPIRIT_DEBUG. Great feature.

  * Semantic actions weren't quite as easy to get working. I needed to build
an abstract syntax tree. ast_parse wouldn't do the job for me, because I'd
like the tree to come out in terms of my language's internal representation
(without additional conversion steps), so I settled on using closures....
Closures feel like the wrong solution for most simple parsing tasks, because
they require extra assignments to get the values we need into the right
places; the YACC-like solution of using $1, $2, etc. feels like a better
match because it makes the result of child nodes immediately available. I
would like to do something like this:

  field_decl = (identifier >> ':' >> type)[bind(&create_field)(arg1, arg3)]

Where "arg1" will be the value of the identifier subtree and "arg3" will be
the value of the type subtree. The result of the (deferred) call to
create_field would then be the value of the field_decl subtree.
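
To make the intended semantics concrete, here is a small hand-written sketch that does not use Spirit at all. The field struct, the create_field signature, and the helpers skip_space, identifier, and literal are all assumptions invented for this illustration, and the type subtree is simplified to a bare name. The only point is that the action receives the values of the first and third children directly, with no intermediate storage.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type standing in for a field node.
struct field {
    std::string name;
    std::string type;
};

// Hypothetical factory, named after create_field in the text.
inline field create_field(const std::string& id, const std::string& type) {
    field f;
    f.name = id;
    f.type = type;
    return f;
}

// Advance pos past any whitespace.
inline void skip_space(const std::string& in, std::size_t& pos) {
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos])))
        ++pos;
}

// Match one or more letters or underscores; store the matched text in out.
inline bool identifier(const std::string& in, std::size_t& pos, std::string& out) {
    skip_space(in, pos);
    std::size_t start = pos;
    while (pos < in.size() &&
           (std::isalpha(static_cast<unsigned char>(in[pos])) || in[pos] == '_'))
        ++pos;
    if (pos == start) return false;
    out.assign(in, start, pos - start);
    return true;
}

// Match a single literal character.
inline bool literal(const std::string& in, std::size_t& pos, char c) {
    skip_space(in, pos);
    if (pos < in.size() && in[pos] == c) { ++pos; return true; }
    return false;
}

// field_decl = identifier ':' type, where the action is handed the values of
// children 1 and 3 directly (the analogue of $1 and $3 in YACC).
template <class Action>
bool field_decl(const std::string& in, std::size_t& pos, Action act, field& result) {
    std::size_t p = pos;  // work on a copy so a failed match leaves pos untouched
    std::string arg1, arg3;
    if (!identifier(in, p, arg1)) return false;
    if (!literal(in, p, ':'))     return false;
    if (!identifier(in, p, arg3)) return false;  // type is a bare name in this sketch
    result = act(arg1, arg3);
    pos = p;
    return true;
}
```

Calling field_decl(src, pos, &create_field, f) on the input "count : int" should leave f.name as "count" and f.type as "int".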

Currently, I need to use closures to temporarily store the results of the
identifier and type subtrees so that they can be used in the action for
field_decl. For instance, the closure might be:

struct field_decl_closure :
  spirit::closure<field_decl_closure, field_ptr, id_ptr, type_ptr>
{
  member1 val;
  member2 id;
  member3 type;
};

Then field_decl_closure::context_t becomes the parser context for the
field_decl (sub)rule, and the rule assignment becomes:

  field_decl =
    (identifier[field_decl.id = arg1] >> ':'
                                      >> type[field_decl.type = arg1])
        [field_decl.val = bind(&create_field)(field_decl.id, field_decl.type)]

I understand that Hartmut is working on a group parser that achieves something
like the former syntax by adding a new group_d directive. Perhaps group_d
should be the default behavior?

  * The parser/skipper interaction confused me quite a bit. A rule such as
"lexeme_d[*alpha_p]", when used with a skipper such as space_p to skip
whitespace, will pass its semantic action an iterator range [first, last) that
may contain characters at the beginning or end that are skipped by the skipper.
That is, the semantic action could receive a string "foo " (note the trailing
whitespace), even though lexeme_d[*alpha_p] doesn't actually match any spaces.
I now understand the reason this occurs and how to fix it (closures), but I
don't understand why this behavior is desirable. It's confusing, seems to
require a bit of infrastructure to overcome (Phoenix w/ closures), and is
quite different from the traditional view of a lexeme.

  * Grammars are great. There was a time I was quite worried about
rule<concrete_scanner_type>, but grammars calmed that fear and give better
overall organization to Spirit parsers.

  * I found it very easy to extend Spirit by adding a new parser (a special
symbol table lookup parser). This is probably Spirit's strongest attribute:
it's very extensible by anyone.

Great library. I know how I'll be writing my future parsers...

        Doug

