From: Joel de Guzman (djowel_at_[hidden])
Date: 2002-10-14 20:45:39
----- Original Message -----
From: "Beman Dawes" <bdawes_at_[hidden]>
> While space/speed comparisons are always interesting, there are other
> characteristics which often drive the decision as to which parser is best
> for an application. For example,
> * Usefulness and malleability of error messages, particularly when
> dying on the first error encountered isn't an option.
LL parsers (including RD) are superior to LR in error reporting.
Top-down parsers have much better context information.
Spirit has an error-handling facility, designed around C++ exceptions,
to handle error-reporting as well as attempt error-correction.
In moderate cases, the grammar itself and the attached semantic actions
are sufficiently rich to handle errors. If I may, I'll add another ~criteria~
in your list:
* ease of integration with semantic action code.
This is an area where Spirit truly shines. Semantic actions may be placed
anywhere in the grammar. And, continuing, can be used for error
reporting (and handling) as well. I've found this Spirit-EBNF construct to
be particularly simple and useful:
r = a | b | c | anychar_p[handle_unexpected];
Alternatives such as (a | b | c) are typically inside the kleene star in
grammars. anychar_p handles the unexpected case. Usually, you
use the epsilon for this. Yet, in simple cases, the anychar_p eats
a single char(or token- Spirit is not limited to chars or wchar_t), emits
the error message, and then the outer kleene star attempts to
I used this simple technique in the cpp_to_html (C++ to html) converter
demo example (included in the distribution).
> * Ability to make small changes to the parsed language without
> disastrous rewrites of existing parser code and/or tables.
Oh perfect :-)! Spirit in fact promotes stepwise refinement. As David Held
mentions in his post, you can build grammars incrementally. The idea is
that, quoted from the user's manual:
"As the grammar gets quite complicated, it is a good idea to group parts of
the grammar into logical modules. For instance, when writing a language, it
might be wise to put expressions and statements into separate grammar
capsules. The grammar takes advantage of the encapsulation properties of
C++ classes. The declarative nature of classes makes it a perfect fit for
the definition of grammars. Since the grammar is nothing more than a class
declaration we can conveniently publish it in header files. The idea is
that once written and fully tested, a grammar can be reused in many
contexts. We now have the notion of grammar libraries."
> * Can easily parse local contradictions (always AB,
> except in certain expressions its BA).
If I understand correctly, you mean special cases?
Correct me if I'm wrong.
The framework is malleable to special cases. There are exciting
possibilities with dynamic parsing. Here's an example:
parse p1 if c evaluates to true, else parse p2.
The declarative nature of (E)BNF and the imperative nature of C++,
when combined results to an interesting and powerful blend that
can do things that can not be done before. Consider the parsing of
length prefixed pascal strings for example. (BTW, Spirit can be
used to parse binary files) :
r = anychar_p[assign(c)] >> repeat_p(boost::ref(c))[anychar_p];
parses a single char (or token)
loop construct: repeat "p", "n" times
(similar to if_p(c)[p1] .else[p2] in syntax)
This AFAIK cannot be done in YACC or ANTLR. Yet, we've just seen the tip
of the iceberg :-)
> My past experience has been that recursive decent parsers are very strong
> in those kinds of "quality" areas. I'd like to get some indication of how
> Spirit does with such quality related issues.
Spirit currently is RD. As such, it inherits the nice properties that you
described and extends it further thanks to generative abilities of C++
expression templates and generic programming. Yet, we do not wish to
stop there. For instance, I've been researching on new parsing techniques
that can be applied to Spirit. One such example is the so called "recursive
ascent". RA is the equivalent of "functional" (using mutually recursive functions)
equivalent of LR.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk