Boost Users :

Subject: Re: [Boost-users] Interested in parsing tools
From: OvermindDL1 (overminddl1_at_[hidden])
Date: 2009-09-12 21:37:07


On Sat, Sep 12, 2009 at 7:04 PM, Ramon F Herrera <ramon_at_[hidden]> wrote:
> Diederick C. Niehorster wrote:
>>
>> This makes me wonder how Xpressive and Spirit compare, both do
>> compiled parsing statements right?
>>
>> Can Spirit somehow be seen as Xpressive + more? Why use Xpressive at all
>> then?
>>
>> Best,
>> Dee
>>
>
> Hi Dee,
>
> While I am far from being an expert (and hope to read answers from people
> more qualified than myself), I was wondering the same thing.
>
> I venture a guess. You draw the line here:
>
> Several tools (Regex, Xpressive, Perl) can grab data, based on regular
> expressions. They can validate whether some statement is a correct
> expression of the target "language".
>
> The more advanced tools, however, can actually put that data into action.
> The passive data becomes a set of executable statements using Spirit, ANTLR,
> etc. You specify things like: "every time you find the verb 'such-and-such',
> call my function XYZ with the needed parameters."
>
> Callback functions are the dividing line.

Static Xpressive (not dynamic Xpressive; you can think of dynamic
Xpressive as essentially Boost.Regex) and Spirit are both C++ DSELs
and compile to rather fast code. But Xpressive is a regex parser, and
as such has limitations, whereas Spirit2.1 is a PEG (Parsing Expression
Grammar, as I recall; wiki it) parser. PEGs are nice in that they
are unambiguous, they are fast, they have unlimited lookahead,
etc... Spirit can also be bound to just about anything in any way in
the C++ world, with built-in parsers for a ton of things (everything
from PODs to the STL to many Boost libraries like Fusion and such),
and it is quite easy to make your own new things as well. For a
comparison of Spirit2.1 with ANTLR: Spirit2.1 has been shown to be
faster in execution speed, the code is a great deal shorter, and
ANTLR's actions pale in comparison to Spirit2.1's versions. On top of
that, you do not need to pre-process your grammar with an external app
like you have to do with ANTLR.
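
To make the "unambiguous" and "lookahead" points a bit more concrete, here
is a minimal sketch of PEG-style combinators in plain C++11. It does not
use Spirit at all; the names Parser, lit, seq, alt, not_pred and no_match
are made up for this illustration and are not Spirit's API. It only shows
the two PEG ideas mentioned above: ordered choice (the first alternative
that matches wins, so there is never more than one parse) and a
not-predicate that looks ahead without consuming input.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Sentinel returned when a parser does not match.
const std::size_t no_match = std::string::npos;

// A parser maps (input, start position) to the position just past the
// match, or to no_match on failure.
typedef std::function<std::size_t(const std::string&, std::size_t)> Parser;

// Match the literal text s at the current position.
Parser lit(const std::string& s) {
    return [s](const std::string& in, std::size_t pos) -> std::size_t {
        return in.compare(pos, s.size(), s) == 0 ? pos + s.size() : no_match;
    };
}

// Sequence: a, then b starting where a stopped.
Parser seq(Parser a, Parser b) {
    return [a, b](const std::string& in, std::size_t pos) -> std::size_t {
        std::size_t mid = a(in, pos);
        return mid == no_match ? no_match : b(in, mid);
    };
}

// Ordered choice: b is tried only if a fails at this position.
Parser alt(Parser a, Parser b) {
    return [a, b](const std::string& in, std::size_t pos) -> std::size_t {
        std::size_t r = a(in, pos);
        return r != no_match ? r : b(in, pos);
    };
}

// Not-predicate: succeeds, consuming nothing, only if p fails here.
Parser not_pred(Parser p) {
    return [p](const std::string& in, std::size_t pos) -> std::size_t {
        return p(in, pos) == no_match ? pos : no_match;
    };
}
```

With these, alt(lit("a"), lit("ab")) applied to "ab" at position 0 stops
after the "a" and never considers "ab", and because of that
seq(alt(lit("a"), lit("ab")), lit("c")) fails on "abc" while the same
sequence with the alternatives swapped succeeds. That is the price of
being unambiguous: the order of alternatives matters.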

But yeah, to learn what Spirit2.1 is built on, look up PEGs on
Wikipedia; Spirit2.1 is still so much more powerful than that.
The documentation for it is in trunk. As an example, I wrote
this grammar a few hours ago. It is a relatively nasty-looking one
that I really should break up into easier-to-read parts, but it works
quite well in my testing thus far:

                static boost::spirit::qi::rule<std::string::const_iterator, std::string::value_type()> uri_decode_rule;
                uint_parser<std::string::value_type, 16, 2, 2> _string_value_hex;
                uri_decode_rule
                        = (lit('%') >> _string_value_hex[_val=boost::spirit::_1])
                        | (lit('+') >> eps[_val=' '])
                        | char_[_val=boost::spirit::_1]
                ;
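
For anyone who does not read Spirit yet, the three alternatives of
uri_decode_rule can be sketched by hand in plain C++. This is only my
reading of the rule, not a drop-in replacement: hex_value, decode_one and
decode_all are hypothetical helper names, and I have not checked how
Spirit's uint_parser treats values of 0x80 and above when the attribute
type is char, so the cast below is an assumption.

```cpp
#include <cstddef>
#include <string>

// Value of a hexadecimal digit, or -1 if c is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode one character at pos and advance pos past the consumed input.
// Returns false only at end of input.
bool decode_one(const std::string& in, std::size_t& pos, char& out) {
    if (pos >= in.size()) return false;
    // Alternative 1: '%' followed by exactly two hex digits.
    if (in[pos] == '%' && pos + 2 < in.size()) {
        int hi = hex_value(in[pos + 1]);
        int lo = hex_value(in[pos + 2]);
        if (hi >= 0 && lo >= 0) {
            out = static_cast<char>(hi * 16 + lo);  // assumption, see above
            pos += 3;
            return true;
        }
    }
    // Alternative 2: '+' stands for a space.
    if (in[pos] == '+') {
        out = ' ';
        pos += 1;
        return true;
    }
    // Alternative 3: any other character is taken literally.
    out = in[pos];
    pos += 1;
    return true;
}

// Decode a whole string by applying decode_one until the input runs out.
std::string decode_all(const std::string& in) {
    std::string result;
    std::size_t pos = 0;
    char c;
    while (decode_one(in, pos, c)) result += c;
    return result;
}
```

Note how the order of the tests mirrors the order of the alternatives in
the rule: a '%' that is not followed by two hex digits falls through to
the last case and is kept as a literal '%'.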

                -no_case[+(uri_decode_rule-char_(':')) >> lit("://")] // scheme
                >> -no_case[+(uri_decode_rule-lit('/'))] // netloc
                >> ( lit('/') >> (
                        (+(uri_decode_rule-char_("/?;#")) >>
                        -(lit(';') >> (+(uri_decode_rule-char_(";=?/#")) >> -(lit('=') >> (+(uri_decode_rule-char_(";,?/#")))%lit(',')))%lit(';')) // params (ex: ;param=val1,val2,val3;p2)
                        ) % lit('/')) >> // path
                        -(lit('?') >> -((+(uri_decode_rule-char_("&=;#")) >> -(lit('=') >> +(uri_decode_rule-char_("&;#")))) % omit[char_("&;")] >> omit[-char_("&;")])) >> // query
                        -(lit('#') >> *uri_decode_rule) // fragment
                )

It parses a URI (of a kind that only my system will generate, so I am
not sure it follows the spec exactly, but it works very well for my
purpose, and it is very fast).
The only thing it does is parse all the info into this:

                struct uri
                {
                        std::string scheme;
                        std::string netloc;
                        typedef std::vector< std::pair< std::string, std::vector<std::string> > > params_t;
                        typedef std::vector< std::pair<std::string,params_t> > path_t;
                        path_t path;
                        typedef std::vector< std::pair<std::string,std::string> > query_t;
                        query_t query;
                        std::string fragment;
                };
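
To show how the nested typedefs fit together, here is a small sketch that
fills the struct by hand for a made-up input,
"http://example.com/a;p=1,2/b?x=y#top", and walks the path back out. The
values are what I would expect for that input, not output captured from
the grammar, and make_example and path_to_string are hypothetical helpers
that exist only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct uri
{
    std::string scheme;
    std::string netloc;
    typedef std::vector< std::pair< std::string, std::vector<std::string> > > params_t;
    typedef std::vector< std::pair<std::string, params_t> > path_t;
    path_t path;
    typedef std::vector< std::pair<std::string, std::string> > query_t;
    query_t query;
    std::string fragment;
};

// Hand-built value for "http://example.com/a;p=1,2/b?x=y#top" (assumed layout).
uri make_example() {
    uri u;
    u.scheme = "http";
    u.netloc = "example.com";

    // Path segment "a" carries one parameter "p" with the values "1" and "2".
    std::vector<std::string> p_values;
    p_values.push_back("1");
    p_values.push_back("2");
    uri::params_t a_params;
    a_params.push_back(std::make_pair(std::string("p"), p_values));
    u.path.push_back(std::make_pair(std::string("a"), a_params));

    // Path segment "b" has no parameters.
    u.path.push_back(std::make_pair(std::string("b"), uri::params_t()));

    u.query.push_back(std::make_pair(std::string("x"), std::string("y")));
    u.fragment = "top";
    return u;
}

// Rebuild the path text: "/segment;param=v1,v2;param2/segment2".
std::string path_to_string(const uri::path_t& path) {
    std::string out;
    for (std::size_t i = 0; i < path.size(); ++i) {
        out += '/';
        out += path[i].first;
        const uri::params_t& params = path[i].second;
        for (std::size_t j = 0; j < params.size(); ++j) {
            out += ';';
            out += params[j].first;
            const std::vector<std::string>& values = params[j].second;
            for (std::size_t k = 0; k < values.size(); ++k) {
                out += (k == 0 ? '=' : ',');  // '=' before the first value, ',' between values
                out += values[k];
            }
        }
    }
    return out;
}
```

So each path segment is a (name, params) pair, and each param is in turn a
(name, list of values) pair, which matches the
";param=val1,val2,val3;p2" example in the grammar comment.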

