Subject: Re: [boost] Determining interest: C++11 parser generator library
From: Gene Bushuyev (408727_at_[hidden])
Date: 2011-11-14 22:23:28


"Joel de Guzman" <joel_at_[hidden]> wrote in message
news:j9sb3i$c7g$1_at_dough.gmane.org...
> On 11/15/2011 1:08 AM, Gene Bushuyev wrote:
>> "John Bytheway" <jbytheway+boost_at_[hidden]> wrote in message
>> news:j9qp21$f09$1_at_dough.gmane.org...
>>> On 13/11/11 23:39, Gene Bushuyev wrote:
>>>> Sorry if it turns out to be a duplicate, it looks my original post was
>>>> lost in cyberspace, so I'm re-posting this request.
>>>>
>>>> I'm trying to determine if there is a sufficient interest for including
>>>> AXE C++11 recursive descent parser generator library in Boost. The
>>>> zipped sources and documentation are here:
>>>> http://www.gbresearch.com/axe/axe.zip
>>>
>>> People are more likely to investigate if you can provide a link to the
>>> documentation online somewhere, so they don't have to download and
>>> extract a zip file.
>>>
>>> It would also be useful to explain briefly how it compares with
>>> Boost.Spirit.
>>>
>>> John Bytheway
>>
>> It's true there is a significant overlap with Spirit. It's also true
>> there is more than one way to do the parsing, so some people will be
>> more comfortable with Spirit, and I have reasons to believe some people
>> will be more comfortable with AXE. There are differences, importance of
>> which depends on personal perspective and needs. I tried to summarize
>> below what I would consider advantages of AXE:
>>
>> * it's a much smaller header only library: 15 files, 126 KB total
>> * it has no dependencies on other libraries apart from the Standard
>>   library
>> * it uses only standard facilities, so theoretically it should work
>>   with any C++11 compiler without any modifications
>> * compilation times are much shorter than Spirit
>> * the syntax is less cryptic than Spirit, so it's easier to remember,
>>   write, debug, and read parsers written in AXE (this is, of course,
>>   subjective)
>> * in my limited comparison, parsers written in AXE take fewer lines of
>>   code to write, and development times are shorter
>>
>> Disadvantages:
>> * AXE requires C++11 compiler, current status of compiler support is
>>   unknown
>> * It's been released recently, thus there is limited experience working
>>   with it
>
> When Spirit debuted, it was a 7 header file. If your library gets more
> mature, the added complexity will be necessary. Your main advantage is
> simplicity. I can't argue with that. However, it is also your big
> disadvantage. Here are some more important points you missed in your
> Disadvantages section:
>
> * It does not have unicode support.

It does have wide character support, or rather, you can instantiate rules
on any character type. You can mix narrow and wide characters, and you can
also mix binary and text parsing. Many parsers would work with Unicode files
without any modification.
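
To illustrate what "instantiate on any character type" can look like, here
is a minimal sketch. It is not AXE code: match_digits and leading_digits are
made-up names, and the only assumption is a matcher templated on the
iterator type. Note that it works on code units (char, wchar_t, and so on),
which is a narrower claim than full Unicode support.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical helper, not part of AXE: consumes a run of decimal digits
// from [first, last) and returns the iterator just past the run.
// It works for any character type because the digit bounds are converted
// to the iterator's own value_type.
template <class Iterator>
Iterator match_digits(Iterator first, Iterator last)
{
    typedef typename std::iterator_traits<Iterator>::value_type char_type;
    const char_type zero = static_cast<char_type>('0');
    const char_type nine = static_cast<char_type>('9');
    while (first != last && *first >= zero && *first <= nine)
        ++first;
    return first;
}

// Convenience wrapper: number of leading digits in a string of any
// character type.
template <class Char>
std::size_t leading_digits(const std::basic_string<Char>& text)
{
    typename std::basic_string<Char>::const_iterator stop =
        match_digits(text.begin(), text.end());
    // Postcondition: the match never moves backwards.
    assert(std::distance(text.begin(), stop) >= 0);
    return static_cast<std::size_t>(std::distance(text.begin(), stop));
}
```

The same template is instantiated for std::string and std::wstring without
any change, which is the property described above.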

> * It does not have attributes and AST support. It is a purely
> transduction parser like Spirit 1.0. So in every step,
> you have to convert an iterator range to an attribute manually.

I was thinking about adding AST support, but so far there hasn't been any
need that would justify the additional complexity. My previous experience
with various parsers creating ASTs and then traversing them was rather
negative, both in terms of performance and complexity. Maybe that will
change in the future.
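
For readers unfamiliar with the transduction style being discussed, the
following sketch shows the general idea: the parser only reports the matched
iterator range, and the semantic action converts it to a value by hand. The
names parse_number and leading_int are hypothetical and do not come from AXE
or Spirit.

```cpp
#include <cassert>
#include <string>

typedef std::string::const_iterator Iter;

// Hypothetical transduction-style parser, not AXE's API: matches one or
// more decimal digits and hands the matched range to a semantic action.
// Returns the end of the match, or `first` if nothing matched.
template <class Action>
Iter parse_number(Iter first, Iter last, Action on_match)
{
    Iter it = first;
    while (it != last && *it >= '0' && *it <= '9')
        ++it;
    if (it != first)
        on_match(first, it);   // the action only ever sees a raw range
    return it;
}

// The caller turns the range into a value (the "attribute") by hand.
// Returns `fallback` when the text does not start with a digit.
// std::stoi throws std::out_of_range if the digits do not fit in an int.
int leading_int(const std::string& text, int fallback)
{
    int value = fallback;
    parse_number(text.begin(), text.end(),
        [&value](Iter begin, Iter end) {
            assert(begin != end);   // guaranteed by parse_number
            value = std::stoi(std::string(begin, end));
        });
    return value;
}
```

An attribute-based design would instead have the number parser produce the
int itself; here that conversion step lives in the lambda.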

> * It does not have support for polymorphic semantic actions
> (you know that c++ lambda is monomorphic, right?).

There is a polymorphic class r_rule, which uses std::function. It's
primarily used for expressing recursion. It can be used on its own, of
course, but unlike auto rules it would introduce an often unnecessary
performance hit. But if one wants to return a parser from a function or keep
a rule as a class member, then a polymorphic rule will do just fine.
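
As a rough sketch of why type erasure helps with recursion (this is not
r_rule itself; the rule typedef and balanced_parser class are made up for
illustration), a std::function member can refer to itself where an auto
variable cannot, because its type is known before its body is defined:

```cpp
#include <cassert>
#include <functional>
#include <string>

typedef std::string::const_iterator Iter;

// Hypothetical type-erased rule: takes a range and returns the end of
// the match, or `first` when nothing matched.
typedef std::function<Iter(Iter, Iter)> rule;

// Recognizes nested parentheses:  group := '(' group* ')'
class balanced_parser {
public:
    balanced_parser()
    {
        // The lambda calls group_ recursively through `this`.
        group_ = [this](Iter first, Iter last) -> Iter {
            if (first == last || *first != '(')
                return first;                 // no match
            Iter it = first + 1;
            for (;;) {                        // zero or more nested groups
                Iter next = group_(it, last);
                if (next == it)
                    break;
                it = next;
            }
            if (it == last || *it != ')')
                return first;                 // unbalanced: no match
            return it + 1;
        };
    }

    // The lambda captures `this`, so copying would leave a stale pointer.
    balanced_parser(const balanced_parser&) = delete;
    balanced_parser& operator=(const balanced_parser&) = delete;

    // True when the whole text is exactly one balanced group.
    bool matches(const std::string& text) const
    {
        assert(group_);                       // rule was set in the constructor
        return !text.empty() && group_(text.begin(), text.end()) == text.end();
    }

private:
    rule group_;   // the rule kept as a class member
};
```

Every call goes through std::function's indirection, which is the kind of
cost mentioned above for non-recursive rules.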

> * It does not have reusable grammars

There aren't reusable grammars, but nothing prevents you from creating
reusable parsers. I have a few.

> * No symbol tables
> * No character sets

So far there hasn't been a need for those.

> * No separation of grammar construction and parsing. Your examples have
> a big overhead: they build the parser every time you parse.

I don't think there is a big overhead, not in real-world applications. This
design, in its pre-C++11 incarnation, has been used for the last 7 years in
several binary and text parsers. Based on that, I know its raw performance
was not a factor. In the existing parsers I've seen, disk access and filling
the data structures were the major factors. But, of course, as I mentioned,
the experience is limited.
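
If construction cost ever did matter, one generic way to separate building
a parser from running it is to build it once and reuse it. The sketch below
assumes a type-erased rule and uses hypothetical names (make_identifier_rule,
identifier_rule, is_identifier); it is not how AXE or Spirit do it. It relies
on C++11 guaranteeing that a function-local static is initialized exactly
once, even with multiple threads.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>

typedef std::string::const_iterator Iter;
typedef std::function<Iter(Iter, Iter)> rule;

int construction_count = 0;   // only here to make the construction visible

// Hypothetical factory standing in for "building the grammar".
// The rule matches a C-style identifier and returns the end of the match,
// or `first` when nothing matched.
rule make_identifier_rule()
{
    ++construction_count;
    return [](Iter first, Iter last) -> Iter {
        if (first == last)
            return first;
        unsigned char head = static_cast<unsigned char>(*first);
        if (!std::isalpha(head) && head != '_')
            return first;
        Iter it = first + 1;
        while (it != last) {
            unsigned char c = static_cast<unsigned char>(*it);
            if (!std::isalnum(c) && c != '_')
                break;
            ++it;
        }
        return it;
    };
}

// Built on first use, then shared by every later call.
const rule& identifier_rule()
{
    static const rule r = make_identifier_rule();
    assert(r);   // the factory always returns a callable rule
    return r;
}

bool is_identifier(const std::string& text)
{
    return !text.empty()
        && identifier_rule()(text.begin(), text.end()) == text.end();
}
```

The trade-off is that storing the parser usually means type erasure, so each
call pays for an indirection in exchange for skipping the rebuild.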

> * The syntax is *more* cryptic than Spirit (this is, of course,
>   subjective :-)
>
> Just to name a few.
>
> Regards,
> --
> Joel de Guzman
> http://www.boostpro.com
> http://boost-spirit.com
>

Thanks for taking a bite.

Gene Bushuyev

