Subject: Re: [boost] Determining interest: C++11 parser generator library
From: Joel de Guzman (joel_at_[hidden])
Date: 2011-11-15 00:52:49


On 11/15/2011 11:23 AM, Gene Bushuyev wrote:
>> When Spirit debuted, it was just 7 header files. If your library gets more mature,
>> the added complexity will be necessary. Your main advantage is simplicity.
>> I can't argue with that. However, it is also your big disadvantage. Here are
>> some more important points you missed in your Disadvantages section:
>>
>> * It does not have Unicode support.
>
> It does have wide character support, or, better said, you can instantiate rules on any
> character type. You can mix narrow and wide characters, and you can also mix binary and
> text parsing. Many parsers would work with Unicode files without any modification.

Unicode support is a lot more than that. See
http://unicode.org/reports/tr18/tr18-5.1.html. You do not
even have level-1 support.

>> * It does not have attributes and AST support. It is a purely
>> transduction parser like Spirit 1.0. So in every step,
>> you have to convert an iterator range to an attribute manually.
>
> I was thinking about adding an AST, but so far there wasn't any need that would justify the
> additional complexity. My previous experience with various parsers creating ASTs and then
> traversing them was rather negative both in terms of performance and complexity. Maybe it
> will change in the future.

Seems you haven't done much parsing ;-) When you get into *real*
attribute grammars, then your simplicity will no longer be an
advantage (http://www.haskell.org/haskellwiki/Attribute_grammar).
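
To illustrate what "convert an iterator range to an attribute manually"
means in practice, here is a minimal sketch. It is not code from either
library; match_digits and parse_uint are hypothetical names. The matcher
only reports where the match ends, and the caller has to turn the matched
range into a value by hand:

```cpp
#include <cctype>
#include <string>

// Hypothetical transduction-style matcher: it consumes digits and only
// reports where the match ends. It produces no value of its own.
template <typename Iterator>
Iterator match_digits(Iterator first, Iterator last) {
    while (first != last && std::isdigit(static_cast<unsigned char>(*first)))
        ++first;
    return first;
}

// The caller converts the matched range [first, end) into an attribute.
// Returns true and sets `out` when at least one digit was matched.
// (Overflow is not handled in this sketch.)
template <typename Iterator>
bool parse_uint(Iterator first, Iterator last, unsigned& out) {
    Iterator end = match_digits(first, last);
    if (end == first)
        return false;
    unsigned value = 0;
    for (Iterator it = first; it != end; ++it)
        value = value * 10 + static_cast<unsigned>(*it - '0');
    out = value;
    return true;
}
```

In an attribute-based design the conversion step would be supplied by the
parser itself, so this boilerplate would not be repeated at every rule.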

>> * It does not have support for polymorphic semantic actions
>> (you know that a C++ lambda is monomorphic, right?).
>
> There is a polymorphic class r_rule, which uses std::function. It's primarily used for
> expressing recursion. It can be used on its own, of course, but unlike auto rules it
> would introduce an often unnecessary performance hit. But if one wants to return a parser
> from a function or keep a rule as a class member, then a polymorphic rule will do just fine.

Nope. That's not what I meant. Anyway, rules cannot ever be polymorphic
because of type erasure, regardless of whether it's C++11.
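
One way to read the distinction is sketched below; this is an illustration
under assumptions, not code from Spirit or from the proposed library, and
type_size and erase are hypothetical names. A function object with a
templated operator() is polymorphic: the same object can be called with
arguments of different types. Once it is stored in a std::function, the
call signature is fixed and that polymorphism is erased:

```cpp
#include <cstddef>
#include <functional>

// Polymorphic function object: operator() is a template, so one object
// accepts arguments of any type and sees the real argument type.
struct type_size {
    template <typename T>
    std::size_t operator()(const T&) const { return sizeof(T); }
};

// Type erasure through std::function pins the signature to const int&.
// Any argument is converted to int before type_size ever sees it.
inline std::function<std::size_t(const int&)> erase() {
    return type_size();
}
```

Calling type_size()('a') reports the size of a char, while erase()('a')
reports the size of an int, because the char is converted to int at the
std::function boundary. A C++11 lambda is in the same position as the
erased version: its parameter types are fixed when it is written.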

>> * It does not have reusable grammars
>
> There aren't reusable grammars, but nothing prevents one from creating reusable parsers. I
> have a few.
>
>> * No symbol tables
>> * No character sets
>
> So far there wasn't a need for that.

Which makes it very limited in my view.

>> * No separation of grammar construction and parsing. Your examples have
>> a big overhead: they build the parser every time you parse.
>
> I don't think there is a big overhead. Not in real-world applications. This design, in its
> pre-C++11 incarnation, was used for the last 7 years in several binary and text parsers.
> Based on that, I know its raw performance was not a factor. In existing parsers I've seen,
> disk access and filling the data structures were the major factors. But, of course, as I
> mentioned, my experience is limited.

It's OK for small micro parsers. Wait till you go beyond "small".
Even your JSON parser will never be optimal because of this. I'm
not sure what you mean by real world applications.
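
The difference between building the parser on every parse and building it
once can be sketched as follows. This is only an illustration under
assumptions: parser, make_digits_parser, parse_rebuilding and
digits_grammar are hypothetical names, and a counter stands in for the
real construction cost:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical type-erased parser: returns true if the input is all digits.
typedef std::function<bool(const std::string&)> parser;

// Counts how often a parser object is constructed.
static int constructions = 0;

parser make_digits_parser() {
    ++constructions;
    return [](const std::string& s) -> bool {
        if (s.empty())
            return false;
        for (std::size_t i = 0; i < s.size(); ++i)
            if (s[i] < '0' || s[i] > '9')
                return false;
        return true;
    };
}

// No separation: the parser is rebuilt on every call.
bool parse_rebuilding(const std::string& input) {
    return make_digits_parser()(input);
}

// Separation: the grammar object is constructed once and reused.
class digits_grammar {
public:
    digits_grammar() : p_(make_digits_parser()) {}
    bool parse(const std::string& input) const { return p_(input); }
private:
    parser p_;
};
```

With parse_rebuilding the construction count grows with the number of
inputs; with a single digits_grammar object it stays at one no matter how
many inputs are parsed.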

Regards,

-- 
Joel de Guzman
http://www.boostpro.com
http://boost-spirit.com
