
From: Robert Ramey (ramey_at_[hidden])
Date: 2007-03-01 13:58:34


Boris Kolpackov wrote:
> Hi Robert,
>
> "Robert Ramey" <ramey_at_[hidden]> writes:
>
>> A while ago I made a suggestion about using the spirit parser with
>> its associated XML grammars.
>>
>> No one has commented on this. I'm curious why this idea doesn't
>> seem to be attractive to anyone else. I used it with very good
>> results in the serialization library. It created a much more robust
>> and maintainable parser than I could have done by hand. What am I
>> missing here?
>
> The question is whether it is a conforming XML parser? That means
> support for:

Actually I don't think that's the question at all.

The question is about the strategy of development. To parse input
described by a grammar, one can use a grammar-driven parser (yacc, bison,
spirit, etc.) or one can write code that parses the input explicitly.
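
As a rough illustration of the two strategies, here is a minimal sketch in
plain C++. It does not use Spirit and is not taken from the serialization
library: the combinators (satisfy, ch, seq, many), the toy attribute grammar,
and all function names are assumptions made up for this example. The first
half states the rules declaratively and lets generic code do the parsing; the
second half parses the same input with explicit index arithmetic.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A parser inspects `in` starting at `pos`. On success it advances `pos` and
// returns true; on failure it returns false and leaves `pos` unchanged.
using Parser = std::function<bool(const std::string&, std::size_t&)>;

// Match a single character accepted by `pred`.
Parser satisfy(std::function<bool(char)> pred) {
    return [pred](const std::string& in, std::size_t& pos) -> bool {
        if (pos < in.size() && pred(in[pos])) { ++pos; return true; }
        return false;
    };
}

// Match one specific character.
Parser ch(char c) { return satisfy([c](char x) { return x == c; }); }

// Match every part in order; commit the new position only if all succeed.
Parser seq(std::vector<Parser> parts) {
    return [parts](const std::string& in, std::size_t& pos) -> bool {
        std::size_t cursor = pos;
        for (const Parser& p : parts) {
            if (!p(in, cursor)) return false;
        }
        pos = cursor;
        return true;
    };
}

// Match `p` zero or more times; stops if `p` fails or consumes nothing.
Parser many(Parser p) {
    return [p](const std::string& in, std::size_t& pos) -> bool {
        std::size_t before = pos;
        while (p(in, pos) && pos > before) before = pos;
        return true;
    };
}

bool is_letter(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool is_name_char(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }
bool is_not_quote(char c) { return c != '"'; }

// Grammar-driven version. The code mirrors the rules:
//   name      ::= letter (letter | digit)*
//   attribute ::= name '=' '"' (any char except '"')* '"'
Parser name_rule() { return seq({satisfy(is_letter), many(satisfy(is_name_char))}); }
Parser attribute_rule() {
    return seq({name_rule(), ch('='), ch('"'), many(satisfy(is_not_quote)), ch('"')});
}

bool matches_grammar(const std::string& in) {
    std::size_t pos = 0;
    return attribute_rule()(in, pos) && pos == in.size();
}

// Hand-coded version of the same rules, with explicit bounds checks.
bool matches_by_hand(const std::string& in) {
    std::size_t i = 0;
    if (i >= in.size() || !is_letter(in[i])) return false;
    ++i;
    while (i < in.size() && is_name_char(in[i])) ++i;
    if (i >= in.size() || in[i] != '=') return false;
    ++i;
    if (i >= in.size() || in[i] != '"') return false;
    ++i;
    while (i < in.size() && in[i] != '"') ++i;
    if (i >= in.size()) return false;  // ran out of input before the closing quote
    ++i;
    return i == in.size();
}

// Usage: both versions should agree on every input.
void self_check() {
    assert(matches_grammar("id=\"42\"") && matches_by_hand("id=\"42\""));
    assert(!matches_grammar("id=42") && !matches_by_hand("id=42"));
}
```

In the grammar-driven half, supporting a new construct means adding or
changing a rule; in the hand-coded half it means threading another set of
index checks through the function.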

> - namespaces
> - character references
> - entity references
> - CDATA
> - DTD well-formedness checking, entity declaration processing and
> replacement, substitution of default attribute values, etc.

> My uneducated guess is that "spirit-based XML grammar" is not a
> conforming XML parser.

Not relevant.

The question isn't which features the particular XML parser included
with spirit supports. Any missing features could be added to the
grammar without too much trouble - that's the appeal of using
a grammar-driven approach.

> The next question is how much effort it will take to fix it up

Much less than hand coding yet another xml parser.

> and whether it will still be as robust, maintainable,

A parser generated from a formal grammar is going to be
much more robust and maintainable. The grammar can
be verified independently of the implementation.

> and efficient (I doubt it very much).

This might be a legitimate concern. Some tests suggest that a
hand-coded parser can be made more efficient than a machine-
generated one. But of course it would really depend on the
quality of the hand coding itself, which is hard to speculate on.
In any case this would strike me as premature optimization.
If it were my problem, I would start with the most expedient way to
make a robust and maintainable parser. If I found it to be
"too slow", that module could well be replaced with a hand-
coded equivalent.
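
One way to keep that replacement cheap is to hide the parser behind a small
interface so callers never see which implementation is in use. The sketch
below is only an illustration under stated assumptions: TagParser,
ExpedientTagParser, HandCodedTagParser and tag_or_empty are hypothetical
names, the input is limited to a single empty-element tag such as "<br/>",
and std::regex stands in for the expedient grammar-driven parser (it is not
Spirit).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical interface: callers depend on this, not on how parsing is done.
struct TagParser {
    virtual ~TagParser() = default;
    // Returns true and stores the element name in `tag` if `text` is exactly
    // an empty-element tag such as "<br/>".
    virtual bool parse(const std::string& text, std::string& tag) const = 0;
};

// First version: quick to write, driven by a declarative pattern.
class ExpedientTagParser : public TagParser {
public:
    bool parse(const std::string& text, std::string& tag) const override {
        static const std::regex pattern("<([A-Za-z][A-Za-z0-9]*)/>");
        std::smatch m;
        if (!std::regex_match(text, m, pattern)) return false;
        tag = m[1].str();
        return true;
    }
};

// Later replacement: hand-coded, same observable behaviour.
class HandCodedTagParser : public TagParser {
public:
    bool parse(const std::string& text, std::string& tag) const override {
        if (text.size() < 4 || text.front() != '<' ||
            text.compare(text.size() - 2, 2, "/>") != 0) {
            return false;
        }
        const std::string name = text.substr(1, text.size() - 3);
        for (std::size_t i = 0; i < name.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(name[i]);
            // First character must be a letter, the rest letters or digits.
            if (i == 0 ? !std::isalpha(c) : !std::isalnum(c)) return false;
        }
        tag = name;
        return true;
    }
};

// Caller code: unchanged no matter which implementation is passed in.
std::string tag_or_empty(const TagParser& parser, const std::string& text) {
    std::string tag;
    return parser.parse(text, tag) ? tag : std::string();
}

// Usage: the two implementations are interchangeable behind the interface.
void self_check() {
    const ExpedientTagParser expedient;
    const HandCodedTagParser hand_coded;
    assert(tag_or_empty(expedient, "<br/>") == "br");
    assert(tag_or_empty(hand_coded, "<br/>") == "br");
}
```

Because the caller only sees TagParser, the expedient version can be measured
first and swapped out later without touching the calling code.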

> The reason why you had good results with serialization library
> is because you control both production and consumption of the
> instances so you can easily restrict yourself to a subset of XML.

The reason I had good results using spirit in the serialization library
is that spirit is good, robust, well designed and well documented code.
I built on that.

> Once you need to process *any* valid XML things get a lot more
> complicated.

Which is even more reason to avoid a hand coded parser.

Robert Ramey

