From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2006-04-29 07:49:09


Daniel Walker wrote:

>On 4/24/06, Marcin Kalicinski <kalita_at_[hidden]> wrote:
>
>
>>My knowledge of XML is limited, but I think Dan Nuffer's parser will
>>parse any valid XML. read_xml however discards all that goes beyond nodes,
>>attributes, data and comments.
>>
>>
>
>Isn't the property_tree XML parser originally based on Dan Nuffer's?
>Couldn't the productions/tokens from the Nuffer parser be added back
>to read_xml() so that it could at least accept the syntax for all XML
>files even if it doesn't implement the semantics? I think the runtime
>overhead of the additional productions in the grammar would be
>negligible for simple XML files that don't use the features and
>necessary for XML files that do. It seems to me this could clarify the
>scope of the parser. The documentation could read something like:
>
>"read_xml() performs non-validated parsing of the W3C recommendation
>XML 1.1. In addition, as of version 1.3x, read_xml() parses but
>ignores the following W3C specifications: XML Names, XInclude,
>XLink/XPointer, XML Schema, XSLT, ..."
>
>... changing version numbers as appropriate. Also, it may simplify
>maintenance as far as pulling bug-fixes/enhancements from the Nuffer
>parser code-base to property_tree.
>
>
The property tree's parser is, I believe, either a very slightly modified
Dan Nuffer parser (compared to the file I've seen, only semantic actions
were added) or built on the same principle: a direct translation of the
grammar in the XML specification. As far as I can see, it is a complete
non-validating parser for the XML spec, with two exceptions. Entities are
missing. More importantly, character set handling is incomplete: the
parser only parses files in the character set specified by the current
global locale, and completely ignores the character set specification in
the header. Another missing part may be the parsing of the internal DTD
subset, which might be (not sure yet) required even of non-validating
parsers.
In addition, it is an XML 1.0 parser, not an XML 1.1 parser.
The Namespaces in XML, XInclude, XLink, XPointer, ... specifications are
all built on top of XML; they are all well-formed XML. "Parsing but
ignoring" them means nothing and can only lead to misunderstandings.
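
To make the character set point concrete, here is a minimal sketch of the
step the parser skips: reading the encoding named in the XML declaration.
The function declared_encoding is hypothetical and not part of
property_tree; it assumes C++17 and an input that starts in an
ASCII-compatible encoding (so it would not handle UTF-16).

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not a property_tree API): returns the value of the
// encoding pseudo-attribute of the XML declaration, or nullopt if the
// document has no declaration or the declaration names no encoding.
// Assumption: the input is in an ASCII-compatible encoding.
std::optional<std::string> declared_encoding(std::string_view doc) {
    constexpr std::string_view open = "<?xml";
    constexpr std::string_view space = " \t\r\n";
    constexpr auto npos = std::string_view::npos;

    // The declaration must be the very first thing in the file, and "<?xml"
    // must be followed by whitespace (this excludes e.g. "<?xml-stylesheet").
    if (doc.size() <= open.size() || doc.substr(0, open.size()) != open ||
        space.find(doc[open.size()]) == npos) {
        return std::nullopt;
    }
    const auto end = doc.find("?>");
    if (end == npos) return std::nullopt;  // unterminated declaration
    const std::string_view decl = doc.substr(0, end);

    // Locate: encoding S? '=' S? quote value quote
    constexpr std::string_view key = "encoding";
    auto pos = decl.find(key);
    if (pos == npos) return std::nullopt;  // no encoding declared
    pos = decl.find_first_not_of(space, pos + key.size());
    if (pos == npos || decl[pos] != '=') return std::nullopt;
    pos = decl.find_first_not_of(space, pos + 1);
    if (pos == npos) return std::nullopt;
    const char quote = decl[pos];
    if (quote != '"' && quote != '\'') return std::nullopt;
    const auto close = decl.find(quote, pos + 1);
    if (close == npos) return std::nullopt;
    return std::string(decl.substr(pos + 1, close - pos - 1));
}
```

A caller could compare the result against the character set of the global
locale and transcode or reject the file on a mismatch. How that would fit
into read_xml() is outside this sketch.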

Sebastian Redl

