From: Reece Dunn (msclrhd_at_[hidden])
Date: 2003-06-02 06:16:52

Anthony Williams wrote:

> > Writing an XML parser from scratch for boost should, IMHO, have these
> > features:

> > [1] It should make use of the Spirit and Regex libraries for XML and
> > parsing.

>Whilst these libraries might be useful for the parser writer, I don't see
>benefit to requiring their use for a boost XML parser. If a submitted
>used alternative parsing methods that should be acceptable provided it

Writing a lexer/parser is a complex task. It wasn't a requirement, more a
suggestion/my opinion on what a boost XML library should be like. There are
four possible options:

[1] Write the lexer-parser by hand. This is a difficult process to get
right, and can lead to complex code that is difficult to update.

[2] Use flex/bison or equivalents. These are C-based tools and, as such, are
difficult to integrate with C++ (especially since several implementations
use K&R C and make use of variables called class!!). Also, do we then
require that flex/bison be distributed with Boost??

[3] Use Boost.Spirit/Boost.Regex. These are written in C++ and so make use
of advanced techniques. For example, the use of templates makes writing the
parser as easy as writing BNF grammars! Also, Spirit uses ternary search
trees, which give very fast lookup for an associative-style container.
Finally, using these libraries will prevent wheel reinvention and make the
code more boostified.

[4] Use another lexer/parser generator. This is an unknown, and again raises
the question of whether it must be distributed with Boost.

> > [2] It should conform to the following W3C standards:

> > (b) DOM 1.0/2.0/3.0

>Hmm. The DOM standards in particular are very Java oriented, and don't
>necessarily make for efficient C++ bindings. I can see that the parser
>to provide the same set of facilities though, even if it is done in a
>different way.

If you are writing a program that interacts with XML via a scripting
language, then DOM bindings would be needed (especially if you are wanting a
browser that can, for example, control SVG objects when the user interacts
with them). I know that this could be done using the MS parser, but what if
you wanted good Unicode support for something like MathML?

I also agree that efficient C++ bindings would be very desirable. What about
a C++-to-DOM binding wrapper?
   e.g. boost::w3c::dom::DOMElement< boost::xml::element >
NOTE: This is just a suggestion. That way, you can make the C++ versions
very efficient, while the DOM versions will have a wrapper layer to them.
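
To make the wrapper idea a little more concrete, here is a minimal sketch.
Everything in it is hypothetical: the "sketch" namespace stands in for
"boost" because none of this exists in Boost, and the only thing I assume
about the efficient C++ element is that it exposes name() and attributes().
The DOM-style method names (getTagName, getAttribute, setAttribute,
hasAttribute) follow the W3C DOM Element interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical namespaces: "sketch" stands in for "boost".
namespace sketch { namespace xml {

// The efficient C++-style element: plain members, no virtual calls.
class element {
public:
    explicit element(const std::string& name) : name_(name) {
        assert(!name.empty());  // an element always has a tag name
    }
    const std::string& name() const { return name_; }
    std::map<std::string, std::string>& attributes() { return attrs_; }
private:
    std::string name_;
    std::map<std::string, std::string> attrs_;
};

}} // namespace sketch::xml

namespace sketch { namespace w3c { namespace dom {

// Thin adapter that puts W3C DOM method names on top of any type that
// offers name() and attributes(). It holds a pointer, so it never copies
// the underlying element.
template <typename Element>
class DOMElement {
public:
    explicit DOMElement(Element& e) : impl_(&e) {}

    std::string getTagName() const { return impl_->name(); }

    // The DOM returns an empty string when the attribute is absent.
    std::string getAttribute(const std::string& key) const {
        auto it = impl_->attributes().find(key);
        return it == impl_->attributes().end() ? std::string() : it->second;
    }

    void setAttribute(const std::string& key, const std::string& value) {
        impl_->attributes()[key] = value;
    }

    bool hasAttribute(const std::string& key) const {
        return impl_->attributes().count(key) != 0;
    }

private:
    Element* impl_;
};

}}} // namespace sketch::w3c::dom
```

With this arrangement, code that only needs the C++ interface never pays for
the DOM layer, and a scripting binding can construct a DOMElement on demand.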

>IMHO, the base parser should provide an API on which other things can be
>built. For example, provided the facilities are present to retrieve the
>information needed for XPath processing, the core API doesn't need to have
>XPath processor. Likewise for XSLT.

Agreed. XML, DTD and XPath parsing and structure navigation with Unicode
support are all that is required for the base level. That is why I put the
others as optional. It would be nice if the library supported XPath
navigation, XSLT, and DTD/XMLSchema validation, though as these are common

>However, I think it is important that the library does include add-on APIs
>as much of the supporting standards as possible, such as DOM-like
>XPath node selection, DTD and XMLSchema validation, and XSLT.

Agreed. Perhaps it would be best to organise according to facilities:

   boost::xml::dom -- C++ DOM bindings
   boost::w3c::dom -- W3C DOM bindings (requires boost::xml::dom)
   boost::xml::xpath -- XPath parsing/navigation
   boost::xml::xslt -- requires boost::xml::dom and boost::xml::xpath
   boost::xml::xslfo -- requires boost::xml::xslt
   boost::xml::mathml -- requires boost::xml::dom

This way, the user can include which APIs he/she wants with minimal

> > [4] It should provide XPath bindings to the XML DOM in a clean way; I
> > personally like the MS selectSingleNode/selectNodes extension to the XML
> > node DOM interface.

>There is no point in providing XPath support if it's painful to use.

I was thinking in terms of a W3C DOM. If we are thinking in terms of C++
bindings, the usage could be like this:
   boost::xml::dom::node root;
   // ...

   // select a single node - note usage of array-style notation:
   boost::xml::dom::node sel = root[ boost::xml::xpath::expr( L"/*[1]" )];

   // select a collection of nodes - can accept an XPath string or XPath
   boost::xml::xpath::result_set math( root, L"//m:math" );

This would give a cleaner interface between XML and XPath. (NOTE: I have
implemented this style of syntax for my MS-XML wrappers).
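
For anyone wondering how the array-style notation could work, here is a toy
sketch. It is not my MS-XML wrapper code, and all names are hypothetical
("sketch" stands in for "boost"). It only understands child steps of the
form NAME or * with an optional 1-based [n] position, treats a leading "/" as
"start from this node" because the toy has no parent links, and rejects
anything else (including "//") with an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical namespaces: "sketch" stands in for "boost".
namespace sketch { namespace xml {

namespace xpath {
// Wrapping the string in a distinct type keeps node::operator[] from
// being confused with integer or plain-string indexing.
class expr {
public:
    explicit expr(const std::wstring& text) : text_(text) {}
    const std::wstring& text() const { return text_; }
private:
    std::wstring text_;
};
} // namespace xpath

namespace dom {
// A node is a cheap handle onto shared data; a default-constructed node
// is "empty" and doubles as the "nothing selected" result.
class node {
    struct data {
        std::wstring name;
        std::vector<std::shared_ptr<data>> children;
    };
    explicit node(std::shared_ptr<data> d) : data_(d) {}
    std::shared_ptr<data> data_;

public:
    node() {}
    explicit node(const std::wstring& name) : data_(std::make_shared<data>()) {
        data_->name = name;
    }

    bool empty() const { return !data_; }
    std::wstring name() const { return data_ ? data_->name : std::wstring(); }

    node append(const std::wstring& child_name) {
        assert(data_);  // cannot add children to an empty handle
        node child(child_name);
        data_->children.push_back(child.data_);
        return child;
    }

    // Select a single node; returns an empty node when nothing matches.
    node operator[](const xpath::expr& e) const {
        const std::wstring& path = e.text();
        std::size_t pos = (!path.empty() && path[0] == L'/') ? 1 : 0;
        node current = *this;
        while (pos < path.size() && !current.empty()) {
            std::size_t end = path.find(L'/', pos);
            if (end == std::wstring::npos) end = path.size();
            std::wstring step = path.substr(pos, end - pos);
            pos = end + 1;

            std::size_t wanted = 1;  // no predicate: take the first match
            std::size_t bracket = step.find(L'[');
            if (bracket != std::wstring::npos) {
                if (step.back() != L']')
                    throw std::invalid_argument("unterminated predicate");
                wanted = std::stoul(
                    step.substr(bracket + 1, step.size() - bracket - 2));
                step.erase(bracket);
            }
            if (step.empty() || wanted == 0)
                throw std::invalid_argument("unsupported XPath step");

            node next;
            std::size_t seen = 0;
            for (const auto& child : current.data_->children) {
                if ((step == L"*" || child->name == step) && ++seen == wanted) {
                    next = node(child);
                    break;
                }
            }
            current = next;
        }
        return current;
    }
};
} // namespace dom

}} // namespace sketch::xml
```

The point of the xpath::expr wrapper type is that the subscript operator can
be overloaded on it without ambiguity; a result_set class could reuse the
same step-matching loop and collect every match instead of stopping at one.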

> > [5] It should have a clean access to attributes, without the user
> > needing to call get/set methods.

>I am not sure what you mean here.

I was thinking from a W3C DOM/MS COM PoV where the attributes are
implemented via get and set methods [Example:
   get_documentElement( IXMLDOMNode * )
   XMLDOMNode XMLDOMDocument.documentElement
], but with clean C++ bindings this is largely irrelevant.
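
As an illustration of what "clean" access could look like in C++, here is a
small sketch using a proxy object, so that reading and writing an attribute
both go through operator[] rather than a get/set pair. The names are
hypothetical ("sketch" stands in for "boost"), and this is only one of
several ways to do it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical namespaces: "sketch" stands in for "boost".
namespace sketch { namespace xml {

// Returned by element::operator[]. Assigning a string writes the
// attribute; converting to std::string reads it without inserting it.
class attribute_ref {
public:
    attribute_ref(std::map<std::string, std::string>& attrs,
                  const std::string& key)
        : attrs_(&attrs), key_(key) {}

    attribute_ref& operator=(const std::string& value) {
        (*attrs_)[key_] = value;
        return *this;
    }

    // An absent attribute reads as an empty string.
    operator std::string() const {
        auto it = attrs_->find(key_);
        return it == attrs_->end() ? std::string() : it->second;
    }

    bool exists() const { return attrs_->count(key_) != 0; }

private:
    std::map<std::string, std::string>* attrs_;
    std::string key_;
};

class element {
public:
    attribute_ref operator[](const std::string& key) {
        assert(!key.empty());  // attribute names are never empty
        return attribute_ref(attrs_, key);
    }
private:
    std::map<std::string, std::string> attrs_;
};

}} // namespace sketch::xml
```

Usage then reads like property access: elem["id"] = "x"; to write, and
std::string id = elem["id"]; to read. One caveat with this sketch is that
assigning one proxy to another copies the proxy, not the attribute value.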

>I am developing Axemill ( to fulfil most
>these goals, with the eventual goal of submitting it to boost. If you want
>contribute code and/or ideas, please email me. Currently, it requires gcc
>(though it should build with other relatively conforming compilers) and
>1.29.0 (I intend to move to 1.30.0 shortly)

I would be happy to help out with code and ideas.

