From: Reece Dunn (msclrhd_at_[hidden])
Date: 2003-06-02 06:16:52


Anthony Williams wrote:

> > Writing an XML parser from scratch for boost should, IMHO, have these
> > features:

> > [1] It should make use of the Spirit and Regex libraries for XML and
> > XPath parsing.

>Whilst these libraries might be useful for the parser writer, I don't see
>any benefit to requiring their use for a boost XML parser. If a submitted
>parser used alternative parsing methods that should be acceptable provided
>it worked.

Writing a lexer/parser is a complex task. It wasn't a requirement, more a
suggestion/my opinion on what a boost XML library should be like. There are
four possible options:

[1] Write the lexer-parser by hand. This is a difficult process to get
right, and can lead to complex code that is difficult to update.

[2] Use flex/bison or equivalents. These are C-based lexer and parser
generators and, as such, are difficult to integrate with C++ (especially
since several implementations use K&R C and make use of variables called
class!!). Also, do we then require that flex/bison be distributed with
Boost??

[3] Use Boost.Spirit/Boost.Regex. These are written in C++ and so make use
of advanced techniques. For example, the use of templates makes writing the
parser as easy as writing BNF grammars! Also, Spirit uses ternary search
trees, which give very fast lookup as an associative-style container.
Finally, using these libraries will prevent wheel reinvention and make the
code more boostified.

[4] Use another lexer/parser generator. This is an unknown, and it again
raises the question of distributing the tool with Boost.
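
To illustrate the ternary search tree mentioned in [3], here is a minimal
sketch of the data structure. It is not Spirit's actual implementation; the
class name ternary_search_tree and its insert/find members are hypothetical,
and the predefined XML entity names are used only as sample keys.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical illustration of a ternary search tree; NOT Spirit's own code.
// Each node holds one character and three links: lower, equal, higher.
template <typename T>
class ternary_search_tree {
    struct node {
        char ch;
        bool has_value = false;
        T value{};
        std::unique_ptr<node> lo, eq, hi;
        explicit node(char c) : ch(c) {}
    };
    std::unique_ptr<node> root_;

public:
    // Insert or overwrite the value stored under a non-empty key.
    void insert(const std::string& key, const T& value) {
        if (key.empty()) return;
        std::unique_ptr<node>* cur = &root_;
        std::size_t i = 0;
        for (;;) {
            if (!*cur) *cur = std::make_unique<node>(key[i]);
            node& n = **cur;
            if (key[i] < n.ch) {
                cur = &n.lo;
            } else if (key[i] > n.ch) {
                cur = &n.hi;
            } else if (i + 1 == key.size()) {
                n.has_value = true;
                n.value = value;
                return;
            } else {
                ++i;            // matched this character, descend
                cur = &n.eq;
            }
        }
    }

    // Return a pointer to the stored value, or nullptr if the key is absent.
    const T* find(const std::string& key) const {
        if (key.empty()) return nullptr;
        const node* n = root_.get();
        std::size_t i = 0;
        while (n) {
            if (key[i] < n->ch) {
                n = n->lo.get();
            } else if (key[i] > n->ch) {
                n = n->hi.get();
            } else if (i + 1 == key.size()) {
                return n->has_value ? &n->value : nullptr;
            } else {
                ++i;
                n = n->eq.get();
            }
        }
        return nullptr;
    }
};

// Example use: map some predefined XML entity names to their characters.
inline void tst_example() {
    ternary_search_tree<char> entities;
    entities.insert("lt", '<');
    entities.insert("gt", '>');
    entities.insert("amp", '&');
    assert(*entities.find("amp") == '&');
    assert(entities.find("nbsp") == nullptr);  // not predefined in XML
}
```

Each lookup compares one character at a time and never hashes or copies the
key, which is why the structure suits symbol and entity tables.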

> > [2] It should conform to the following W3C standards:

> > (b) DOM 1.0/2.0/3.0

>Hmm. The DOM standards in particular are very Java oriented, and don't
>necessarily make for efficient C++ bindings. I can see that the parser
>needs to provide the same set of facilities though, even if it is done in
>a different way.

If you are writing a program that interacts with XML via a scripting
language, then DOM bindings would be needed (especially if you are wanting a
browser that can, for example, control SVG objects when the user interacts
with them). I know that this could be done using the MS parser, but what if
you wanted good Unicode support for something like MathML?

I also agree that efficient C++ bindings would be very desirable. What about
a C++-to-DOM binding wrapper?
   e.g. boost::w3c::dom::DOMElement< boost::xml::element >
NOTE: This is just a suggestion. That way, you can make the C++ versions
very efficient, while the DOM versions will have a wrapper layer to them.
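
A minimal sketch of that wrapper idea follows. Everything in it is an
assumption made for illustration: the sketch:: namespaces stand in for the
suggested boost:: ones, the element type is a toy, and std::string is used
in place of a proper DOMString.

```cpp
#include <cassert>
#include <map>
#include <string>

namespace sketch {
namespace xml {

// Hypothetical "efficient" C++ element: plain members, direct access.
struct element {
    std::string name;
    std::map<std::string, std::string> attributes;
};

} // namespace xml

namespace w3c {
namespace dom {

// Hypothetical adapter that exposes W3C DOM-style method names over any
// element type providing `name` and `attributes` members.
template <typename Element>
class DOMElement {
    Element& impl_;  // the wrapper does not own the wrapped element

public:
    explicit DOMElement(Element& impl) : impl_(impl) {}

    const std::string& getTagName() const { return impl_.name; }

    // As in the DOM, a missing attribute yields an empty string.
    std::string getAttribute(const std::string& key) const {
        auto it = impl_.attributes.find(key);
        return it == impl_.attributes.end() ? std::string() : it->second;
    }

    void setAttribute(const std::string& key, const std::string& value) {
        impl_.attributes[key] = value;
    }

    bool hasAttribute(const std::string& key) const {
        return impl_.attributes.count(key) != 0;
    }
};

} // namespace dom
} // namespace w3c
} // namespace sketch

// Example use: C++ code touches the element directly, while a scripting
// layer would go through the DOM-style wrapper.
inline void dom_wrapper_example() {
    sketch::xml::element e{"svg", {{"width", "100"}}};
    sketch::w3c::dom::DOMElement<sketch::xml::element> d(e);
    d.setAttribute("height", "50");
    assert(e.attributes.at("height") == "50");
}
```

The wrapper holds only a reference, so the cost of the DOM layer is paid
only by code that asks for it.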

>IMHO, the base parser should provide an API on which other things can be
>built. For example, provided the facilities are present to retrieve the
>information needed for XPath processing, the core API doesn't need to have
>an XPath processor. Likewise for XSLT.

Agreed. XML, DTD and XPath parsing and structure navigation with Unicode
support are all that is required for the base level. That is why I put the
others as optional. It would be nice if the library supported XPath
navigation, XSLT, and DTD/XMLSchema validation, though, as these are common
facilities.

>However, I think it is important that the library does include add-on APIs
>for as much of the supporting standards as possible, such as DOM-like
>processing, XPath node selection, DTD and XMLSchema validation, and XSLT.

Agreed. Perhaps it would be best to organise according to facilities:

   boost::xml::dom -- C++ DOM bindings
   boost::w3c::dom -- W3C DOM bindings (requires boost::xml::dom)
   boost::xml::xpath -- XPath parsing/navigation
   boost::xml::xslt -- requires boost::xml::dom and boost::xml::xpath
   boost::xml::xslfo -- requires boost::xml::xslt
   boost::xml::mathml -- requires boost::xml::dom
   etc.

This way, the user can include whichever APIs he/she wants with minimal
dependencies.

> > [4] It should provide XPath bindings to the XML DOM in a clean way; I
> > personally like the MS selectSingleNode/selectNodes extension to the XML
> > node DOM interface.

>There is no point in providing XPath support if it's painful to use.

I was thinking in terms of a W3C DOM. If we are thinking in terms of C++
bindings, the usage could be like this:
   boost::xml::dom::node root;
   // ...

   // select a single node - note usage of array-style notation:
   boost::xml::dom::node sel = root[ boost::xml::xpath::expr( L"/*[1]" )];

   // select a collection of nodes - can accept an XPath string or an XPath
   // expression
   boost::xml::xpath::result_set math( root, L"//m:math" );

This would give a cleaner interface between XML and XPath. (NOTE: I have
implemented this style of syntax for my MS-XML wrappers).
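
For illustration, the array-style syntax could be backed by something like
the sketch below. It is not the MS-XML wrapper code mentioned above. All
names live in a hypothetical sketch:: namespace, narrow strings are used for
brevity, and the "expression" understands only a tiny subset: relative child
steps separated by '/', each either a name or '*'. Predicates such as [1]
and the // axis are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical path expression: splits "a/b/*" into child steps.
class expr {
    std::vector<std::string> steps_;

public:
    explicit expr(const std::string& path) {
        std::size_t begin = 0;
        while (begin <= path.size()) {
            std::size_t end = path.find('/', begin);
            if (end == std::string::npos) end = path.size();
            if (end > begin) steps_.push_back(path.substr(begin, end - begin));
            begin = end + 1;
        }
    }
    const std::vector<std::string>& steps() const { return steps_; }
};

// Hypothetical element-only tree node.
struct node {
    std::string name;
    std::vector<node> children;

    // Array-style selection: first match in document order, or nullptr.
    const node* operator[](const expr& e) const {
        return first(*this, e.steps(), 0);
    }

private:
    static const node* first(const node& ctx,
                             const std::vector<std::string>& steps,
                             std::size_t i) {
        if (i == steps.size()) return &ctx;
        for (const node& child : ctx.children) {
            if (steps[i] == "*" || steps[i] == child.name) {
                if (const node* hit = first(child, steps, i + 1)) return hit;
            }
        }
        return nullptr;
    }
};

// Hypothetical result set: all matches, built from a string or an expr.
class result_set {
    std::vector<const node*> hits_;

    static void collect(const node& ctx, const std::vector<std::string>& steps,
                        std::size_t i, std::vector<const node*>& out) {
        if (i == steps.size()) { out.push_back(&ctx); return; }
        for (const node& child : ctx.children) {
            if (steps[i] == "*" || steps[i] == child.name) {
                collect(child, steps, i + 1, out);
            }
        }
    }

public:
    result_set(const node& root, const expr& e) { collect(root, e.steps(), 0, hits_); }
    result_set(const node& root, const std::string& path)
        : result_set(root, expr(path)) {}

    std::size_t size() const { return hits_.size(); }
    const node* operator[](std::size_t i) const { return hits_[i]; }
};

} // namespace sketch

// Example use of both selection styles.
inline void xpath_example() {
    using namespace sketch;
    node root{"doc", {node{"body", {node{"m:math", {}}, node{"m:math", {}}}}}};
    const node* sel = root[expr("*")];
    assert(sel && sel->name == "body");
    result_set math(root, "body/m:math");
    assert(math.size() == 2);
}
```

Returning a pointer from operator[] is one way to signal "no match"; a real
binding might prefer a null node object or an exception.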

> > [5] It should have a clean access to attributes, without the user
> > needing to call get/set methods.

>I am not sure what you mean here.

I was thinking from a W3C DOM/MS COM PoV where the attributes are
implemented via get and set methods [Example:
   get_documentElement( IXMLDOMNode * )
vs
   XMLDOMNode XMLDOMDocument.documentElement
], but with clean C++ bindings this is largely irrelevant.

>I am developing Axemill (http://www.sf.net/projects/axemill) to fulfil most
>of these goals, with the eventual goal of submitting it to boost. If you
>want to contribute code and/or ideas, please email me. Currently, it
>requires gcc 3.2 (though it should build with other relatively conforming
>compilers) and boost 1.29.0 (I intend to move to 1.30.0 shortly)

I would be happy to help out with code and ideas.

Regards,
Reece


