
From: Reece Dunn (msclrhd_at_[hidden])
Date: 2003-06-02 08:42:49


Stefan Seefeld wrote:

>What I originally suggested was not a parser, but a set of APIs to
>manipulate XML. The parser part (i.e. the piece of code that generates
>a parse tree from an XML file) is the simplest part of it all. What
>is much more tricky is to get the right internal structure to make
>operations on the tree efficient and convenient.

>That said, I would *not* recommend to rewrite any such thing. It is
>a *lot* of work, and as such quite unrelated to boost's goals.

Wouldn't mapping an implementation's structure to a C++ internal structure
also require quite a bit of work?

OPTION 1: C++ specific internal mappings.

boost::xml::dom::document doc( "demo.xml" );

class document
{
   private:
      boost::xml::dom::element root;
   public:
      document( const char * fn )
      {
         impl::XMLDOMDocument doc( fn );
         BuildXMLDOM( doc.documentElement, root );
      }
   private:
      void BuildXMLDOM( impl::XMLDOMElement, boost::xml::dom::element & );
};

Where BuildXMLDOM recursively builds the internal XML tree structure.

NOTE: This is necessary if you want to use an internal C++ representation to
efficiently model the structure for C++ bindings, e.g. using ternary search
trees or another associative container for attribute storage.
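
To make the copy step concrete, here is a rough sketch of what BuildXMLDOM
could look like. The impl::element and dom::element types below are made-up
stand-ins (not any real parser's API), and std::map is just one possible
associative container for the attributes:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the underlying parser's node type
// (impl::XMLDOMElement in the example above); not a real API.
namespace impl {
struct element {
    std::string name;
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<element> children;
};
}

// Hypothetical C++-side node: attributes are kept in an associative
// container so they can be looked up by name.
namespace dom {
struct element {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::vector<element> children;
};
}

// Recursively copy the implementation's tree into the C++ structure.
// Every node is visited once and duplicated, which is where the extra
// load time and memory of this option come from.
void BuildXMLDOM(const impl::element& src, dom::element& dst)
{
    dst.name = src.name;
    dst.attributes.clear();
    dst.children.clear();
    for (const auto& attr : src.attributes)
        dst.attributes[attr.first] = attr.second;
    dst.children.reserve(src.children.size());
    for (const auto& child : src.children) {
        dst.children.emplace_back();
        BuildXMLDOM(child, dst.children.back());
    }
}
```

Until the implementation's tree is released, both copies exist side by side.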

This would make loading an XML document more computationally and memory
intensive, because the document is effectively loaded twice (once by the
parser and once by the C++ bindings). This is a problem when loading large
documents, since it effectively doubles the memory required. Also, what
about SAX facilities?

OPTION 2: If you intend to wrap an implementation like libxml2 in a C++
interface, you give up control over how the data is represented internally
and you pay a slight performance penalty for the wrappers (less so if you
use inlined functions). This approach would not suffer the loading
penalties described above.
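
As a rough illustration of the wrapper approach, assuming a made-up C-style
node struct (c_node below is hypothetical and is NOT libxml2's actual
layout), the C++ class holds nothing but a pointer and forwards every call
inline:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C-style node standing in for the implementation's own
// node struct; the field names are assumptions for this sketch.
struct c_node {
    const char* name;
    c_node*     first_child;
    c_node*     next_sibling;
};

namespace dom {

// Thin, non-owning wrapper: nothing is copied at load time, the object
// is just a typed view over the implementation's node.
class element {
public:
    explicit element(c_node* n) : node_(n) {}

    bool valid() const { return node_ != nullptr; }

    // Each accessor forwards to the underlying struct. Defined in the
    // class body, they are implicitly inline.
    std::string name() const         { assert(node_); return node_->name; }
    element     first_child() const  { assert(node_); return element(node_->first_child); }
    element     next_sibling() const { assert(node_); return element(node_->next_sibling); }

private:
    c_node* node_;
};

// Walk the sibling chain through the wrapper only.
inline std::size_t child_count(element parent)
{
    std::size_t n = 0;
    for (element c = parent.first_child(); c.valid(); c = c.next_sibling())
        ++n;
    return n;
}

} // namespace dom
```

The tree layout (here a sibling-linked list) is whatever the implementation
chose, which is the trade-off mentioned above.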

OPTION 3: Writing a boost XML/XPath parser would allow the internal
structure to be optimised for C++-specific bindings, without suffering
from either wrapper performance penalties or document loading/SAX parsing
penalties.

>What I had (and still have) in mind is a C++ interface to an existing
>implementation (libxml2 actually).

What if the user wants an interface to another implementation? Is it
possible to standardize access to other parsers?

NOTE: If you are using the option 1 approach, the variations would occur in
boost::xml::dom::document - specifically the constructor and the semantics
for BuildXMLDOM.
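
One way this could be arranged, purely as a sketch: each parser backend
implements a small read-only interface, and a single build routine works
against that interface. All the names here (node_source, memory_node,
dom::build) are invented for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical read-only view that each parser backend would implement.
struct node_source {
    virtual ~node_source() {}
    virtual std::string        name() const = 0;
    virtual std::size_t        child_count() const = 0;
    virtual const node_source& child(std::size_t i) const = 0;
};

namespace dom {

struct element {
    std::string          name;
    std::vector<element> children;
};

// The one backend-independent build routine: it only sees node_source,
// so switching parsers means supplying a different adapter.
inline void build(const node_source& src, element& dst)
{
    dst.name = src.name();
    dst.children.clear();
    for (std::size_t i = 0; i < src.child_count(); ++i) {
        dst.children.emplace_back();
        build(src.child(i), dst.children.back());
    }
}

} // namespace dom

// Toy in-memory backend, used only to exercise the interface. A real
// adapter would forward these calls to the parser's own node type.
struct memory_node : node_source {
    std::string              tag;
    std::vector<memory_node> kids;

    explicit memory_node(const std::string& t) : tag(t) {}

    std::string        name() const override        { return tag; }
    std::size_t        child_count() const override { return kids.size(); }
    const node_source& child(std::size_t i) const override { return kids.at(i); }
};
```

The virtual calls are only paid while the tree is being built, not when the
finished C++ tree is used afterwards.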

>Also I'm not convinced that the main goal should be to conform with
>the DOM specs as provided by w3c. Lots of implementers / users consider its
>design broken. Instead I'd suggest to try to come up with a 'good' C++ API,
>and then build a wrapper around it that provides the legacy
>mapping as needed.

This is the thinking that I have moved towards, more details of which can be
found in my last post on the subject.

NOTE: Here and in my previous post, I use DOM to refer to the C++ Document
Object Model binding, and not the W3C DOM standard. When I mean the
standard, I write "W3C DOM" explicitly.

Regards,
Reece


