Boost :

From: Jon Radoff (jonradoff_at_[hidden])
Date: 2006-09-08 10:26:59


> Could you detail a bit what you mean by 'DOM' and 'DOM-like' here? We are all probably thinking of
> some tree structure that can be navigated, queried, etc.
> However, some have already argued that the DOM API as it exists for Java is inappropriate for C++,
> or even that the DOM API is already conceptually broken.
>
> Thus, I think it might help if we could detail a bit what it should and should not be, and what
> use cases it should support.

I don't know why the Java "Document" interface wouldn't be an appropriate model to work with. It simply defines an XML document as a hierarchy of nodes, each representing an object such as an element, a run of character content, or a comment. The Element objects provide useful methods for inspecting attributes, among other things.
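To make the node hierarchy concrete, here is a minimal C++ sketch of what such a Document-style tree might look like. The class and method names (`Node`, `Element`, `Text`, `Comment`, `appendChild`, `getAttribute`, `setAttribute`, `hasAttribute`) are hypothetical, chosen to echo the W3C DOM interfaces rather than taken from any existing C++ library, and the sketch assumes C++14 for `std::make_unique`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node types, loosely modeled on the W3C DOM (Element, Text, Comment).
enum class NodeType { Element, Text, Comment };

class Node {
public:
    virtual ~Node() = default;
    virtual NodeType type() const = 0;
};

class Text : public Node {
public:
    explicit Text(std::string data) : data_(std::move(data)) {}
    NodeType type() const override { return NodeType::Text; }
    const std::string& data() const { return data_; }
private:
    std::string data_;
};

class Comment : public Node {
public:
    explicit Comment(std::string data) : data_(std::move(data)) {}
    NodeType type() const override { return NodeType::Comment; }
    const std::string& data() const { return data_; }
private:
    std::string data_;
};

class Element : public Node {
public:
    explicit Element(std::string name) : name_(std::move(name)) {}
    NodeType type() const override { return NodeType::Element; }
    const std::string& tagName() const { return name_; }

    // Attribute access in the spirit of DOM's hasAttribute/getAttribute/setAttribute.
    bool hasAttribute(const std::string& key) const { return attrs_.count(key) != 0; }
    std::string getAttribute(const std::string& key) const {
        auto it = attrs_.find(key);
        return it == attrs_.end() ? std::string() : it->second;  // "" when absent, as in DOM
    }
    void setAttribute(const std::string& key, std::string value) { attrs_[key] = std::move(value); }

    // The element owns its children; a raw pointer to the new child is returned for convenience.
    template <class T, class... Args>
    T* appendChild(Args&&... args) {
        children_.push_back(std::make_unique<T>(std::forward<Args>(args)...));
        return static_cast<T*>(children_.back().get());
    }
    const std::vector<std::unique_ptr<Node>>& childNodes() const { return children_; }

private:
    std::string name_;
    std::map<std::string, std::string> attrs_;
    std::vector<std::unique_ptr<Node>> children_;
};
```

Ownership is the main place this departs from the Java interface: children are held in `std::unique_ptr`, so the tree releases its nodes deterministically without relying on a garbage collector.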

The two most common use cases for a program interacting with XML are: (a) easily extracting data from a document, and (b) modifying and saving an existing XML document.
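Both use cases can be illustrated with a deliberately small C++ sketch: find a value, change it, and write the document back out. `XmlElement`, `childText`, `escape` and `serialize` are hypothetical names, and the model is simplified on purpose (each element holds at most one run of text, placed before its children, and no parsing step is shown).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical minimal tree: elements with attributes, one text run, and children.
struct XmlElement {
    std::string name;
    std::map<std::string, std::string> attributes;
    std::string text;  // simplified: a single run of character content before the children
    std::vector<XmlElement> children;
};

// (a) Extraction: text of the first direct child with the given name, or "" if none.
std::string childText(const XmlElement& parent, const std::string& name) {
    for (const XmlElement& c : parent.children)
        if (c.name == name) return c.text;
    return std::string();
}

// Escapes the five characters XML reserves, so the saved output stays well-formed.
std::string escape(const std::string& s) {
    std::string out;
    for (char ch : s) {
        switch (ch) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += ch;
        }
    }
    return out;
}

// (b) Saving: serialize the tree back to XML text (no indentation, no XML declaration).
std::string serialize(const XmlElement& e) {
    std::string out = "<" + e.name;
    for (const auto& kv : e.attributes)
        out += " " + kv.first + "=\"" + escape(kv.second) + "\"";
    if (e.text.empty() && e.children.empty()) return out + "/>";
    out += ">" + escape(e.text);
    for (const XmlElement& c : e.children) out += serialize(c);
    return out + "</" + e.name + ">";
}
```

Escaping happens only at save time, so in-memory values stay in their plain form and the program never has to handle entity references itself.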

I'm not familiar with the arguments that the DOM model is broken. It's a good model for loading, modifying and saving data up to fairly large sizes, as long as the whole document fits comfortably in memory. Random-access or streamed reading (essentially Expat-type methods) is better if all you want to do is extract data. Are critics of DOM looking for some record-based approach that allows them to lock, modify and save parts of an XML document without loading the entire document into memory? If so, I can see some advantages to that, but it seems that sort of functionality could be built as an extension of the DOM concept rather than as an entirely new approach.

A good C++ implementation, in my mind, would use a Document class hierarchy, perhaps based directly on the object hierarchy presented in the Java Document interface. In C++ we can enhance it by using familiar STL containers in many places. For example, Java provides a getElementsByTagName API, which returns a NodeList of all the elements with a given tag name. In C++ it would be nice if the equivalent function gave me a hash_multimap<string,Element> so I could work whatever iteration magic I felt like.
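A sketch of that idea, assuming a hypothetical `Element` type and hypothetical helpers `indexElements` and `elementsByTagName`. `hash_multimap` was a non-standard (SGI STL) extension at the time, so this uses `std::unordered_multimap` from C++11, its standardized counterpart. It maps each tag name to pointers into the tree rather than to `Element` copies, since copying an element would copy its whole subtree.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical element type: just what the index needs.
struct Element {
    std::string tagName;
    std::vector<Element> children;
};

// Tag name -> element. Pointers stay valid only while the tree is not modified.
using ElementIndex = std::unordered_multimap<std::string, const Element*>;

// Recursively records every element in the subtree under its tag name.
void indexElements(const Element& e, ElementIndex& index) {
    index.emplace(e.tagName, &e);
    for (const Element& child : e.children) indexElements(child, index);
}

// Builds the full index for a document rooted at `root` (the root itself included).
ElementIndex elementsByTagName(const Element& root) {
    ElementIndex index;
    indexElements(root, index);
    return index;
}
```

With this index, `equal_range(name)` plays the role of `getElementsByTagName(name)`, and `count(name)` answers "how many?" directly. The order among same-named elements is unspecified in an unordered container; if document order matters, a `std::multimap` keeps equivalent keys in insertion order, which here is document order.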

If the DOM class inherited from some intermediary random-access class, that base class could provide an interface for those unwilling to load the whole document into memory, while also sparing people the trouble of writing the event handlers and callback functions required by a C-style implementation based on Expat.
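One way to read this suggestion is as a pull-style reader, similar in spirit to Java's StAX or .NET's `XmlReader`: the caller asks for the next event in a loop instead of registering callbacks, and a DOM builder could be layered on the same reader. `PullReader`, `Event` and `collectText` are hypothetical names. The reader handles only a toy subset (elements without attributes or self-closing tags, plain text, no comments, entities, CDATA or processing instructions) and reads from an in-memory string for brevity, whereas a real one would pull from a stream in chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Events the caller pulls, one at a time.
enum class Event { StartElement, EndElement, Text, EndOfDocument };

class PullReader {
public:
    explicit PullReader(std::string xml) : xml_(std::move(xml)) {}

    // Advances to the next event; value() then holds the tag name or the text.
    Event next() {
        if (pos_ >= xml_.size()) return Event::EndOfDocument;
        if (xml_[pos_] == '<') {
            std::size_t close = xml_.find('>', pos_);
            if (close == std::string::npos) {  // truncated tag: treat as end of input
                pos_ = xml_.size();
                return Event::EndOfDocument;
            }
            bool isEnd = xml_[pos_ + 1] == '/';
            std::size_t begin = pos_ + (isEnd ? 2 : 1);
            value_ = xml_.substr(begin, close - begin);
            pos_ = close + 1;
            return isEnd ? Event::EndElement : Event::StartElement;
        }
        std::size_t lt = xml_.find('<', pos_);
        if (lt == std::string::npos) lt = xml_.size();
        value_ = xml_.substr(pos_, lt - pos_);
        pos_ = lt;
        return Event::Text;
    }
    const std::string& value() const { return value_; }

private:
    std::string xml_;
    std::size_t pos_ = 0;
    std::string value_;
};

// Extraction without a tree and without callbacks: text of every <name> element.
std::vector<std::string> collectText(PullReader& reader, const std::string& name) {
    std::vector<std::string> out;
    int depth = 0;  // > 0 while inside a matching element
    for (Event ev = reader.next(); ev != Event::EndOfDocument; ev = reader.next()) {
        if (ev == Event::StartElement && reader.value() == name) {
            if (depth++ == 0) out.emplace_back();  // new match starts a new entry
        } else if (ev == Event::EndElement && reader.value() == name) {
            --depth;
        } else if (ev == Event::Text && depth > 0) {
            out.back() += reader.value();
        }
    }
    return out;
}
```

The control flow stays in the caller's loop, which is the main practical difference from Expat's model, where the parser drives and the program reacts inside handler functions.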


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk