From: Alan Gutierrez (alan-boost_at_[hidden])
Date: 2005-11-01 22:58:55
* Stefan Seefeld <seefeld_at_[hidden]> [2005-11-01 10:18]:
> Alan,
>
> thank you for your interesting points. The API I suggest is not
> modeled after the W3C DOM IDL, nor after its Java implementation.
>
> Many people have expressed discomfort both with the W3C DOM API
> as well as the idea of simply transcribing the Java API to C++.
>
> Therefore, the API I suggest here is (so I hope) as C++-like as
> it can be, while still giving full flexibility to operate on
> (i.e. inspect as well as modify) XML documents.
>
> From the little I could gather about the alternatives you mention,
> it sounds like they would make very nice access layers on top of
> the base API (axis-oriented iterators, say).
>
> > I'd suggest, in any language wide implementation of XML, to
> > attempt to separate transformation and query, from update. They
> > are two very different applications.
>
> I'm not sure I understand what you mean by transformation. How
> is it different from update ? Or is the former simply a (coarse-grained)
> special case of the latter, using a particular language to express
> the mapping (such as xslt) ?
Transformation engines are XQuery, XSLT, STX, and Groovy GPath.
They do not update the document provided. They produce a new
document. That is what I mean by transformation. The input XML
document is not changed, it is read, and a new document is emitted.
The document object model does not need to be mutable. Thus you
can perform all sorts of optimizations for navigation.
The ability to add or remove a node makes a document object
model far more complex.
Many people prefer this mode of operation over adding and
removing nodes.
Node insert/remove appears to be a common operation, because of
web programming, where changing the DOM in the browser changes
the display of the page.
When you are not programming for the pretty side-effects, node
surgery becomes a real pain. Reading the document in, shuffling
nodes, writing it back out is cumbersome. A lot of code is spent
on the add and remove that is repetitious.
It's much easier to express an XML operation in terms of a
query that returns a document, or as a reactor to a set of
events.
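To make the distinction concrete, here is a minimal sketch of a
transformation over a read-only tree. The names (Element, ElementPtr,
rename_elements, serialize) are made up for illustration and are not
part of any proposed Boost API. It assumes C++11, keeps text before
child elements, and does no escaping.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical element type. Consumers only ever see it through a
// pointer-to-const, so a finished tree is effectively immutable.
struct Element {
    std::string name;
    std::string text;  // simplification: text always precedes children
    std::vector<std::shared_ptr<const Element>> children;
};

using ElementPtr = std::shared_ptr<const Element>;

// Transformation: build a NEW tree in which every element named `from`
// is renamed to `to`. The input tree is read, never modified.
ElementPtr rename_elements(const ElementPtr& in,
                           const std::string& from,
                           const std::string& to) {
    auto out = std::make_shared<Element>();
    out->name = (in->name == from) ? to : in->name;
    out->text = in->text;
    for (const auto& child : in->children)
        out->children.push_back(rename_elements(child, from, to));
    return out;
}

// Emit the tree as text (no attributes, no escaping; sketch only).
std::string serialize(const ElementPtr& e) {
    std::string s = "<" + e->name + ">" + e->text;
    for (const auto& child : e->children)
        s += serialize(child);
    return s + "</" + e->name + ">";
}
```

Nothing here needs removeChild or insertBefore. The caller still holds
the original document after the call, unchanged.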
> > I'd suggest starting with supporting XML documents that conform
> > to the XPath and Query data model, and working backwards as the
> > need arises. It makes for a much more concise library, and
> > removes a lot of methods for rarely needed, often pathological,
> > mutations.
>
> There are clearly very different use cases to be considered. We should
> collect them and try to make sure that all of them can be expressed
> in a concise way. I'm not sure all of them operate on the same API layer.
I'm sure they could, but I'm sure it would make a heavier API
than necessary.
XSLT, XQuery, and XPath simply do not require "removeChild".
> The code I posted supports xpath queries. While the result of an xpath
> query can have different types, right now only node-sets are supported
Which is cool, since in XPath an atomic value is the same thing
as a node set that contains only that atomic value.
> (Maybe boost::variant would be good to describe all of the possible types).
Types are described by a qualified name in XPath. Someone who is
implementing a host language for XPath, like XQuery or XSLT,
will require a named type.
> I'm not quite sure I understand what you mean by 'XPath data model'.
http://www.w3.org/TR/xpath-datamodel/
> > Implementing an object model would be much easier, if you
> > implement the 95% that is most frequently used. And if you
> > separate the complexity of document mutation from the relative
> > simplicity of iteration and transformation.
> Could you show an example of both, what you consider (overly) complex
> as well as simple ? While the API in my code is certainly not complete
> (namespaces are missing, notably), I find it quite simple and intuitive.
> I don't think it needs to become much more complex to be complete.
You are right on the money with W3C DOM. That is an overly
complex object model.
It allows for the creation of documents that do not adhere to
XML Namespaces. If it were up to me, I'd create a document
object model that was an XML Namespaces document object model,
instead of an XML document object model.
W3C DOM is designed to accept <a:b:c/> as a valid element name.
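The difference is that XML 1.0 allows colons anywhere in a name, while
Namespaces in XML requires a QName: either an NCName, or two NCNames
joined by a single colon. A rough sketch of the check a
namespace-aware model would apply is below. The function names are
invented for this example, and the character test is ASCII-only,
which is far narrower than the real NCName production.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Simplified NCName test. ASSUMPTION: ASCII letters, digits, '_', '-'
// and '.' only; the real production admits many more characters.
bool is_ascii_ncname(const std::string& s) {
    if (s.empty()) return false;
    auto is_start = [](unsigned char c) {
        return std::isalpha(c) != 0 || c == '_';
    };
    auto is_rest = [&](unsigned char c) {
        return is_start(c) || std::isdigit(c) != 0 || c == '-' || c == '.';
    };
    if (!is_start(static_cast<unsigned char>(s[0]))) return false;
    for (std::string::size_type i = 1; i < s.size(); ++i)
        if (!is_rest(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

// QName: an NCName, or prefix ':' local-part where both are NCNames.
// A second colon lands in the local part and is rejected there.
bool is_qname(const std::string& s) {
    const std::string::size_type colon = s.find(':');
    if (colon == std::string::npos) return is_ascii_ncname(s);
    return is_ascii_ncname(s.substr(0, colon)) &&
           is_ascii_ncname(s.substr(colon + 1));
}
```

With this check, is_qname("a:b") holds and is_qname("a:b:c") does not,
so the pathological name never gets into the model.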
For a good example of production code, I'd look at Saxon's
NodeInfo object. The code is wooly, but describes the subset of
data used in XPath, XQuery and XSLT, and the implementation gotchas.
It really is an implementation of the XPath data model, and probably
the best example of how to implement it that is open source.
> In particular, I'm hoping that we can make the API modular, so document
> access and document validation are kept separate (for example). Maybe that
> is what you mean, I'm not sure.
Yes. There are different breakdowns. Validation is something
that people will want to do without, for the sake of
performance.
An XML document can be very useful read-only. I find that in my
work, I don't have call to update nodes. Since the XML comes
from Atom feeds or SQL databases, replacing nodes makes little sense.
http://www.w3.org/TR/xml-infoset/
I'd start by modeling the information, then move on to a
separate interface for mutating it.
I'd put axes high on the list, since that is how XML has come to
be seen by many, and they are a natural for the C++ STL.
That strikes me as the best way to work with XML in C++, using
C++ STL algorithms as a query language, navigating a very
efficient XML document object model, emitting a new document.
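As a rough illustration of an axis exposed as an iterator range, here
is a sketch of a child axis queried with standard algorithms. Node,
ChildAxis, count_children and select_children are hypothetical names,
not an existing interface, and the recursive std::vector<Node> member
assumes C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical read-only node; only what the example needs.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// The child axis as an iterator pair, so any STL algorithm can walk it.
struct ChildAxis {
    const Node& context;
    std::vector<Node>::const_iterator begin() const {
        return context.children.begin();
    }
    std::vector<Node>::const_iterator end() const {
        return context.children.end();
    }
};

// Roughly count(child::name) in XPath terms.
std::size_t count_children(const Node& parent, const std::string& name) {
    const ChildAxis axis{parent};
    return static_cast<std::size_t>(
        std::count_if(axis.begin(), axis.end(),
                      [&](const Node& n) { return n.name == name; }));
}

// Query that emits a new document: copies of the matching children
// under a fresh "result" node. The input is left untouched.
Node select_children(const Node& parent, const std::string& name) {
    Node result{"result", {}};
    const ChildAxis axis{parent};
    std::copy_if(axis.begin(), axis.end(),
                 std::back_inserter(result.children),
                 [&](const Node& n) { return n.name == name; });
    return result;
}
```

Other axes (descendant, following-sibling, and so on) would follow the
same begin()/end() shape with a different traversal behind them.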
Cheers.
-- Alan Gutierrez - alan_at_[hidden] - http://engrm.com/blogometer/