From: Graham Bennett (graham-boost_at_[hidden])
Date: 2005-11-09 20:14:11


Hi Doug,

On Tue, Nov 08, 2005 at 09:28:09PM -0500, Douglas Gregor wrote:
>
> On Nov 8, 2005, at 7:44 PM, Graham Bennett wrote:
> > IMO a streaming interface is much more important than DOM as a
> > starting point - one can easily and efficiently build a DOM from a
> > stream, but starting with an in-memory representation of a document
> > usually precludes streaming. There are a number of XML applications
> > where it is not desirable or possible to hold the entire document
> > in memory at once. A reader interface has advantages over SAX in
> > that it is much easier to program with. It's very easy to do
> > things like implement decorators around readers, and to write
> > generic code that just understands how to use a reader and doesn't
> > care how the XML is actually stored.
>
> Readers are important for some things, DOM is important for other
> things, but there's no reason to tie the two together in one library
> or predicate one on the other.

Well, there is at least one reason - if the DOM is built on top of a
reader interface then the DOM library doesn't have to know how to parse
XML, and is not tied to any particular parser. Even if you don't agree
with using a reader interface for this separation layer, I'd hope you
would agree that some separation is at least necessary.

> We can have an XML DOM library that allows reading, traversing,
> modifying, and writing XML documents, then later turn the reading
> part into a full-fledged streaming interface for those applications.

Can you elaborate on how you would enable a DOM structure to present a
streaming interface? Are you talking about lazy tree building or
something else? In any case, I would think it's inherently difficult to
retrofit a streaming interface. Much better to build the streaming
interface from the start, and build the DOM on top of it. This can only
be good for both sides - the reader gets to just be a reader, and the
DOM gets to just be a DOM.
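
To make the layering concrete, here is a rough sketch of a DOM builder
that only ever talks to a reader. Every name in it (xml_reader, event,
vector_reader, node, build_dom) is made up for illustration and is not
taken from any existing or proposed Boost library. The vector_reader
just replays a canned list of events and stands in for a real parser,
and the builder assumes the event stream is well formed.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pull-style reader: the consumer asks for the next event.
enum class event_kind { start_element, end_element, text };

struct event {
    event_kind kind;
    std::string value;  // element name, or character data for text events
};

class xml_reader {
public:
    virtual ~xml_reader() = default;
    // Fetches the next event; returns false once the input is exhausted.
    virtual bool next(event& out) = 0;
};

// Stand-in for a real parser: replays a pre-recorded event sequence.
class vector_reader : public xml_reader {
public:
    explicit vector_reader(std::vector<event> events)
        : events_(std::move(events)) {}
    bool next(event& out) override {
        if (pos_ == events_.size()) return false;
        out = events_[pos_++];
        return true;
    }
private:
    std::vector<event> events_;
    std::size_t pos_ = 0;
};

// Minimal tree node: elements have a name, text nodes have text.
struct node {
    std::string name;
    std::string text;
    std::vector<std::unique_ptr<node>> children;
};

// Builds a tree from any reader. It knows nothing about XML syntax and
// assumes start/end events are balanced.
std::unique_ptr<node> build_dom(xml_reader& reader) {
    std::unique_ptr<node> root(new node);  // synthetic document node
    std::vector<node*> open;               // stack of currently open elements
    open.push_back(root.get());
    event ev;
    while (reader.next(ev)) {
        if (ev.kind == event_kind::end_element) {
            if (open.size() > 1) open.pop_back();
            continue;
        }
        std::unique_ptr<node> child(new node);
        node* raw = child.get();
        if (ev.kind == event_kind::start_element) child->name = ev.value;
        else child->text = ev.value;
        open.back()->children.push_back(std::move(child));
        if (ev.kind == event_kind::start_element) open.push_back(raw);
    }
    return root;
}
```

Swapping vector_reader for a reader backed by a real parser, or for a
decorator that filters another reader, would not require any change to
build_dom.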

> > That's not to say I don't think a Boost DOM implementation is a good
> > idea. One thing I would like to see from such an implementation is
> > for it to be policy based, since there are many different use cases
> > for a DOM library. For example some scenarios might only need a
> > read-only tree, which means optimisations can be made in how the
> > nodes are stored. Others might call for efficient access to child
> > elements of a node (e.g. by index) for query, such as when XPath
> > is used. If this kind of thing could be extracted into policies I
> > think it would differentiate such a library from the others that
> > exist already.
>
> [Standard anti-policy rant]

Ah, so you have an anti-policy policy :o)

> Policies should be used very, very carefully. They introduce a huge
> amount of mental overhead, are very hard to combine sensibly, and
> create very fragile implementations.

I don't think the things you list here are properties of all
policy-based implementations, but I agree they are potential pitfalls
to be avoided.

The reason I was suggesting using policies for how a DOM is created is
simply from experience of working on C++ DOM libraries myself. I don't
think it's possible to make a one size fits all library and, unless one
wants to create multiple different libraries for different use cases,
policies seem like the way to go. But I agree that much care would have
to be taken.
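
As a rough illustration of the kind of policy I have in mind, the
sketch below lets a single policy parameter choose how child nodes are
stored. The names (indexed_children, linked_children, basic_element,
count_elements) are hypothetical, and a real design would need more
than one axis of variation. It also relies on std::vector and std::list
accepting an incomplete element type, which is only guaranteed from
C++17 onwards.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical storage policies: each one picks the container that
// holds a node's children.
struct indexed_children {  // O(1) access by index, suits query-heavy use
    template <class T> using container = std::vector<T>;
};
struct linked_children {   // cheap insertion anywhere, O(n) indexing
    template <class T> using container = std::list<T>;
};

template <class ChildPolicy = indexed_children>
struct basic_element {
    using children_type =
        typename ChildPolicy::template container<basic_element>;

    std::string name;
    children_type children;

    // Appends a child element with the given name.
    void append(std::string child_name) {
        basic_element child;
        child.name = std::move(child_name);
        children.push_back(std::move(child));
    }

    // Access by position; the cost depends on the chosen policy.
    const basic_element& child_at(std::size_t i) const {
        return *std::next(children.begin(),
                          static_cast<std::ptrdiff_t>(i));
    }
};

// Generic code can stay independent of the policy: this counts the
// element itself plus all of its descendants.
template <class Policy>
std::size_t count_elements(const basic_element<Policy>& e) {
    std::size_t n = 1;
    for (const auto& child : e.children) n += count_elements(child);
    return n;
}
```

Code written like count_elements doesn't care which policy was picked,
which is where I'd hope most of the mental overhead can be contained.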

> > An XPath implementation should be completely separated from the XML
> > representation, since it's effectively just an algorithm that can be
> > applied to anything that has the correct data model and iterator
> > interface.
>
> This is probably the case. However, one can think of places where a
> tighter integration might give a more natural interface, e.g.,
>
> xml::element_ptr books = ...;
> xml::node_set cheap_books = books[attr("price") < 30];
>
> But, like the reader interface, a library that supports something
> DOM-like can be augmented with XPath support.

I'm not convinced something separate wouldn't be better and more widely
useful.
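
For what it's worth, here is a very rough sketch of what I mean by
separate. The query is a function template that only requires a free
function children(node) returning something iterable; it is nowhere
near real XPath, and all of the names (toy::element, children,
select_descendants) are invented for the example. The toy element type
relies on std::vector accepting an incomplete element type, which is
only guaranteed from C++17 onwards.

```cpp
#include <map>
#include <string>
#include <vector>

namespace toy {
// One possible tree representation; the algorithm below never names it.
struct element {
    std::string name;
    std::map<std::string, double> attrs;  // numeric attributes only, for brevity
    std::vector<element> kids;
};

// The only hook the generic algorithm needs, found through ADL.
inline const std::vector<element>& children(const element& e) {
    return e.kids;
}
}  // namespace toy

// Collects pointers to every descendant of n that satisfies pred, in
// document order. Works with any Node type for which children(node)
// yields an iterable range of Node.
template <class Node, class Pred, class Out>
void select_descendants(const Node& n, Pred pred, Out& out) {
    for (const auto& child : children(n)) {
        if (pred(child)) out.push_back(&child);
        select_descendants(child, pred, out);
    }
}
```

Your cheap books query then becomes a predicate that checks for a
"price" attribute below 30, passed to select_descendants together with
a std::vector<const toy::element*> to receive the matches. The same
template would work unchanged over any other tree type that supplies a
suitable children() function.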

Thanks,

Graham

-- 
Graham Bennett
