Boost :

From: Daryle Walker (darylew_at_[hidden])
Date: 2001-09-27 18:12:39

Next message: Fernando Cacciola: "RE: [boost] Integers and other numbers [was Re: dlw_int review]"
Previous message: Daryle Walker: "Re: Lexical_cast (const ref or value)"
Maybe in reply to: dietmar_kuehl_at_[hidden]: "XML interface: SAX? DOM? something else? ... and what about writing?"
Next in thread: Dan Nuffer: "Re: [boost] XML interface: SAX? DOM? something else? ... and what about writing?"

on 9/27/01 8:09 AM, dietmar_kuehl_at_[hidden] wrote:

> yesterday we had some discussion concerning XML parser interfaces
> for the Boost library. It was suggested to stick to the SAX and
> DOM "standards". Although I have an idea what SAX parsers look like
> in general, I haven't found any document specifying SAX as a
> standard! To my understanding, SAX and DOM are basically interfaces
> specified using OMG IDL and supposed to be realized according to
> corresponding language mappings.

I don't think SAX is a standard; it's just an interface designed by someone
that a lot of other people liked. I think DOM has some people at W3C
working on it.

> Since OMG's C++ mapping for IDL is, IMO, suboptimal and we don't
> really want to bind the parser to CORBA anyway, I'd suggest at
> least taking liberal freedom when doing the mapping. In general, I
> would realize something which is "event driven" although with
> reversed control: This is what we call an "iterator" in C++. The
> SAX approach is a "pushing" approach: You start the thing and it
> bombs you with events until it is done. Iterators use a "pulling"
> approach. You get handles for the sequence and when you feel like
> it, you obtain the next value. Implementing a pushing interface on
> top of a pulling interface is, obviously, trivial. The other way
> around is basically impossible (unless you store the data you get
> pushed in a sequence and move over this one).

The pulling model does sound better than a push. I think a generator is a
better description than an (input) iterator.
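
To make the push/pull distinction concrete, here is a minimal C++ sketch,
assuming a C++17 compiler. The names Event, EventGenerator, and push_all are
hypothetical and belong to no SAX, DOM, or Boost interface. The generator
replays a prepared list rather than tokenizing real XML; the point is only
that a push interface falls out of a pull interface with a short loop.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type, invented for this illustration.
struct Event {
    enum Kind { StartElement, Text, EndElement } kind;
    std::string data;
};

// Pull side: the caller asks for one event at a time via next().
// For brevity this replays a stored list instead of parsing input.
class EventGenerator {
public:
    explicit EventGenerator(std::vector<Event> events)
        : events_(std::move(events)) {}

    // Returns the next event, or an empty optional once exhausted.
    std::optional<Event> next() {
        if (pos_ == events_.size()) return std::nullopt;
        return events_[pos_++];
    }

private:
    std::vector<Event> events_;
    std::size_t pos_ = 0;
};

// Push side layered on the pull side: drain the generator and
// hand every event to the caller's handler, SAX style.
void push_all(EventGenerator& gen,
              const std::function<void(const Event&)>& handler) {
    while (std::optional<Event> e = gen.next()) {
        handler(*e);
    }
}
```

Going the other way (pull on top of push) would require buffering every
pushed event in a container first, which is the asymmetry described in the
quoted text.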

> My approach to an XML parser is to use a tokenizer to chop the
> XML sequence into digestible parts. On top of this tokenizer,
> another iterator verifying well-formedness is sitting. Optionally,
> yet another iterator checking validity is used where both of these
> iterators implement the same concept (something like "XML object
> iterator"; an XML object can be an entity, its attributes, contents,
> etc. and the iterator will tell what it is currently sitting on
> using an appropriate accessor). Implementing either a SAX or a DOM
> interface using such an iterator is pretty simple. The only question
> is whether the iterator level is too low to do validation without
> accommodating data which is duplicated in higher level interfaces
> and thus results in unnecessary overheads.
>
> ... and why do I want to reinvent the wheel rather than using
> Xerces, which is working after all, in the first place? Well, I was
> successful in using it but personally I consider it a pain. It uses
> idioms imported from some alien world which don't work too well in
> this alien world and work even less in C++. I used it quite a while
> ago but if I remember correctly, there were classes for lots of
> different things which aren't handled differently at all. The overall
> interface to do simple things was, IMO, too complex: I want a simple
> interface to do simple things. This saves me the complex interface
> for complex things.
>
> Since others stated that they will shift priority to do the XML
> stuff soon, this is what I'm doing, too: I hope to get at least a
> rough cut of my XML parser flying on the weekend which is in a form
> suitable for a broader audience.
>
> Somebody mentioned the possible desire of an XML writer. This is
> something I would realize on top of an interface/concept used to
> traverse trees (or even graphs: these can always be seen in form
> of trees by defining a start node and a traversal rule): DOM is just
> a tree, and writing it is traversing this tree with actions on the
> nodes. In general, writing an XML document is just writing a certain
> tree structure which is, however, not necessarily given as a DOM
> tree. BTW, if there is an efficient approach to tree traversals,
> this could very well become the foundation of an XPath component:
> XPath is nothing else than a form of regular expressions over a
> [slightly generalized (*)] tree structure and XPath has, IMO,
> application beyond XML! However, the tree traversal needed for XPath
> has additional requirements over the tree traversal needed to write
> an XML file: For XPath you need the possibility to go "up" from an
> arbitrary node while an XML writer only needs to go "down" or up to
> a node sitting on the path from the root to the current node,
> something which is conveniently handled by a stack.
>
> (*) The generalization over typical tree structures is the
> attributes axis.
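
The stack-based traversal described for the XML writer can be sketched
roughly as follows, assuming a C++17 compiler. Node and write_xml are
hypothetical names, the node type is far simpler than a real DOM node, and
the sketch ignores attributes and character escaping. It only shows that
writing needs no parent pointers: the explicit stack holds the path from the
root to the current node.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal tree node, invented for this illustration.
struct Node {
    std::string name;
    std::string text;            // character data written before any children
    std::vector<Node> children;
};

// Writes the tree as XML text without recursion or parent pointers.
// No escaping of special characters is performed.
std::string write_xml(const Node& root) {
    struct Frame {
        const Node* node;
        std::size_t next_child;  // index of the next child to visit
    };

    std::string out = "<" + root.name + ">" + root.text;
    std::vector<Frame> path;     // root-to-current path
    path.push_back({&root, 0});

    while (!path.empty()) {
        Frame& top = path.back();
        if (top.next_child < top.node->children.size()) {
            // Go "down": open the next child and make it current.
            const Node& child = top.node->children[top.next_child++];
            out += "<" + child.name + ">" + child.text;
            path.push_back({&child, 0});
        } else {
            // Go "up": all children written, so close this element.
            out += "</" + top.node->name + ">";
            path.pop_back();
        }
    }
    return out;
}
```

An XPath evaluator could not rely on this stack alone, since it has to move
"up" from an arbitrary node rather than only along the current path.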

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk