From: Aleksey Gurtovoy (alexy_at_[hidden])
Date: 2001-09-27 19:53:34


Dietmar Kuehl wrote:
> Since OMG's C++ mapping for IDL is, IMO, suboptimal and we don't
> really want to bind the parser to CORBA anyway, I'd suggest at
> least taking liberal freedom when doing the mapping. In general, I
> would realize something which is "event driven" although with
> reversed control: This is what we call an "iterator" in C++. The
> SAX approach is a "pushing" approach: You start the thing and it
> bombs you with events until it is done. Iterators use a "pulling"
> approach. You get handles for the sequence and when you feel like
> it, you obtain the next value.

Funny, an "iterator" interface is exactly what we've built here at work.
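
To make the push/pull distinction concrete, here is a minimal sketch of a "pulling" source and of a pushing interface layered on top of it. Everything in it (Event, PullSource, push_all) is a hypothetical name made up for illustration, not our actual interface or anyone's proposed design, and the source walks a pre-tokenized vector rather than parsing real XML.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type; not taken from any real XML library.
struct Event {
    enum class Kind { StartElement, Text, EndElement };
    Kind kind;
    std::string data;  // element name or character data
};

// Pull-style source: the caller asks for the next event when it wants one.
// Assumption: events are already tokenized and stored in a vector.
class PullSource {
public:
    explicit PullSource(std::vector<Event> events) : events_(std::move(events)) {}

    bool done() const { return pos_ == events_.size(); }

    // Precondition: !done().
    const Event& next() {
        assert(!done());
        return events_[pos_++];
    }

private:
    std::vector<Event> events_;
    std::size_t pos_ = 0;
};

// Push on top of pull: a loop that drains the source into a callback.
void push_all(PullSource& source,
              const std::function<void(const Event&)>& handler) {
    while (!source.done()) {
        handler(source.next());
    }
}
```

With the pull interface the caller owns the loop and can stop at any point; push_all simply hands that loop over to the library, which is why this direction is the easy one.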

> Implementing a pushing interface on
> top of a pulling interface is, obviously, trivial. The other way
> around is basically impossible (unless you store the data you get
> pushed in a sequence and move over this one).

That's what we did (the interface is just a wrapper on top of a SAX parser).
Obviously, that's very inefficient, but it was a prototype implementation
anyway. Unfortunately, it remained at the prototype stage and never made it
into production use; simple as the idea might seem, it takes a lot of work,
insight and experience to get it into a "usable by an application programmer"
state. We would _love_ to see (and participate in) a cooperative Boost
effort to implement something along the lines below. FWIW, the only reason
we don't use XML in our current projects is the lack of a proper modern C++
parser interface (and the lack of time and experience to implement one
ourselves, alone).
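
For illustration only, the "store what gets pushed, then iterate over it" workaround might look roughly like the sketch below. This is not our prototype's code: push_parse is a made-up stand-in for a SAX-style parser, and Event, Handler and BufferedEvents are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical event type; not taken from any real XML library.
struct Event {
    enum class Kind { StartElement, Text, EndElement };
    Kind kind;
    std::string data;
};

using Handler = std::function<void(const Event&)>;

// Stand-in for a SAX-style push parser: once started, it calls the handler
// for every event until it is done. Here it just emits an empty element
// for each name instead of parsing real input.
void push_parse(const std::vector<std::string>& names, const Handler& handler) {
    for (const std::string& name : names) {
        handler({Event::Kind::StartElement, name});
        handler({Event::Kind::EndElement, name});
    }
}

// Pull over push: run the push parser to completion, record every event,
// and only then expose the recorded sequence through iterators.
class BufferedEvents {
public:
    using const_iterator = std::vector<Event>::const_iterator;

    explicit BufferedEvents(const std::function<void(const Handler&)>& run_parser) {
        assert(run_parser);  // a callable that starts the parser is required
        run_parser([this](const Event& e) { events_.push_back(e); });
    }

    const_iterator begin() const { return events_.begin(); }
    const_iterator end() const { return events_.end(); }

private:
    std::vector<Event> events_;
};
```

The cost is visible in the constructor: the whole document is parsed and copied into memory before the first event can be pulled, so the caller cannot stop the parse early and large inputs are buffered in full.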

>
> My approach to an XML parser is to use a tokenizer to chop the
> XML sequence into digestible parts. On top of this tokenizer,
> another iterator verifying well-formedness is sitting. Optionally,
> yet another iterator checking validity is used where both of these
> iterators implement the same concept (something like "XML object
> iterator"; an XML object can be an entity, its attributes, contents,
> etc. and the iterator will tell what it is currently sitting on
> using an appropriate accessor). Implementing either a SAX or a DOM
> interface using such an iterator is pretty simple. The only question
> is whether the iterator level is too low to do validation without
> accommodating data which is duplicated in higher level interfaces
> and thus results in unnecessary overhead.
>
> ... and why do I want to reinvent the wheel rather than using
> Xerces, which is working after all, in the first place? Well, I was
> successful in using it but personally I consider it a pain. It uses
> idioms imported from some alien world which don't work too well in
> this alien world and work even less in C++. I used it quite a while
> ago but if I remember correctly, there were classes for lots of
> different things which aren't handled differently at all. The overall
> interface to do simple things was, IMO, too complex: I want a simple
> interface to do simple things. This saves me the complex interface
> for complex things.
>
> Since others stated that they will shift priority to do the XML
> stuff soon, this is what I'm doing, too: I hope to get at least a
> rough cut of my XML parser flying on the weekend which is in a form
> suitable for a broader audience.

That would be wonderful, Dietmar!

Aleksey


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk