
From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2005-11-04 10:42:01


Stefan Seefeld <seefeld_at_[hidden]> writes:

> Jez Higgins wrote:
>
>>>A better API that still follows the cursor-style approach from SAX,
>>>is the XMLReader. It uses a pull model instead of push, i.e. there
>>>are no callbacks, but instead the application advances the reader's
>>>internal cursor to the next 'token'.
>>>See http://xmlsoft.org/xmlreader.html for a comparison to SAX.
>>
>>
>> For some definition of better. The unpleasantness with pull APIs is the
>> token - you have to interrogate it for its actual type, and then
>> dispatch.
>
> Granted. But the underlying parser which any SAX implementation would
> build on would have to do that, too. You can think of the reader as
> that lower layer, and thus a push API with type-safe dispatching
> can easily be built on top, if that is what you want.
>
> Of course, the other direction is possible, too. However, logistically
> it is easier to put the push layer over the pull layer, i.e. the SAX
> implementation on top of the reader:

Surely it depends on which parser you use. My XML-parser-in-progress
(sourceforge.net/projects/axemill) uses a callback mechanism akin to SAX; at
the moment, that's all there is, as I haven't written a DOM yet.

It is far easier to write a parser that calls user code (push model) than to
write a parser that can be continued (pull model), since in the pull model you
have to save all the internal state in order to return to the user with each
token; you basically have to write a "continuations" mechanism.
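
To illustrate the difference, here is a minimal sketch, assuming a toy input
made only of "<tag>" runs and text runs. The names Token, parse_push and
PullReader are hypothetical and are not taken from Axemill, libxml2 or any
other real parser. In the push version the scan position is a local variable;
in the pull version it has to become a data member so that it survives each
return to the caller.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy token: either a tag such as "<a>" or a run of text between tags.
struct Token {
    enum Kind { Tag, Text } kind;
    std::string value;
};

// Push model: the parser owns the loop and calls user code for each token.
// All parse state (here just `pos`) lives in local variables.
void parse_push(const std::string& in,
                const std::function<void(const Token&)>& on_token) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in[pos] == '<') {
            std::size_t end = in.find('>', pos);
            if (end == std::string::npos) end = in.size() - 1;
            on_token(Token{Token::Tag, in.substr(pos, end - pos + 1)});
            pos = end + 1;
        } else {
            std::size_t end = in.find('<', pos);
            if (end == std::string::npos) end = in.size();
            on_token(Token{Token::Text, in.substr(pos, end - pos)});
            pos = end;
        }
    }
}

// Pull model: the caller owns the loop, so every piece of state that must
// survive between calls is hoisted into the reader object.
class PullReader {
public:
    explicit PullReader(const std::string& in) : in_(in), pos_(0) {}

    // Stores the next token in `out`; returns false at end of input.
    bool next(Token& out) {
        if (pos_ >= in_.size()) return false;
        if (in_[pos_] == '<') {
            std::size_t end = in_.find('>', pos_);
            if (end == std::string::npos) end = in_.size() - 1;
            out = Token{Token::Tag, in_.substr(pos_, end - pos_ + 1)};
            pos_ = end + 1;
        } else {
            std::size_t end = in_.find('<', pos_);
            if (end == std::string::npos) end = in_.size();
            out = Token{Token::Text, in_.substr(pos_, end - pos_)};
            pos_ = end;
        }
        return true;
    }

private:
    std::string in_;   // copy of the input being scanned
    std::size_t pos_;  // saved scan position between calls to next()
};
```

With a grammar this flat the saved state is a single index, so the two
versions look alike. In a real XML parser written as nested function calls,
the saved state would also have to include everything that is implicit in the
call stack (open elements, entity expansion in progress, and so on), which is
where the extra work comes from.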

> As it happens, the implementation I have in mind uses libxml2, a C
> library. As such between the application calling 'parse()' and the
> callbacks are two language boundaries (C++ -> C and C -> C++), so
> you couldn't even throw exceptions from inside the callbacks and
> catch them in the main application.

That's one of my main criticisms of your suggested API --- it's too tightly
bound to libxml, and doesn't really allow for substitution of another parser.

My other criticism so far is the node::type() function. I really don't believe
in such type tags; we should be using virtual function dispatch instead, using
the Visitor pattern. Your traversal example could then ditch the
traverse(node_ptr) overload, and instead be called with
document->root.visit(traversal)
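
A rough sketch of what I mean, assuming a node hierarchy with just two
concrete types. The class names node_visitor, element and text, and the use of
std::shared_ptr, are assumptions made for illustration; they are not part of
the proposed API. The point is only that the node itself selects the right
overload, so no type tag is inspected.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class element;
class text;

// One overload per concrete node type; dispatch replaces the type tag.
class node_visitor {
public:
    virtual ~node_visitor() {}
    virtual void visit(element& e) = 0;
    virtual void visit(text& t) = 0;
};

class node {
public:
    virtual ~node() {}
    // Each concrete node calls back the overload matching its own type.
    virtual void visit(node_visitor& v) = 0;
};

class text : public node {
public:
    explicit text(const std::string& c) : content(c) {}
    void visit(node_visitor& v) { v.visit(*this); }
    std::string content;
};

class element : public node {
public:
    explicit element(const std::string& n) : name(n) {}
    void visit(node_visitor& v) { v.visit(*this); }
    std::string name;
    std::vector<std::shared_ptr<node> > children;
};

// Example visitor: serialises the tree, recursing through child nodes.
class traversal : public node_visitor {
public:
    void visit(element& e) {
        out += "<" + e.name + ">";
        for (std::size_t i = 0; i < e.children.size(); ++i)
            e.children[i]->visit(*this);
        out += "</" + e.name + ">";
    }
    void visit(text& t) { out += t.content; }
    std::string out;
};
```

Adding a new operation then means writing another node_visitor subclass
rather than another switch on the node type; the cost is that adding a new
node type means touching every visitor.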

> If, on the other hand, the callback dispatcher itself was written in
> C++, no language boundaries would need to be crossed while unwinding
> the callback stack.

Yes. Axemill would allow that, for example.
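
As a small illustration of the general point (not Axemill's actual interface;
dispatch and count_until are hypothetical names), when the dispatcher is plain
C++ an exception thrown by a callback unwinds straight through it to the
caller:

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical all-C++ dispatcher: invokes on_char for each input character.
void dispatch(const std::string& in,
              const std::function<void(char)>& on_char) {
    for (std::size_t i = 0; i < in.size(); ++i) on_char(in[i]);
}

// Counts characters seen before the callback aborts the parse by throwing.
std::size_t count_until(const std::string& in, char stop) {
    std::size_t seen = 0;
    try {
        dispatch(in, [&](char c) {
            if (c == stop) throw std::runtime_error("stop requested");
            ++seen;
        });
    } catch (const std::runtime_error&) {
        // The exception passed through dispatch() and is handled here.
    }
    return seen;
}
```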

Anthony

-- 
Anthony Williams
Software Developer
Just Software Solutions Ltd
http://www.justsoftwaresolutions.co.uk
