
From: Alan Gutierrez (alan-boost_at_[hidden])
Date: 2005-11-06 00:41:41


* Stefan Seefeld <seefeld_at_[hidden]> [2005-11-04 11:39]:
> Anthony Williams wrote:
>
> > It is far easier to write a parser that calls user code (push
> > model) than write a parser that can be continued (pull model),
> > since in the pull model you have to save all the internal state
> > in order to return to the user with each token; you basically
> > have to write a "continuations" mechanism.
>
> Fair enough. But here we are (or should be) focussed on the API,
> i.e. the user. The question is whether to put the parser in
> control of the data flow or the application. While the latter is
> harder to implement it is also far more convenient for users.

    Harder to implement could also imply a complexity that affects
    performance. If the user is consuming a document object model,
    whether that document is built via a push parser or a pull
    parser is moot, and the overhead of maintaining pull parser
    state is nothing but a penalty.
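
    To make the state-keeping point concrete, here is a minimal
    sketch using a toy whitespace tokenizer rather than a real XML
    parser. Every name in it (push_parse, pull_parser,
    build_with_push, build_with_pull) is hypothetical and is not
    part of the proposed API. It assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Push model: the parser owns the loop and calls the user's handler.
// All scanning state lives in locals on the parser's own stack frame.
inline void push_parse(const std::string& input,
                       const std::function<void(const std::string&)>& on_token) {
    assert(on_token);  // the handler must be callable
    std::size_t pos = 0;
    while (pos < input.size()) {
        while (pos < input.size() && input[pos] == ' ') ++pos;  // skip blanks
        const std::size_t start = pos;
        while (pos < input.size() && input[pos] != ' ') ++pos;  // scan one word
        if (pos > start) on_token(input.substr(start, pos - start));
    }
}

// Pull model: the caller owns the loop, so the scanning position has to
// be saved in the object between calls to next().
class pull_parser {
public:
    explicit pull_parser(std::string input) : input_(std::move(input)) {}

    // Returns the next token, or std::nullopt at end of input.
    std::optional<std::string> next() {
        while (pos_ < input_.size() && input_[pos_] == ' ') ++pos_;
        if (pos_ == input_.size()) return std::nullopt;
        const std::size_t start = pos_;
        while (pos_ < input_.size() && input_[pos_] != ' ') ++pos_;
        return input_.substr(start, pos_ - start);
    }

private:
    std::string input_;
    std::size_t pos_ = 0;  // saved state that the push version never needs
};

// A "document" (here just a vector of tokens) built from either parser.
inline std::vector<std::string> build_with_push(const std::string& s) {
    std::vector<std::string> out;
    push_parse(s, [&out](const std::string& t) { out.push_back(t); });
    return out;
}

inline std::vector<std::string> build_with_pull(const std::string& s) {
    std::vector<std::string> out;
    pull_parser p(s);
    while (auto t = p.next()) out.push_back(*t);
    return out;
}
```

    Both builders produce the same result, so a user who only sees
    the finished document cannot tell which parser built it. For a
    real XML grammar the saved state would be far larger than one
    index.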

> >>As it happens, the implementation I have in mind uses libxml2, a C
> >>library. As such between the application calling 'parse()' and the
> >>callbacks are two language boundaries (C++ -> C and C -> C++), so
> >>you couldn't even throw exceptions from inside the callbacks and
> >>catch them in the main application.

> > That's one of my main criticisms of your suggested API --- it's
> > too tightly bound to libxml, and doesn't really allow for
> > substitution of another parser.

> Could you substantiate your claim ?

    Sorting out exception handling through an event framework like
    a push parser framework is no small challenge.
    
    I've always been critical of the Java SAXException: it is
    checked, and it cannot wrap a runtime exception, two choices
    that maximize the challenges of tunneling exceptions.
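
    One way to tunnel an exception across a C callback boundary is
    to catch it inside the callback, park it, and rethrow it once
    the C code has returned. The sketch below assumes
    std::exception_ptr, a C++11 facility. fake_c_parse is a
    hypothetical stand-in for a C library's parse loop and is NOT
    libxml2's API; tunnel_context, tunnel_trampoline and
    parse_with_tunnel are likewise invented for illustration.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical C-style callback signature (an assumption, not libxml2's).
extern "C" {
typedef void (*element_cb)(void* user_data, const char* name);
}

// Stand-in for the C library: reports three element names in order.
inline void fake_c_parse(element_cb cb, void* user_data) {
    static const char* const names[] = {"a", "b", "c"};
    for (const char* n : names) cb(user_data, n);
}

// State shared between the C++ wrapper and the trampoline.
struct tunnel_context {
    std::function<void(const std::string&)> handler;
    std::exception_ptr error;  // empty until a handler throws
};

// The function the C code actually calls. No exception may leave it,
// because the frames above it belong to C code.
extern "C" void tunnel_trampoline(void* user_data, const char* name) {
    assert(user_data != nullptr);
    auto* ctx = static_cast<tunnel_context*>(user_data);
    if (ctx->error) return;  // already failed: ignore later events
    try {
        ctx->handler(name);
    } catch (...) {
        ctx->error = std::current_exception();  // park it for later
    }
}

// C++ entry point: runs the parse, then rethrows any parked exception
// on the C++ side of the boundary.
inline void parse_with_tunnel(std::function<void(const std::string&)> handler) {
    tunnel_context ctx;
    ctx.handler = std::move(handler);
    fake_c_parse(&tunnel_trampoline, &ctx);
    if (ctx.error) std::rethrow_exception(ctx.error);
}
```

    A handler that throws on "b" is never called for "c", and the
    caller of parse_with_tunnel sees the original exception with
    its type intact. A real C parser would also need to be told to
    stop early, which this sketch does not attempt.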

> > My other criticism so far is the node::type() function. I really
> > don't believe in such type tags; we should be using virtual
> > function dispatch instead, using the Visitor pattern. Your
> > traversal example could then ditch the traverse(node_ptr)
> > overload, and instead be called with
> > document->root.visit(traversal)
>
> Node types aren't (runtime-) polymorphic right now, but is that
> really a big deal ?

> Polymorphism is important for extensibility. However here the set
> of node types is well known (and rather limited).

    What about a Post-Schema Validation Infoset (PSVI)? With XML
    Schema the types of nodes are unlimited.
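
    For reference, the Visitor dispatch suggested in the quoted
    text might look roughly like the sketch below. The classes
    (node, element, text, node_visitor, text_length_counter) and
    the count_text helper are hypothetical and are not taken from
    the proposed API. Note that the visitor interface itself lists
    a closed set of node types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct element;
struct text;

// One visit() overload per node type replaces a switch on node::type().
struct node_visitor {
    virtual ~node_visitor() = default;
    virtual void visit(const element&) = 0;
    virtual void visit(const text&) = 0;
};

struct node {
    virtual ~node() = default;
    // Virtual dispatch selects the overload matching the dynamic type.
    virtual void accept(node_visitor& v) const = 0;
};

struct text : node {
    std::string value;
    explicit text(std::string v) : value(std::move(v)) {}
    void accept(node_visitor& v) const override { v.visit(*this); }
};

struct element : node {
    std::string name;
    std::vector<std::unique_ptr<node>> children;
    explicit element(std::string n) : name(std::move(n)) {}
    void accept(node_visitor& v) const override { v.visit(*this); }
};

// Example traversal: sums the length of all text under a node.
struct text_length_counter : node_visitor {
    std::size_t total = 0;
    void visit(const element& e) override {
        for (const auto& child : e.children) {
            assert(child);  // children are expected to be non-null
            child->accept(*this);
        }
    }
    void visit(const text& t) override { total += t.value.size(); }
};

inline std::size_t count_text(const node& root) {
    text_length_counter counter;
    root.accept(counter);
    return counter.total;
}
```

    Adding a new traversal means adding a new visitor class, with
    no change to the node classes. Adding a new node type means
    touching node_visitor and every existing visitor.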

--
Alan Gutierrez - alan_at_[hidden] - http://engrm.com/blogometer/
