Boost logo

Boost :

From: Anthony Williams (anthony_w.geo_at_[hidden])
Date: 2005-11-07 05:20:09

Stefan Seefeld <seefeld_at_[hidden]> writes:

> Anthony Williams wrote:
>> It is far easier to write a parser that calls user code (push model) than
>> write a parser that can be continued (pull model), since in the pull model you
>> have to save all the internal state in order to return to the user with each
>> token; you basically have to write a "continuations" mechanism.
> Fair enough. But here we are (or should be) focussed on the API, i.e. the
> user. The question is whether to put the parser in control of the data
> flow or the application. While the latter is harder to implement it is also
> far more convenient for users.

Convenience for users depends on their application. As you say in another
mail, this is a moot point for now, since we're discussing the API for the
parsed DOM, not the API for parsing.

>>>As it happens, the implementation I have in mind uses libxml2, a C
>>>library. As such between the application calling 'parse()' and the
>>>callbacks are two language boundaries (C++ -> C and C -> C++), so
>>>you couldn't even throw exceptions from inside the callbacks and
>>>catch them in the main application.
>> That's one of my main criticisms of your suggested API --- it's too tightly
>> bound to libxml, and doesn't really allow for substitution of another parser.
> Could you substantiate your claim ?

Really, it was just a feeling from looking at the API docs. However:

In order to use a particular external type, such as std::string, the user has
to supply a specialization of converter<> for their type, which converts to
and from the libxml xmlChar type.

Also, there are lots of constructors around that take xmlNode pointers, or
xmlDoc pointers, or similar. It may be that these are intended as private
implementation details, but they show up in the documentation.

>> My other criticism so far is the node::type() function. I really don't believe
>> in such type tags; we should be using virtual function dispatch instead, using
>> the Visitor pattern. Your traversal example could then ditch the
>> traverse(node_ptr) overload, and instead be called with
>> document->root.visit(traversal)
> Node types aren't (runtime-) polymorphic right now, but is that really a big deal ?
> Polymorphism is important for extensibility. However here the set of node types
> is well known (and rather limited).

Polymorphism is not just important for extensibility of the polymorphic set,
but also for type-safe, convenient handling of objects whose exact type is
only known at runtime. Check-type-flag-and-cast is a nasty smell that I would
rather avoid where possible.

> Making nodes polymorphic would imply that the library allocates nodes on the heap,
> instead of the stack (as it now does). That could well hurt performance. I'm not
> sure how much of an issue that is, though.

Since the types of nodes is well-known and limited, you could use
boost::variant to allow stack-based storage, whilst maintaining the
polymorphic behaviour.

One additional comment on re-reading the samples --- having to instantiate
every template for the external string type seems rather awkward.

One alternative is to accept and return an internal string type, and provide
conversion functions to/from the user's external string type. This way, the
library is not dependent on the string type, but it does add complexity to the

Another alternative is to make the functions that accept or return the user's
string type into templates, whilst leaving the enclosing class as a
non-template, since there are no data members of the user's string
type. Template parameter type deduction can be used to determine the type when
it is given as a parameter, and explicit specification can be used when it is
needed for a return type.


Anthony Williams
Software Developer
Just Software Solutions Ltd

Boost list run by bdawes at, gregod at, cpdaniel at, john at