From: Stefan Seefeld (seefeld_at_[hidden])
Date: 2006-10-31 10:47:08


Sebastian Redl wrote:

> There are two types of reader interfaces currently in use that I've
> found. I've come up with a third. I wonder which the people on this list
> would prefer, where they see their weaknesses and strengths. The names
> that I've given them are my own creation.
>
> 1) The Monolithic Interface
> Examples: .Net XMLReader, libxml2 XMLReader (modeled after the .Net
> one), Java Common API for XML Pull Parsing (XmlPull) (don't confuse with
> JSR 173 "StAX")
>
> In the monolithic interface, the XML parser acts as a cursor over the
> event stream. You call next() and it points to the next event in the
> stream. From there, you can query its type (usually some integral
> constants) and call some methods to retrieve the data. All methods are
> always available on the object; calling one that is not appropriate for
> the current event (e.g. getTagName() for a Characters event) returns a
> null value or signals an error.

I don't like the idea of an all-embracing interface that requires the
user to figure out which methods are actually valid for the current type.
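
To make the concern concrete, here is a minimal sketch of such a cursor. All names (monolithic_reader, tag_name(), text()) are hypothetical and not taken from any of the libraries listed above; the "stream" is just a pre-tokenized vector. Every accessor is callable at any time, so misuse only shows up at run time:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event kinds; real readers distinguish many more.
enum class event_type { start_element, characters, end_element };

// Hypothetical monolithic reader acting as a cursor over a
// pre-tokenized event list (a real reader would parse its input).
class monolithic_reader {
public:
    struct event {
        event_type type;
        std::string data;  // tag name or character data, depending on type
    };

    explicit monolithic_reader(std::vector<event> events)
        : events_(std::move(events)) {}

    // Advance the cursor; returns false once the stream is exhausted.
    bool next() {
        if (pos_ >= events_.size()) return false;
        ++pos_;
        return true;
    }

    event_type type() const { return current().type; }

    // Always callable, but only valid on element events.
    const std::string& tag_name() const {
        if (current().type == event_type::characters)
            throw std::logic_error("tag_name() on a characters event");
        return current().data;
    }

    // Always callable, but only valid on characters events.
    const std::string& text() const {
        if (current().type != event_type::characters)
            throw std::logic_error("text() on a non-characters event");
        return current().data;
    }

private:
    const event& current() const {
        if (pos_ == 0) throw std::logic_error("next() not called yet");
        return events_[pos_ - 1];
    }

    std::vector<event> events_;
    std::size_t pos_ = 0;  // number of events consumed so far
};
```

The compiler accepts r.tag_name() on a characters event; only the run-time check catches it.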

> 2) The Inheritance Interface
> Examples: JSR 173 "StAX"
>
> In the inheritance interface, the event types are modeled as a group of
> classes that all inherit from an Event base class. The parser acts as an
> iterator, Java style; calling next() returns a reference/pointer to the
> event object for this event. You use RTTI or a similar mechanism to find
> the type of the event, then cast the reference to the appropriate
> subclass. The subclasses then provide access to the data that is
> actually available for this event type.

While this sounds better (the actual interface only provides what
the actual type supports), it is still the user's responsibility to
figure out the type and do the cast.
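
A minimal sketch of what that looks like on the user's side, assuming a hypothetical two-class hierarchy (event, start_element, characters are illustrative names, not StAX's):

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical event hierarchy in the style described above.
struct event {
    virtual ~event() = default;
};

struct start_element : event {
    explicit start_element(std::string n) : name(std::move(n)) {}
    std::string name;
};

struct characters : event {
    explicit characters(std::string t) : text(std::move(t)) {}
    std::string text;
};

// User code: discover the dynamic type, then cast to reach the data.
std::string describe(const event& e) {
    if (auto* s = dynamic_cast<const start_element*>(&e))
        return "start:" + s->name;
    if (auto* c = dynamic_cast<const characters*>(&e))
        return "text:" + c->text;
    return "unknown";  // a forgotten case is only noticed at run time
}
```

Each subclass exposes only its own data, but the chain of casts is still written and maintained by the caller.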

> 3) The Variant Interface
> Examples: None. I believe I came up with this entirely on my own.
>
> The variant interface seeks to combine the strengths of the other two
> interfaces. It uses a non-monolithic interface, that is, the parser acts
> like an iterator and the data is not stored within it. It does not
> return a reference to the event object, though, but instead a
> boost::variant of all possible events. This way, heap allocation of the
> event object is avoided, together with all the trouble coming with that.
> The event type can be determined either by calling variant::which, or
> with a variant visitor (type-safe!), or with a special get_base()
> function that works like get() but can retrieve a reference to a common
> base of all the variant types. (This is possible, although an
> implementation does not exist in Boost.)

Same here.
You seem to assume that a single accessor is to be used to retrieve the
current data, whether it is strongly / statically typed or not.
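
For reference, the single-accessor shape under discussion can be sketched roughly as below. This uses std::variant and std::visit (C++17) as a stand-in for boost::variant and its visitor mechanism, and the event structs are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical event types; no common base is required here.
struct start_element { std::string name; };
struct characters { std::string text; };
struct end_element { std::string name; };

// The single value a variant-style reader would hand back per step.
using event = std::variant<start_element, characters, end_element>;

// Visitor with one overload per alternative; leaving one out
// is a compile-time error rather than a run-time surprise.
struct describe_visitor {
    std::string operator()(const start_element& e) const { return "start:" + e.name; }
    std::string operator()(const characters& e) const { return "text:" + e.text; }
    std::string operator()(const end_element& e) const { return "end:" + e.name; }
};

std::string describe(const event& e) {
    return std::visit(describe_visitor{}, e);
}

// The non-visitor path: the caller asks for the type explicitly.
bool is_text(const event& e) {
    return std::holds_alternative<characters>(e);
}
```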

What about an interface similar to SAX, where the user provides a set
of handlers, one per type, and the reader then calls the appropriate
one? For example:

void handle1(token1 const &);
void handle2(token2 const &);
...

typedef reader<handle1, handle2, ...> my_reader;
my_reader r(filename);
while (r.next()) r.process();

Please disregard the syntax; there are certainly multiple ways to
declare and bind handlers to the reader, either at compile- or at
runtime. My question is merely about whether it would be useful to
use typed callbacks like this.
What are the pros and cons?
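
One possible compile-time binding, purely as a sketch: the handlers are passed as function-pointer template arguments. The token types, raw_token, g_log and the two handler names are all hypothetical, and the reader walks a pre-tokenized vector instead of parsing a file:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, independent token types.
struct start_element { std::string name; };
struct characters { std::string text; };

// Stand-in for parser output; a real reader would tokenize its input.
struct raw_token {
    bool is_text;
    std::string data;
};

// Hypothetical reader bound at compile time to one handler per token type.
template <void (*OnStart)(const start_element&),
          void (*OnText)(const characters&)>
class reader {
public:
    explicit reader(std::vector<raw_token> tokens)
        : tokens_(std::move(tokens)) {}

    // Advance to the next token; returns false at the end of the stream.
    bool next() {
        if (pos_ >= tokens_.size()) return false;
        ++pos_;
        return true;
    }

    // Dispatch the current token to the handler for its type.
    void process() const {
        assert(pos_ > 0 && "call next() before process()");
        const raw_token& t = tokens_[pos_ - 1];
        if (t.is_text)
            OnText(characters{t.data});
        else
            OnStart(start_element{t.data});
    }

private:
    std::vector<raw_token> tokens_;
    std::size_t pos_ = 0;  // number of tokens consumed so far
};

// Example handlers that record each call so the dispatch is observable.
std::vector<std::string> g_log;
void on_start(const start_element& e) { g_log.push_back("start:" + e.name); }
void on_text(const characters& e) { g_log.push_back("text:" + e.text); }

using my_reader = reader<on_start, on_text>;
```

With this shape the type dispatch lives inside the reader, so the caller's loop stays `while (r.next()) r.process();` and never inspects a type tag; any state shared between handlers has to live outside them, as g_log does here.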

Note that there is middle ground between the two extremes (a single
token type vs. fully independent token types): all tokens can derive
from a common base that provides access to common data, so an
iterator is still possible, for example to 'fast-forward' to
a particular position in the stream.
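
A rough sketch of that middle ground, assuming (hypothetically) that the common data is a source line number and that tokens are held in a vector; token, token_stream and fast_forward are illustrative names only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Common base: data every token carries (here, a line number).
struct token {
    explicit token(std::size_t l) : line(l) {}
    virtual ~token() = default;
    std::size_t line;
};

struct start_element : token {
    start_element(std::size_t l, std::string n)
        : token(l), name(std::move(n)) {}
    std::string name;
};

struct characters : token {
    characters(std::size_t l, std::string t)
        : token(l), text(std::move(t)) {}
    std::string text;
};

using token_stream = std::vector<std::unique_ptr<token>>;

// Skip ahead using only the base interface: returns the first token
// at or beyond the requested line, or `last` if there is none.
token_stream::const_iterator fast_forward(token_stream::const_iterator first,
                                          token_stream::const_iterator last,
                                          std::size_t line) {
    return std::find_if(first, last,
                        [line](const std::unique_ptr<token>& t) {
                            return t->line >= line;
                        });
}
```

Iteration and skipping need only the base class; typed handlers would still receive the concrete token once the position of interest is reached.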

Regards,
                Stefan

-- 
      ...ich hab' noch einen Koffer in Berlin...
