
From: loufoque (mathias.gaunard_at_[hidden])
Date: 2006-10-29 18:36:32


Sebastian Redl wrote:

> Once again I'm turning to the list for discussion about a design issue
> in the XML library. This time I hope to avoid any discussion about the
> implementation of the library and focus on the interface only.

Have you thought about asynchronous parsing?
How could that be made available through this interface?

>
> The interface in question is the reader interface, also known as pull
> interface. Like SAX, the pull interface is an event-based interface.
> There are a few event types (roughly, StartElement, EndElement,
> Characters, and a few more for other XML features), all of which
> come with some additional data: the element name, the character data, etc.
>
> There are two types of reader interfaces currently in use that I've
> found. I've come up with a third. I wonder which the people on this list
> would prefer, where they see their weaknesses and strengths. The names
> that I've given them are my own creation.

There are of course variations, like the one Matt Gruenke revealed.
You could provide the inheritance interface but with the objects
actually owned by the parser (making it kind of like the monolithic
interface), and use variant to store those objects on the stack.

This idea doesn't look so bad actually, since you get the second
solution without its drawbacks while keeping the advantages of the
first solution (provided you supply the appropriate tools to allow copy
construction of the referenced objects, that is).

I don't understand, though, whether you consider the parser holding its
own state to be a good thing or not.

Anyway, whatever is chosen, I think using variant with the ability to
get a base will be a good idea somewhere.
This provides a type-safe `which', visitors, and RTTI for those who
want it.
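
To make that concrete, here is a rough sketch of what I have in mind.
It is only an illustration under my own assumptions: the event types,
base_of, render and as_characters are names I made up, and I use
std::variant as a stand-in for boost::variant so that the snippet needs
nothing beyond the standard library.

```cpp
#include <string>
#include <variant>

// Hypothetical event types; none of these names come from an existing library.
struct event_base {
    int line = 0;                       // data shared by every event
    virtual ~event_base() = default;    // polymorphic, so RTTI works on the base
};
struct start_element : event_base { std::string name; };
struct end_element   : event_base { std::string name; };
struct characters    : event_base { std::string text; };

// The variant stores the event by value; no heap allocation is needed.
using event = std::variant<start_element, end_element, characters>;

// "Get a base": access the common base whatever the active alternative is.
inline const event_base& base_of(const event& e) {
    return std::visit(
        [](const auto& x) -> const event_base& { return x; }, e);
}

// A visitor with one overload per event type, checked at compile time.
struct describe {
    std::string operator()(const start_element& e) const { return "<" + e.name + ">"; }
    std::string operator()(const end_element& e) const { return "</" + e.name + ">"; }
    std::string operator()(const characters& e) const { return e.text; }
};

inline std::string render(const event& e) {
    return std::visit(describe{}, e);
}

// RTTI route for those who prefer it: nullptr unless the event is characters.
inline const characters* as_characters(const event& e) {
    return dynamic_cast<const characters*>(&base_of(e));
}
```

The index() member plays the role of `which' here. User code could pick
whichever of the three access styles suits it, and the parser would still
own a single event object.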

Examples of how some basic operations could be done with each of those
interfaces would come in handy for comparing them, especially for
those, like me, who don't have much experience with parsing XML.

> Independently of the type of interface chosen, another issue is
> important: the scope of the interface. Should it report all XML events,
> including those coming from DTD parsing?

Validation is quite costly: a way to disable it would be nice. And it's
not just DTD; there are other means of validation.

However, without validation you don't know what the `id' attribute is,
which is quite annoying. It seems that's why they introduced xml:id.
Browser engines like Gecko don't validate but they know what the id
attributes are for each namespace that they handle. Maybe something
similar could be done, be it with static data or user input.
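
A minimal sketch of such a lookup, assuming a table keyed by the
element's namespace URI. The class id_attribute_table and its members
are hypothetical names of mine; only the two namespace URIs are real.

```cpp
#include <map>
#include <string>

// Hypothetical helper: decides whether an attribute is an ID without validating.
class id_attribute_table {
public:
    id_attribute_table() {
        // Static data: unqualified "id" is an ID on XHTML elements.
        table_["http://www.w3.org/1999/xhtml"] = "id";
    }

    // User input: declare the ID attribute for elements of another namespace.
    void register_id(const std::string& element_ns, const std::string& attr_name) {
        table_[element_ns] = attr_name;
    }

    bool is_id(const std::string& element_ns,
               const std::string& attr_ns,
               const std::string& attr_name) const {
        // xml:id is an ID on any element, whatever its namespace.
        if (attr_ns == "http://www.w3.org/XML/1998/namespace" && attr_name == "id")
            return true;
        // Otherwise only unqualified attributes listed in the table count.
        if (!attr_ns.empty())
            return false;
        auto it = table_.find(element_ns);
        return it != table_.end() && it->second == attr_name;
    }

private:
    std::map<std::string, std::string> table_;  // element namespace -> ID attribute name
};
```

The parser would consult the table for each attribute it reports, so
the cost is a map lookup rather than a validation pass.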

> Should this be a user choice,

Don't validate by default, and do it if the user asks for it.
It seems like the better choice to me.

> Should errors be reported as error events, or as
> exceptions?

We expect errors to happen, so we shouldn't use exceptions.
We could allow exceptions to be toggled on, though, for users who don't
want to check for errors themselves and are not looking for maximum
efficiency. Then again, maybe such users should be using a higher-level
API anyway.

> How about warnings:
> exceptions are inappropriate for them.
> Should it be possible to disable
> them completely?

In exception mode, it should be possible to ignore warnings, and maybe
that should even be the default.
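
Here is roughly how I picture the error and warning handling described
above. Everything in it is an assumption of mine: toy_reader just
replays a prepared list of events in place of a real parser, and
reader_options, throw_on_error and report_warnings are invented names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class event_kind { start_element, end_element, characters, warning, error };

struct event {
    event_kind kind = event_kind::characters;
    std::string data;
};

struct parse_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical options; by default errors are ordinary events.
struct reader_options {
    bool throw_on_error = false;
    bool report_warnings = true;
};

// Toy stand-in for a pull parser: it replays a fixed sequence of events.
class toy_reader {
public:
    explicit toy_reader(std::vector<event> events, reader_options opts = {})
        : events_(std::move(events)), opts_(opts) {}

    // Stores the next event in out; returns false at the end of the input.
    bool next(event& out) {
        while (pos_ < events_.size()) {
            const event& e = events_[pos_++];
            if (e.kind == event_kind::warning && !opts_.report_warnings)
                continue;                       // warnings silently skipped
            if (e.kind == event_kind::error && opts_.throw_on_error)
                throw parse_error(e.data);      // exception mode
            out = e;
            return true;
        }
        return false;
    }

private:
    std::vector<event> events_;
    reader_options opts_;
    std::size_t pos_ = 0;
};
```

With the defaults, the caller sees warning and error events in the
normal stream and checks for them. With throw_on_error set, the same
loop can ignore them and let parse_error propagate.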


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk