From: loufoque (mathias.gaunard_at_[hidden])
Date: 2006-09-07 11:04:30


Sebastian Redl wrote :

> 1) API TYPE
> Pull-API (StAX), Push-API (SAX), Object-Model-API (DOM)?
>
> - All of them, of course! The main question is, which one is the base API?
> - DOM is out of the question (performance/memory overhead).
> - Implementing a push parser on top of a pull parser is trivial:
> while(fetchEvent()) pushEvent()
> - Implementing a pull parser on top of a push parser requires at least
> generator-style coroutines. This incurs a performance overhead at best,
> unusability at worst (in limited environments).
> - It is therefore best to use a pull model at the lowest level, although this
> makes the parser implementation more complex.
>

It would also be possible to make the push and pull parsers more or less
independent, so that each one can be as efficient as possible.
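
For illustration, the quoted loop `while(fetchEvent()) pushEvent()` could be fleshed out roughly as below. This is only a sketch assuming C++17: `Event`, `PullParser` and `pushAll` are hypothetical names invented for the example, not part of any existing or proposed Boost interface, and the "parser" merely replays pre-built events instead of tokenizing text.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type, invented for this sketch.
struct Event {
    enum class Kind { StartElement, EndElement, Text } kind;
    std::string data;  // element name or character data
};

// Hypothetical pull source: hands out pre-built events one at a time.
// A real parser would tokenize its input lazily instead.
class PullParser {
public:
    explicit PullParser(std::vector<Event> events) : events_(std::move(events)) {}

    // Returns the next event, or std::nullopt at end of input.
    std::optional<Event> fetchEvent() {
        if (pos_ == events_.size()) return std::nullopt;
        return events_[pos_++];
    }

private:
    std::vector<Event> events_;
    std::size_t pos_ = 0;
};

// Push adapter: drains the pull parser and forwards each event to a handler.
// Returns the number of events delivered.
inline std::size_t pushAll(PullParser& parser,
                           const std::function<void(const Event&)>& handler) {
    assert(handler && "handler must be callable");
    std::size_t count = 0;
    while (auto ev = parser.fetchEvent()) {
        handler(*ev);
        ++count;
    }
    return count;
}
```

The adapter needs no state beyond the loop itself, which is why building push on top of pull is the cheap direction; the reverse would need the handler to suspend in the middle of a callback.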

> 3) Input/Output System
> How does the library access underlying storage?

> - Since it needs to access resources from various sources, typically
> specified as URLs, it needs a flexible and runtime-switchable input system.
> - In particular, it should be possible to plug scheme resolvers in at
> runtime, so that program extensions can provide support for, say, the
> ftp: scheme.

That would be the job of another library, one that provides a way to
read any kind of resource from a URL, a bit like what PHP has.
Such a library would also be very useful outside of the XML library.
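
As a rough idea of what runtime-pluggable resolvers could look like, here is a minimal sketch assuming C++14. `Resolver`, `ResolverRegistry` and the made-up `mem:` scheme are hypothetical and exist only for this example; a real library would need proper URL parsing and error reporting.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical resolver: takes a full URL, returns a stream over the resource.
using Resolver =
    std::function<std::unique_ptr<std::istream>(const std::string& url)>;

class ResolverRegistry {
public:
    // Register (or replace) the resolver for a scheme such as "file" or "ftp".
    void add(const std::string& scheme, Resolver resolver) {
        assert(resolver && "resolver must be callable");
        resolvers_[scheme] = std::move(resolver);
    }

    // Open a URL by dispatching on the text before the first ':'.
    std::unique_ptr<std::istream> open(const std::string& url) const {
        const auto colon = url.find(':');
        if (colon == std::string::npos)
            throw std::invalid_argument("URL has no scheme: " + url);
        const std::string scheme = url.substr(0, colon);
        const auto it = resolvers_.find(scheme);
        if (it == resolvers_.end())
            throw std::runtime_error("no resolver for scheme: " + scheme);
        return it->second(url);
    }

private:
    std::map<std::string, Resolver> resolvers_;
};

// Toy resolver for a made-up "mem:" scheme whose payload is the content itself.
inline std::unique_ptr<std::istream> openMem(const std::string& url) {
    return std::make_unique<std::istringstream>(url.substr(4));  // skip "mem:"
}
```

A program extension would call `add("ftp", ...)` at load time; the XML parser itself would only ever see the returned `std::istream`.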

> - Two basic options:
> - Iterator-based approach.
> - Stream-based approach.
> - Other?

Maybe a lower-level approach like the one Boost.Asio provides could be
interesting, especially since that model also provides asynchronous I/O.

> 4) Integration With Other Boost Libraries
> What other Boost libraries should Xml work/integrate with?

Since XML needs good Unicode support and the like, maybe there is work
to be done in that area in Boost first.

>
> - For example, does it make sense to provide an interface to the parser
> that can be used for parsing streaming content? Either non-blocking, with
> the option to parse partial data and hop back on missing content, or a
> completely asynchronous implementation that dispatches SAX events through
> e.g. ASIO?

The ability to parse partial content would be a great plus.
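
To make "parse partial data" concrete, here is a deliberately simplified sketch of a tokenizer that accepts input in arbitrary chunks and holds back whatever is incomplete. `ChunkTokenizer` is a hypothetical name for this example; it only splits `<...>` markup from text runs and ignores comments, CDATA sections and `>` inside attribute values, so it is nowhere near a conforming XML parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical incremental tokenizer, invented for this sketch.
class ChunkTokenizer {
public:
    // Append a chunk and return every token that is now complete.
    std::vector<std::string> feed(const std::string& chunk) {
        buffer_ += chunk;
        std::vector<std::string> tokens;
        std::size_t pos = 0;
        while (pos < buffer_.size()) {
            if (buffer_[pos] == '<') {
                const std::size_t close = buffer_.find('>', pos);
                if (close == std::string::npos) break;  // tag incomplete: wait
                tokens.push_back(buffer_.substr(pos, close - pos + 1));
                pos = close + 1;
            } else {
                const std::size_t open = buffer_.find('<', pos);
                if (open == std::string::npos) break;  // text may continue: wait
                tokens.push_back(buffer_.substr(pos, open - pos));
                pos = open;
            }
        }
        assert(pos <= buffer_.size());  // never consume past the buffer
        buffer_.erase(0, pos);          // keep only the unfinished tail
        return tokens;
    }

    // True when no partial token is waiting for more input.
    bool idle() const { return buffer_.empty(); }

private:
    std::string buffer_;
};
```

The caller feeds data as it arrives (for example from a non-blocking socket) and gets back only what could be recognized so far; the parser never blocks waiting for the rest.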

> 5) Parser Back-End / Library Organization
>
> - Should Boost.Xml be a complete XML solution, with a parser, DOM
> implementation and everything?

Writing a complete XML solution is a lot of work, especially if you want
to support all the XML technologies (XML Schema, RELAX NG, XPath, XLink,
XInclude, XPointer...).
Maybe it would be worthwhile to reuse libxml2, which is under the MIT
license, and build something on top of it. Of course, we first need to
weigh the gains of a new C++ implementation.

> - Or should it be split into two parts, one being a parser, the other a DOM
> implementation with various construction modes?
> - Or should even the core parser be split into the actual text parser and the
> event/pull/whatever interface, so that an HTML or YAML or PYX parser or even
> an algorithmic content generator can be placed behind?
> - What, then, is the interface between that parser and the user interface?
>
> 6) Other Issues ???
>

