From: dietmar_kuehl_at_[hidden]
Date: 2001-10-05 07:53:31


Hi,
--- In boost_at_y..., Matthias Troyer <troyer_at_i...> wrote:
> The discussion about a boost XML parser seems to have stopped
> without any plans for the future (or have I missed something).

My understanding is that someone wanted to post a SAX-like interface
as a proposal and I want to post an iterator interface to XML
parsing. I have definitely outlined an interface (not with concrete
code, however), and I have uploaded a first, very rough draft of my
work: just look at the xml directory in the files section. Note:
although I'm planning to keep basically this interface, both the
interface and, in particular, the implementation are subject to
change. Actually, the implementation is nothing more than a prototype
which can be used to demonstrate how things are supposed to work;
there is a lot of work to be done to get a conforming XML parser
(however, this is what I'm currently working on).

> - a light-weight parser, with a forward-iterator style interface
> that can be used to traverse a document once (e.g. to extract
> information),

I don't think that a forward iterator is really what is necessary or
even desirable: the necessary state is too big, and I don't see any
reason to do a multi-pass scan that needs to recover positions. I
started out with an input iterator interface, but even this is pretty
inconvenient for this task and probably unnecessarily inefficient. My
current interface is more like Java's Enumeration, where you can ask
whether there are more elements and get the next element, with the
added possibility of getting the current element, too. This can
easily be wrapped into an input iterator that uses a [shared] pointer
to the parser object internally.
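The post does not show the parser's actual interface, so the following is only a sketch of the shape described above, using hypothetical names: a `pull_parser` offering `has_more()`, `next()` and `current()`, and an `event_iterator` that turns it into an input iterator by holding a `std::shared_ptr` to the parser. The toy parser merely replays a prebuilt list of events instead of parsing XML.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event type; a real parser would carry the node kind,
// name, attributes, character data and so on.
struct xml_event {
    std::string text;
};

// Hypothetical Enumeration-style parser: has_more()/next(), plus
// current() to re-read the last element. This stand-in replays events.
class pull_parser {
public:
    explicit pull_parser(std::vector<xml_event> events)
        : events_(std::move(events)) {}
    bool has_more() const { return pos_ < events_.size(); }
    const xml_event& next() { assert(has_more()); return events_[pos_++]; }
    const xml_event& current() const { assert(pos_ > 0); return events_[pos_ - 1]; }
private:
    std::vector<xml_event> events_;
    std::size_t pos_ = 0;
};

// Input iterator adaptor: all copies share one parser via shared_ptr,
// so advancing any copy advances them all (single-pass semantics).
class event_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = xml_event;
    using difference_type = std::ptrdiff_t;
    using pointer = const xml_event*;
    using reference = const xml_event&;

    // Holds a copy of the element so that *it++ yields the old value.
    struct postfix_proxy {
        xml_event value;
        const xml_event& operator*() const { return value; }
    };

    event_iterator() = default;  // the end iterator (null parser)
    explicit event_iterator(std::shared_ptr<pull_parser> p)
        : parser_(std::move(p)) { advance(); }

    reference operator*() const { return parser_->current(); }
    pointer operator->() const { return &parser_->current(); }
    event_iterator& operator++() { advance(); return *this; }
    postfix_proxy operator++(int) {
        postfix_proxy old{**this};
        advance();
        return old;
    }

    // Equal if both are the end iterator or share the same live parser.
    friend bool operator==(const event_iterator& a, const event_iterator& b) {
        return a.parser_ == b.parser_;
    }
    friend bool operator!=(const event_iterator& a, const event_iterator& b) {
        return !(a == b);
    }

private:
    void advance() {
        if (parser_ && parser_->has_more()) parser_->next();
        else parser_.reset();  // exhausted: become the end iterator
    }
    std::shared_ptr<pull_parser> parser_;
};
```

Because every copy shares the same parser, the iterator carries only one pointer of state, and advancing one copy advances all of them; that is precisely the single-pass behaviour an input iterator is allowed to have, which is why a forward iterator's multi-pass guarantee is not needed here.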

> Please keep us informed about any projects - we are willing to help
> with development and testing.

Help with testing sounds good :-) There is a test suite for XML
available (<http://www.oasis-open.org/committees/xml-conformance/>),
but it doesn't come with a test driver. This is one thing that would
be useful independent of which concrete XML parser implementation is
actually selected. In addition to this conformance test, I think a
performance test comparing different XML parsers would also be
helpful.

Other than this, the only thing that would help me with my
development is a code conversion facet which determines the encoding
(at least UTF-8 and UTF-16) from the first few bytes and then uses
this encoding. The information about which encoding is used would be
stored in the 'std::mbstate_t' object passed to the various
functions. There are some code conversion things in the files section
(two code conversion facets and a code conversion iterator), but none
of these can decide which encoding is used and just do the right
thing. This seems to be necessary for proper XML parsing.
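The detection step such a facet would need can be sketched on its own. The following is a minimal, hypothetical helper (`detect_encoding` and `xml_encoding` are illustrative names, not part of any existing Boost code) that follows the autodetection table in Appendix F of the XML 1.0 recommendation, restricted to UTF-8 and UTF-16. It does not implement a `std::codecvt` facet or store anything in `std::mbstate_t`, whose contents the standard leaves to the implementation.

```cpp
#include <cassert>
#include <cstddef>

// Encodings this sketch distinguishes; UCS-4 and EBCDIC, which the
// XML 1.0 autodetection table also covers, are deliberately left out.
enum class xml_encoding { utf8, utf16be, utf16le };

// Guesses the encoding of an XML entity from its first bytes.
// bom_length receives the number of byte-order-mark bytes to skip.
xml_encoding detect_encoding(const unsigned char* p, std::size_t n,
                             std::size_t& bom_length)
{
    assert(p != nullptr || n == 0);
    bom_length = 0;

    // Byte order marks. (FF FE 00 00 would really be UCS-4 LE, which
    // this sketch does not support and reads as UTF-16 LE.)
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) { bom_length = 2; return xml_encoding::utf16be; }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) { bom_length = 2; return xml_encoding::utf16le; }
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) { bom_length = 3; return xml_encoding::utf8; }

    // No BOM: recognise the "<?" of an XML declaration encoded as UTF-16.
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x3C && p[2] == 0x00 && p[3] == 0x3F) return xml_encoding::utf16be;
    if (n >= 4 && p[0] == 0x3C && p[1] == 0x00 && p[2] == 0x3F && p[3] == 0x00) return xml_encoding::utf16le;

    // Anything else is treated as UTF-8. For "<?xm" the real encoding
    // would come from the encoding declaration, which is not parsed here.
    return xml_encoding::utf8;
}
```

A facet built around this could run the check on its first conversion call and then dispatch to a UTF-8 or UTF-16 decoder for the rest of the input.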

It doesn't make much sense for me to post the current state of the
parser (since it is currently basically non-functional), but I hope
to have something close to a conforming non-validating parser ready
by the end of the weekend (currently I'm struggling with entity
references; once I get past these, proper well-formedness checks are
next).

--
<mailto:dietmar_kuehl_at_[hidden]> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>
