Subject: Re: [boost] New XML library
From: KSpam (keesling_spam_at_[hidden])
Date: 2008-12-10 18:45:37


Cory,

On Wednesday 10 December 2008 15:13:32 Cory Nelson wrote:
> I have a low level iterator-based parser here:
> http://svn.int64.org/viewvc/int64/xml/
>
> The design I've been taking is something like this:
>
> parser.hpp (xml::parser): the lowest level. Given two UTF-32
> compatible forward iterators, it returns one of (ok, done, need_more,
> error), a node type (element/xmldecl/etc.), and an iterator range.
> This parser performs no allocations, and as such does minimal
> structural checking. It does however have full character validation,
> if you so choose (by a template parameter). Really this does only
> slightly more than a lexer, and is available if you need top
> performance and don't need full XML compliance and validation.
>
> reader.hpp (xml::reader): the next level. A UTF-32 push parser that
> is fully XML 1.0 and 1.1 compliant, capable of validating the
> document, tracking line/column numbers, entity substitution, and other
> normal things you'd expect from a parser.
>
> document.hpp (xml::document): a full in-memory document. A modifiable
> version, and constant version which uses an arena allocator to stay as
> compact as possible.
>
> As of now, only xml::parser is usable; everything but DTD parsing is
> complete. I have been really busy these past few months and haven't
> had a chance to complete it. The main goal I had when beginning this
> is to have something I/O agnostic, that can drop out when it finds an
> incomplete stream and be resumed later. It was really important that
> it work just as fantastically with parsing from memory, blocking I/O,
> or async I/O.
>
> It should also be very performant, which it is: the parser being very
> lightweight, UTF-8 decoding is actually a huge bottleneck in my tests
> which led me to allow the parser (via template parameter) to work
> directly with UTF-8 if you don't require full compliance.

Thanks for the link! I would love to see something like this added as a Boost
library. It is lightweight, and it looks very useful already (FYI, with a
few minor mods, I ran the test.cpp application on Linux).

I like the idea of having policies to tailor the fidelity of the parser. It's
nice to not have to pay for what you don't need.
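
To make the policy idea concrete, here is a minimal sketch of how a validation policy could be selected by template parameter. This is not taken from Cory's code: the names no_validation, xml10_char_validation, status, and scan are hypothetical, and the only thing borrowed from the XML 1.0 specification is the Char production used in the strict policy.

```cpp
#include <string>

// Hypothetical policy: accept every code point, so the check compiles away.
struct no_validation {
    static bool valid(char32_t) { return true; }
};

// Hypothetical policy: enforce the XML 1.0 Char production,
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
struct xml10_char_validation {
    static bool valid(char32_t c) {
        return c == 0x9 || c == 0xA || c == 0xD ||
               (c >= 0x20 && c <= 0xD7FF) ||
               (c >= 0xE000 && c <= 0xFFFD) ||
               (c >= 0x10000 && c <= 0x10FFFF);
    }
};

// Illustrative result type; the real parser reports more states than this.
enum class status { ok, error };

// Walk a range of UTF-32 code points and apply the chosen policy to each one.
// No allocation takes place; the caller picks the fidelity at compile time.
template <typename Validation, typename Iterator>
status scan(Iterator first, Iterator last) {
    for (; first != last; ++first) {
        if (!Validation::valid(static_cast<char32_t>(*first)))
            return status::error;
    }
    return status::ok;
}
```

With something along these lines, scan<no_validation>(first, last) costs nothing beyond the loop itself, while scan<xml10_char_validation>(first, last) rejects input containing a control character such as U+0001.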

It seems that your parser could be put up for review. If it is accepted,
other layers could be added over time. I would really like to see a
data-binding layer complete with schema validation down the road.

Thanks,
Justin

