
From: Bjorn Reese (breese_at_[hidden])
Date: 2019-09-23 15:58:26


On 9/23/19 5:16 PM, Phil Endecott via Boost wrote:

> I am reminded of the various discussions of alternative styles of
> XML parsers that have happened on this list over the years.  People
> have a surprising variety of often-conflicting requirements or
> preferences.  I think it's unlikely that any one solution will suit
> everyone - but maybe there are common bits of functionality that
> can be shared?

As a former developer of one of said XML parsers, I can say that we
learned the proper abstractions the hard way. If you start with a pull
parser (what Vinnie refers to as an online parser, and what you refer to
as an iterating parser), such as the XmlTextReader, then all the other
interfaces flow naturally from it.

Although the pull parser is mainly used as the basic building block
for the other abstractions, it can also be used directly, e.g. for quick
scanning of large JSON documents without memory allocation.
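A minimal C++ sketch of such a pull parser, and of using it directly for an allocation-free scan. Everything here is hypothetical: the names reader, token and sum_integers are made up for illustration, the grammar is a toy JSON subset (nested arrays of non-negative integers, with commas skipped rather than validated), and none of it is the API of XmlTextReader or of the JSON parser mentioned in this post.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Token kinds for a toy JSON subset: nested arrays of non-negative integers.
enum class token { begin_array, end_array, integer, end, error };

// Hypothetical pull reader: the caller asks for one token at a time.
class reader {
public:
    explicit reader(std::string_view in) : in_(in) {}

    // Advance to the next token; returns false at end of input or on error.
    // Whitespace and commas are skipped, not validated, in this sketch.
    bool next() {
        while (pos_ < in_.size() &&
               (std::isspace(static_cast<unsigned char>(in_[pos_])) || in_[pos_] == ','))
            ++pos_;
        if (pos_ == in_.size()) return set(token::end);
        const char c = in_[pos_];
        if (c == '[' || c == ']') {
            ++pos_;
            return set(c == '[' ? token::begin_array : token::end_array);
        }
        if (!std::isdigit(static_cast<unsigned char>(c))) return set(token::error);
        const std::size_t start = pos_;
        while (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_])))
            ++pos_;
        value_ = in_.substr(start, pos_ - start);
        return set(token::integer);
    }

    token current() const { return tok_; }

    // Digits of the current integer, as a view into the input (no copy).
    std::string_view value() const {
        assert(tok_ == token::integer);
        return value_;
    }

private:
    bool set(token t) {
        tok_ = t;
        return t != token::end && t != token::error;
    }

    std::string_view in_;
    std::size_t pos_ = 0;
    token tok_ = token::end;
    std::string_view value_;
};

// Direct use: sum every integer in one pass. No tree is built and nothing
// is allocated. Overflow is not checked in this sketch.
long long sum_integers(std::string_view json) {
    reader r(json);
    long long sum = 0;
    while (r.next()) {
        if (r.current() != token::integer) continue;
        long long n = 0;
        for (char d : r.value()) n = n * 10 + (d - '0');
        sum += n;
    }
    return sum;
}
```

The caller drives the loop, so it can stop early, skip subtrees, or keep nothing but a running total, which is what makes this the natural bottom layer.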

A push parser (SAX) can easily be created by calling the pull parser
in a loop and firing off events.
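A sketch of that loop in C++, built on a toy pull reader (hypothetical names; nested arrays of non-negative integers only). push_parse pulls tokens until the reader stops and calls one handler member per token. The handler interface (begin_array, end_array, integer) is an assumption for illustration, not the actual SAX callback set.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

enum class token { begin_array, end_array, integer, end, error };

// Hypothetical pull reader for the toy subset; commas are skipped, not validated.
class reader {
public:
    explicit reader(std::string_view in) : in_(in) {}
    bool next() {  // false at end of input or on error
        while (pos_ < in_.size() &&
               (std::isspace(static_cast<unsigned char>(in_[pos_])) || in_[pos_] == ','))
            ++pos_;
        if (pos_ == in_.size()) return set(token::end);
        const char c = in_[pos_];
        if (c == '[' || c == ']') {
            ++pos_;
            return set(c == '[' ? token::begin_array : token::end_array);
        }
        if (!std::isdigit(static_cast<unsigned char>(c))) return set(token::error);
        const std::size_t start = pos_;
        while (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_])))
            ++pos_;
        value_ = in_.substr(start, pos_ - start);
        return set(token::integer);
    }
    token current() const { return tok_; }
    std::string_view value() const {  // digits of the current integer
        assert(tok_ == token::integer);
        return value_;
    }
private:
    bool set(token t) { tok_ = t; return t != token::end && t != token::error; }
    std::string_view in_;
    std::size_t pos_ = 0;
    token tok_ = token::end;
    std::string_view value_;
};

// Push parser: pull in a loop and fire one callback per token.
// Returns false if the reader stopped on an error rather than at the end.
template <typename Handler>
bool push_parse(std::string_view json, Handler& handler) {
    reader r(json);
    while (r.next()) {
        switch (r.current()) {
        case token::begin_array: handler.begin_array(); break;
        case token::end_array:   handler.end_array();   break;
        case token::integer:     handler.integer(r.value()); break;
        default: break;
        }
    }
    return r.current() == token::end;
}

// Example handler that records the events it receives.
struct trace_handler {
    std::string events;
    void begin_array() { events += '['; }
    void end_array() { events += ']'; }
    void integer(std::string_view digits) {
        events.append(digits.data(), digits.size());
        events += ';';
    }
};
```

Any type with those three members can serve as the handler; with trace_handler, the input "[1,[2]]" produces the event string "[1;[2;]]".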

Serialization is done by using a pull parser incrementally inside a
serialization input archive; likewise, a similar interface for
generating the layout (e.g. XmlTextWriter) can be used for output
archives.
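A rough sketch of both sides, again over a hypothetical toy pull reader for nested arrays of non-negative integers. input_archive pulls exactly the tokens each load step needs, so the document is consumed incrementally, and output_archive is a minimal writer that emits the same layout. The archive interfaces and the point type are illustrative assumptions, not the API of Boost.Serialization or of XmlTextWriter.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

enum class token { begin_array, end_array, integer, end, error };

// Hypothetical pull reader for the toy subset; commas are skipped, not validated.
class reader {
public:
    explicit reader(std::string_view in) : in_(in) {}
    bool next() {  // false at end of input or on error
        while (pos_ < in_.size() &&
               (std::isspace(static_cast<unsigned char>(in_[pos_])) || in_[pos_] == ','))
            ++pos_;
        if (pos_ == in_.size()) return set(token::end);
        const char c = in_[pos_];
        if (c == '[' || c == ']') {
            ++pos_;
            return set(c == '[' ? token::begin_array : token::end_array);
        }
        if (!std::isdigit(static_cast<unsigned char>(c))) return set(token::error);
        const std::size_t start = pos_;
        while (pos_ < in_.size() && std::isdigit(static_cast<unsigned char>(in_[pos_])))
            ++pos_;
        value_ = in_.substr(start, pos_ - start);
        return set(token::integer);
    }
    token current() const { return tok_; }
    std::string_view value() const { assert(tok_ == token::integer); return value_; }
private:
    bool set(token t) { tok_ = t; return t != token::end && t != token::error; }
    std::string_view in_;
    std::size_t pos_ = 0;
    token tok_ = token::end;
    std::string_view value_;
};

// Input archive: each call pulls one token and checks it is the expected kind.
class input_archive {
public:
    explicit input_archive(std::string_view json) : r_(json) {}
    bool begin_array() { return r_.next() && r_.current() == token::begin_array; }
    bool end_array() { return r_.next() && r_.current() == token::end_array; }
    bool load(int& n) {
        if (!r_.next() || r_.current() != token::integer) return false;
        n = 0;
        for (char d : r_.value()) n = n * 10 + (d - '0');
        return true;
    }
private:
    reader r_;
};

// Output archive: a minimal writer; only non-negative values round-trip.
class output_archive {
public:
    void begin_array() { separate(); out_ += '['; first_ = true; }
    void end_array() { out_ += ']'; first_ = false; }
    void save(int n) { separate(); out_ += std::to_string(n); }
    const std::string& str() const { return out_; }
private:
    void separate() { if (!first_) out_ += ','; first_ = false; }
    std::string out_;
    bool first_ = true;
};

// A user type, (de)serialized as the two-element array [x,y].
struct point { int x = 0; int y = 0; };

bool load(input_archive& ar, point& p) {
    return ar.begin_array() && ar.load(p.x) && ar.load(p.y) && ar.end_array();
}

void save(output_archive& ar, const point& p) {
    ar.begin_array(); ar.save(p.x); ar.save(p.y); ar.end_array();
}
```

Saving point{3, 14} yields "[3,14]", and loading that text back through input_archive restores the same values.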

A tree parser (DOM) is simply a push parser that generates nodes as
events are fired off.
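A sketch of that idea: a handler for a SAX-style push parser that keeps a stack of open arrays and attaches each new node to the innermost one. The event names (begin_array, end_array, integer) and the node type are hypothetical and match a toy grammar of nested arrays of non-negative integers. The builder is shown on its own; a push parser would drive it by making these same calls.

```cpp
#include <cassert>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical DOM node: either an integer leaf or an array of children.
struct node {
    bool is_array = false;
    long long number = 0;
    std::vector<node> children;
};

// Push-parser handler that turns events into a tree.
class dom_builder {
public:
    void begin_array() {
        node n;
        n.is_array = true;
        open_.push_back(std::move(n));  // becomes the innermost open array
    }
    void end_array() {
        assert(!open_.empty());
        node done = std::move(open_.back());
        open_.pop_back();
        add(std::move(done));  // attach the finished array to its parent
    }
    void integer(std::string_view digits) {
        node n;
        for (char d : digits) n.number = n.number * 10 + (d - '0');
        add(std::move(n));
    }
    // The finished tree; meaningful once every array has been closed.
    const node& root() const { return root_; }

private:
    void add(node n) {
        if (open_.empty()) root_ = std::move(n);
        else open_.back().children.push_back(std::move(n));
    }
    std::vector<node> open_;  // stack of arrays still being filled
    node root_;
};
```

Firing the events for "[1,[23]]" by hand (begin, 1, begin, 23, end, end) leaves a root array with two children: the leaf 1 and an array holding 23.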

Those are the design principles behind this JSON parser:

   http://breese.github.io/trial/protocol/

> My preference has always been for parsing by memory-mapping the entire
> file, or equivalently reading the entire document into memory as a blob
> of text, and then providing iterators that advance through the text
> looking for the next element, attribute, character etc.  I think one of
> the first XML parsers to work this way was RapidXML.  Their aim was to

The Microsoft XML parser came first.

