Subject: Re: [boost] Potential Boost SAX library
From: Oliver Adams (owacoder_at_[hidden])
Date: 2018-01-13 13:00:24


> There are two kinds of incremental parsers: push parsers (SAX) and pull
> parsers (approximately StAX). Briefly put, push parsers traverse the
> input automatically and generate events for each token they find,
> whereas pull parsers traverse the input manually, like an iterator,
> and the current token can be queried.

My library is a kind of push-pull hybrid. You can ask the parser to parse a
single event (the smallest unit the input format can be parsed into), and
the parser then pushes the result to the output handler as one or more
writes. The trouble is that where the parser stops after each request is
format-dependent, so for now the pull side is limited to "event-loop" style
parsing.
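
To make the push-pull idea above concrete, here is a minimal sketch. It is
not cppdatalib's actual API: StepParser, OutputHandler, parse_one and run
are hypothetical names, and the "key=value;key=value" input format is
invented for illustration. One event is one pair, which gets pushed to the
handler as two writes.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical output handler: collects the writes pushed by the parser.
struct OutputHandler {
    std::vector<std::string> writes;
    void write(const std::string& s) { writes.push_back(s); }
};

// Toy parser for "key=value;key=value" (an invented format).
// One event is one pair, pushed to the handler as two writes.
class StepParser {
public:
    explicit StepParser(std::string input) : input_(std::move(input)) {}

    // Parse exactly one event and push it to the handler.
    // Returns false once the input is exhausted.
    bool parse_one(OutputHandler& out) {
        if (pos_ >= input_.size()) return false;
        std::size_t end = input_.find(';', pos_);
        if (end == std::string::npos) end = input_.size();
        const std::string pair = input_.substr(pos_, end - pos_);
        pos_ = (end < input_.size()) ? end + 1 : end;  // skip the ';'
        const std::size_t eq = pair.find('=');
        if (eq == std::string::npos) {
            out.write(pair);             // no '=': pushed as a single write
            return true;
        }
        out.write(pair.substr(0, eq));   // key
        out.write(pair.substr(eq + 1));  // value
        return true;
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
};

// "Event-loop" style driver: the caller decides when each event is parsed,
// but cannot look at individual tokens in between.
std::size_t run(StepParser& parser, OutputHandler& out) {
    std::size_t events = 0;
    while (parser.parse_one(out)) ++events;
    return events;
}
```

The caller controls the pace of parsing, but how much input each
parse_one() call consumes is decided by the format, which is the
limitation described above.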

> Pull parsers have some significant advantages over push parsers:

> * It is straight-forward to implement a push parser on top of a pull
> parser. This involves a loop and a switch statement (see [1] for a
> complete example.) Going in the other direction involves the use of
> coroutines; most likely stateful coroutines.

Most of these features are not currently available in cppdatalib because
individual tokens are not exposed the way a pull parser would expose them.
If I refactored a few things, I might be able to get a full pull parser
framework.
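
For reference, the "loop and a switch statement" construction mentioned in
the quoted point can be sketched roughly as follows. This is only an
illustration under assumptions: the Token kinds, PullParser, Handler,
push_parse and Recorder are all hypothetical, the input is a toy format of
whitespace-separated integers and words, and none of it is taken from
cppdatalib or trial.protocol.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for a toy whitespace-separated format.
enum class Token { Integer, Word, End };

// Toy pull parser: the caller advances it one token at a time.
class PullParser {
public:
    explicit PullParser(std::string input) : input_(std::move(input)) {}

    // Advance to the next token and return its kind.
    Token next() {
        while (pos_ < input_.size() && is_space(input_[pos_])) ++pos_;
        if (pos_ == input_.size()) {
            text_.clear();
            return Token::End;
        }
        const std::size_t start = pos_;
        bool all_digits = true;
        while (pos_ < input_.size() && !is_space(input_[pos_])) {
            if (!std::isdigit(static_cast<unsigned char>(input_[pos_])))
                all_digits = false;
            ++pos_;
        }
        text_ = input_.substr(start, pos_ - start);
        assert(!text_.empty());  // a token always has at least one character
        return all_digits ? Token::Integer : Token::Word;
    }

    // Text of the token most recently returned by next().
    const std::string& text() const { return text_; }

private:
    static bool is_space(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    std::string input_;
    std::string text_;
    std::size_t pos_ = 0;
};

// Hypothetical SAX-style handler interface.
struct Handler {
    virtual ~Handler() = default;
    virtual void on_integer(long value) = 0;
    virtual void on_word(const std::string& word) = 0;
};

// Push parser built on top of the pull parser: one loop, one switch.
void push_parse(const std::string& input, Handler& handler) {
    PullParser parser(input);
    for (;;) {
        switch (parser.next()) {
        case Token::Integer: handler.on_integer(std::stol(parser.text())); break;
        case Token::Word:    handler.on_word(parser.text()); break;
        case Token::End:     return;
        }
    }
}

// Example handler that records each event as a string.
struct Recorder : Handler {
    std::vector<std::string> events;
    void on_integer(long v) override { events.push_back("int:" + std::to_string(v)); }
    void on_word(const std::string& w) override { events.push_back("word:" + w); }
};
```

Going the other way (a pull interface on top of a push parser) has no such
simple form, because the push parser owns the loop.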

> * Contextual parsing can be done directly, unlike push parsers where
> you have to maintain contextual state in the event handler.

Right now, contextual parsing is implemented in a base class of the output
handler, so it's still isolated from the end user. It's kind of hackish,
though, since the parser queries the output handler for the structure of
the data it has already read.
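
As a rough illustration of keeping contextual state in a handler base
class, here is a minimal sketch. The names (Container,
ContextTrackingHandler, DepthRecorder) are hypothetical and are not
cppdatalib's classes. The base class maintains a nesting stack so that
derived handlers, or the parser, can query where they are in the data.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container kinds that events can open and close.
enum class Container { Array, Object };

// Hypothetical base class: tracks nesting so derived handlers need not.
class ContextTrackingHandler {
public:
    virtual ~ContextTrackingHandler() = default;

    // Entry points called by the parser; they update the context first.
    void begin(Container c) { stack_.push_back(c); on_begin(c); }
    void end() {
        if (stack_.empty()) return;  // ignore an unbalanced end in this toy
        const Container c = stack_.back();
        stack_.pop_back();
        on_end(c);
    }
    void value(const std::string& v) { on_value(v); }

    // Context queries, usable by derived handlers or by the parser.
    std::size_t depth() const { return stack_.size(); }
    bool in_array() const {
        return !stack_.empty() && stack_.back() == Container::Array;
    }

protected:
    // Hooks for derived handlers; the defaults do nothing.
    virtual void on_begin(Container) {}
    virtual void on_end(Container) {}
    virtual void on_value(const std::string&) {}

private:
    std::vector<Container> stack_;
};

// Example derived handler: records the nesting depth of every value.
struct DepthRecorder : ContextTrackingHandler {
    std::vector<std::size_t> depths;

protected:
    void on_value(const std::string&) override { depths.push_back(depth()); }
};
```

With a pull parser, the same context could instead live in ordinary local
variables or in the call stack of the code driving the parser.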

> * Push parsers can be used directly in Boost.Serialization archives.

> * Pull parsers are composable. For instance, you could insert a URL
> pull parser directly into an HTTP pull parser.

Composability is a big issue with push parsers, so removing obstacles to
that would greatly simplify some things. For certain types of information,
though, it doesn't seem like composition is important.

On Jan 13, 2018 5:05 AM, "Bjorn Reese via Boost" <boost_at_[hidden]>
wrote:

On 01/09/18 18:36, Oliver Adams via Boost wrote:

> I was wondering if a library I'm developing would be of value to the Boost
> community. It is basically an event-driven parsing/serialization library
> for common formats using a standard internal representation or simple
> pass-through conversions. Would anyone be interested in something like this
> being added to Boost?

There are two kinds of incremental parsers: push parsers (SAX) and pull
parsers (approximately StAX). Briefly put, push parsers traverse the
input automatically and generate events for each token they find,
whereas pull parsers traverse the input manually, like an iterator,
and the current token can be queried.

Pull parsers have some significant advantages over push parsers:

  * It is straight-forward to implement a push parser on top of a pull
    parser. This involves a loop and a switch statement (see [1] for a
    complete example.) Going in the other direction involves the use of
    coroutines; most likely stateful coroutines.

  * Contextual parsing can be done directly, unlike push parsers where
    you have to maintain contextual state in the event handler.

  * Push parsers can be used directly in Boost.Serialization archives.

  * Pull parsers are composable. For instance, you could insert a URL
    pull parser directly into an HTTP pull parser.

For a pull parser framework see:

  https://github.com/breese/trial.protocol

The documentation is a bit old though.

[1] http://breese.github.io/trial/protocol/trial_protocol/json/tutorial/push_parser.html