Subject: Re: [boost] JSON Parser GSoC 2013
From: Bjorn Reese (breese_at_[hidden])
Date: 2013-04-11 06:44:44


On 04/10/2013 11:46 PM, Stephan Bourgeois wrote:

> Open Source JSON parsers have already been implemented in C++ and in Java.
> Examples of Java libraries are: Gson, quick-json. Even if other libraries
> already exist, developers who are using Boost for their project will
> appreciate having a JSON parser within Boost.

Agreed.

> The question is what data structure we should use to represent JSON
> objects, and how the user can access key/value pairs in those objects.
> (examples: Boost.PropertyTree, pre-existing C++ object, ...)

I would like to have several different interfaces:

1. Tokenizer API which reads the next token from an input string. This
    is important for streaming data.

2. Iterator API which iterates to the next token in the input string.
    It remembers its parent scopes (unlike the tokenizer). This is similar
    to the XmlTextReader API.

3. Tree API which parses the entire input string into a tree structure.
    This is a bit like the DOM API, and this is what the Spirit example
    and Boost.PropertyTree provide.

4. Serialization API which provides a Boost.Serialization input archive
    without going through an intermediate tree representation.

For each of the above there should be corresponding generation
interfaces.
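
To illustrate the difference between interfaces 1 and 2, here is a
minimal sketch. It is not the protoc code: every name in it (token,
tokenizer, scoped_reader, next, text, depth) is hypothetical, and the
scanning is deliberately non-validating (numbers and string contents
are taken as-is). The tokenizer hands out one token per call and keeps
no nesting state; the reader wraps it and remembers the enclosing
scopes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical names throughout; this is not the protoc interface.
enum class token {
    begin_array, end_array, begin_object, end_object,
    colon, comma, string, number, literal, end, error
};

// Interface 1: returns one token per call and keeps no nesting state.
class tokenizer {
public:
    explicit tokenizer(const std::string& input) : in_(input), pos_(0) {}

    token next() {
        while (pos_ < in_.size() && is_space(in_[pos_])) ++pos_;
        if (pos_ == in_.size()) return token::end;
        const char c = in_[pos_];
        switch (c) {
        case '[': ++pos_; return token::begin_array;
        case ']': ++pos_; return token::end_array;
        case '{': ++pos_; return token::begin_object;
        case '}': ++pos_; return token::end_object;
        case ':': ++pos_; return token::colon;
        case ',': ++pos_; return token::comma;
        case '"': return scan_string();
        default: break;
        }
        if (c == '-' || std::isdigit(static_cast<unsigned char>(c))) return scan_number();
        if (std::isalpha(static_cast<unsigned char>(c))) return scan_literal();
        return token::error;
    }

    // Raw text of the most recent string, number, or literal token.
    const std::string& text() const { return text_; }

private:
    static bool is_space(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

    token scan_string() {
        std::size_t i = pos_ + 1;                    // skip the opening quote
        while (i < in_.size() && in_[i] != '"')
            i += (in_[i] == '\\') ? 2 : 1;           // step over escaped characters
        if (i >= in_.size()) return token::error;    // unterminated string
        text_ = in_.substr(pos_ + 1, i - pos_ - 1);  // contents, escapes left undecoded
        pos_ = i + 1;
        return token::string;
    }

    token scan_number() {                            // lenient: does not check the grammar
        const std::size_t start = pos_;
        while (pos_ < in_.size() &&
               std::string("+-.eE0123456789").find(in_[pos_]) != std::string::npos) ++pos_;
        text_ = in_.substr(start, pos_ - start);
        return token::number;
    }

    token scan_literal() {
        const std::size_t start = pos_;
        while (pos_ < in_.size() && std::isalpha(static_cast<unsigned char>(in_[pos_]))) ++pos_;
        text_ = in_.substr(start, pos_ - start);
        return (text_ == "true" || text_ == "false" || text_ == "null")
            ? token::literal : token::error;
    }

    std::string in_;
    std::size_t pos_;
    std::string text_;
};

// Interface 2: same pull model, but remembers the enclosing scopes.
class scoped_reader {
public:
    explicit scoped_reader(const std::string& input) : tok_(input) {}

    token next() {
        const token t = tok_.next();
        if (t == token::begin_array || t == token::begin_object) {
            scopes_.push_back(t);
        } else if (t == token::end_array || t == token::end_object) {
            const token open = (t == token::end_array) ? token::begin_array
                                                       : token::begin_object;
            if (scopes_.empty() || scopes_.back() != open) return token::error;
            scopes_.pop_back();
        }
        return t;
    }

    std::size_t depth() const { return scopes_.size(); }
    const std::string& text() const { return tok_.text(); }

private:
    tokenizer tok_;
    std::vector<token> scopes_;
};
```

Because the reader tracks scopes, it can reject a mismatched close such
as "[}", which the plain tokenizer would pass through as two
well-formed tokens.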

I have already created the tokenizer and serialization APIs for JSON
(and several other encoding formats) at:

   http://protoc.sourceforge.net/

I have not had the opportunity to look into the iterator and tree APIs
yet, so this may be a good candidate for a GSoC project. As there is no
mentor for the JSON parsing library, I am willing to mentor it if it is
based on the protoc code. However, I am only a Boost hang-around, so I
do not know the proper procedures for this.

Unfortunately, the code is currently undocumented, so the best place to
start is the code itself:

   http://sourceforge.net/p/protoc/code/ci/master/tree/include/protoc/json/
   http://sourceforge.net/p/protoc/code/ci/master/tree/src/json/

decoder.hpp contains the tokenizer API.
encoder.hpp contains the token generator API.
iarchive.hpp contains the serialization input archive.
oarchive.hpp contains the serialization output archive.

> Ideally we should offer validating and non-validating implementations. We
> should also offer JSON generation as well as parsing.

Agreed. It is mainly the string validation that is going to be a (minor)
challenge.
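
As a rough idea of what string validation involves, here is a sketch of
a check on the text between the quotes of a JSON string. The function
name is hypothetical and this is not taken from protoc. It assumes the
rules of the JSON grammar for strings: no raw control characters, no
unescaped quote, and only the escapes \" \\ \/ \b \f \n \r \t and \u
followed by four hex digits. It does not check UTF-8 well-formedness or
surrogate pairing, which a validating parser may also want to do.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: checks the text between the quotes of a JSON string.
bool is_valid_json_string_body(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x20 || c == '"') return false;  // raw control character or unescaped quote
        if (c != '\\') continue;                 // ordinary character
        if (++i == s.size()) return false;       // dangling backslash
        switch (s[i]) {
        case '"': case '\\': case '/':
        case 'b': case 'f': case 'n': case 'r': case 't':
            break;                               // simple two-character escape
        case 'u':
            if (s.size() - i < 5) return false;  // need four hex digits after 'u'
            for (int k = 1; k <= 4; ++k)
                if (!std::isxdigit(static_cast<unsigned char>(s[i + k]))) return false;
            i += 4;
            break;
        default:
            return false;                        // unknown escape
        }
    }
    return true;
}
```

A non-validating implementation could skip this check entirely and only
look for the closing quote.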

