Subject: Re: [boost] Interest in a Boost.JSON library?
From: Kirit Sælensminde (kirit.saelensminde_at_[hidden])
Date: 2008-12-21 22:21:29


OvermindDL1 wrote:
> On Sun, Dec 21, 2008 at 3:49 AM, Kirit Sælensminde
> <kirit.saelensminde_at_[hidden]> wrote:
> Interesting timing. If you have been watching the spirit lists, a few
> days ago I created a JSON parser in Spirit2x to both test for the vast
> speed enhancements of Spirit2x and to donate as an example. Mine is
> templated on the string and allows you to override the character class
> parsing with any Spirit supported type (which does not include Unicode
> yet, but will include it later, we have been talking about it, but
> does currently support wide_chars).

I'm not even sure that I knew there was a Spirit list. Do you have a
link to yours?

One complication with JSON is that although the transmission can be
UTF-8, UTF-16 or UTF-32, the Unicode escaping in the strings is always
UTF-16, which is why I build into a UTF-16 buffer and then convert it
from there. My string class uses UTF-16 on Windows and UTF-8 on Linux.
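
To illustrate the UTF-16 point, here is a minimal sketch of that two-step approach: collect \uXXXX escapes as UTF-16 code units, then convert the buffer to UTF-8, combining surrogate pairs on the way. The names parse_hex4 and utf16_to_utf8 are hypothetical and are not taken from either library discussed here; the sketch assumes a C++11 compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: read exactly four hex digits starting at pos
// and return them as one UTF-16 code unit.
inline char16_t parse_hex4(const std::string& s, std::size_t pos) {
    if (pos + 4 > s.size()) throw std::runtime_error("truncated \\u escape");
    unsigned value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const char c = s[pos + i];
        value <<= 4;
        if (c >= '0' && c <= '9') value |= static_cast<unsigned>(c - '0');
        else if (c >= 'a' && c <= 'f') value |= static_cast<unsigned>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') value |= static_cast<unsigned>(c - 'A' + 10);
        else throw std::runtime_error("bad hex digit in \\u escape");
    }
    return static_cast<char16_t>(value);
}

// Hypothetical helper: convert a UTF-16 buffer to UTF-8.
// A high surrogate must be followed by a low surrogate; the pair is
// combined into one code point above U+FFFF.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;  // the low surrogate has been consumed as well
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the escaped pair \uD834\uDD1E becomes the two code units 0xD834 and 0xDD1E, which the conversion combines into the single code point U+1D11E.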

So far, nearly all of the JSON that I've seen in the wild uses ASCII,
with everything above 0x7f escaped using the \uXXXX notation. This is
what my unparser does too.
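
A rough sketch of ASCII-only output along those lines follows. It is not the actual unparser; append_u_escape and append_escaped are hypothetical names, and the code assumes it is handed valid Unicode code points (no range check above U+10FFFF). Code points above U+FFFF are written as a UTF-16 surrogate pair, since that is the only form the \uXXXX notation allows.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: append one \uXXXX escape for a single UTF-16 code unit.
inline void append_u_escape(std::string& out, unsigned code_unit) {
    char buf[7];  // backslash, 'u', four hex digits, terminating NUL
    std::snprintf(buf, sizeof buf, "\\u%04x", code_unit & 0xFFFFu);
    out += buf;
}

// Hypothetical helper: append one code point as ASCII-only JSON string content.
// Assumes cp is a valid Unicode code point.
inline void append_escaped(std::string& out, std::uint32_t cp) {
    if (cp == '"' || cp == '\\') {
        out += '\\';                       // characters that must be escaped
        out += static_cast<char>(cp);
    } else if (cp >= 0x20 && cp <= 0x7F) {
        out += static_cast<char>(cp);      // remaining ASCII goes out as-is
    } else if (cp < 0x10000) {
        append_u_escape(out, cp);          // control characters and the BMP
    } else {
        const std::uint32_t v = cp - 0x10000;
        append_u_escape(out, 0xD800 + (v >> 10));    // high surrogate
        append_u_escape(out, 0xDC00 + (v & 0x3FF));  // low surrogate
    }
}
```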

> One question, you have int64_t as a supported type, but from my
> research the Number type in the current JSON spec is a 52/12-bit
> floating-point type, double in other words.

I don't think that the JSON specification determines what the
representation should be for any of the stored values. For example,
there is no limit on the number of digits that make up the integer part
of a number. It is true, though, that JavaScript has only a floating
point type, with all integer operations being emulated.

Here is the number definition from RFC4627:

          number = [ minus ] int [ frac ] [ exp ]
          decimal-point = %x2E ; .
          digit1-9 = %x31-39 ; 1-9
          e = %x65 / %x45 ; e E
          exp = e [ minus / plus ] 1*DIGIT
          frac = decimal-point 1*DIGIT
          int = zero / ( digit1-9 *DIGIT )
          minus = %x2D ; -
          plus = %x2B ; +
          zero = %x30 ; 0
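
As a sketch of what that grammar accepts, the hand-written recogniser below follows the rules above directly. The function name is_json_number is made up for illustration and the code assumes C++11. Note that it only checks syntax: nothing in the grammar bounds the number of digits, so the choice between an integer type and a double is left to whoever consumes the text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical recogniser for: number = [ minus ] int [ frac ] [ exp ]
inline bool is_json_number(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();

    // Consume a run of digits; report whether at least one was seen (1*DIGIT).
    auto digits = [&]() {
        const std::size_t start = i;
        while (i < n && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
        return i > start;
    };

    if (i < n && s[i] == '-') ++i;                    // [ minus ]

    if (i < n && s[i] == '0') ++i;                    // int = zero
    else if (i < n && s[i] >= '1' && s[i] <= '9')     //     / digit1-9 *DIGIT
        digits();
    else
        return false;

    if (i < n && s[i] == '.') {                       // [ frac ]
        ++i;
        if (!digits()) return false;
    }

    if (i < n && (s[i] == 'e' || s[i] == 'E')) {      // [ exp ]
        ++i;
        if (i < n && (s[i] == '+' || s[i] == '-')) ++i;
        if (!digits()) return false;
    }

    return i == n;                                    // no trailing characters
}
```

Under this reading, "0", "-0.5e+10" and a thirty-digit integer are all valid, while "01", "1.", ".5" and "+1" are not.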

> Mine is basic, a single
> header file, and it stuffs it all into a Value type, which is a
> Boost.Variant of a null_type, false_type, true_type (empty structs I
> made, those are specified as types in the JSON standard, not bools
> for the true/false), double, StringType (whatever the templated String
> type is), Object (a boost::unordered_map since the JSON standard
> states that it is an unordered map), and Array (which is just a
> std::vector). All the types are not used in any special way and just
> changing their declaration should keep compatibility with the rest of
> the code. My code just returns the Value directly, no fancy wrapper
> for pulling things out, but I left that open, would just require a one
> line change of code to wrap it, but figured I might just do open
> functions instead, allowing for a class style wrapper later, that way
> it could be exported in a C style as well.

It seems to me that the parser is always going to be quite closely
coupled with its output type, although some sort of skeleton parser
could be envisaged that would be able to talk to an API for building the
internal JSON representation.

This is one of the reasons I posit mine as a JSON library with the
parser as just one part of it.

I suppose it ought to be possible though to decompose the parser enough
that the components could be used to write to a number of different
internal representations.
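
One possible shape for such a decomposition, purely as a sketch: the parser is templated on a handler and reports events to it, so the handler decides what representation (if any) gets built. Everything here is hypothetical (skeleton_parser, counting_handler and the callback names), and to stay short it only understands null, true, false and arrays of those. It assumes C++11.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical event-driven parser for a tiny JSON subset.
// Handler must provide on_null(), on_bool(bool), begin_array(), end_array().
template <typename Handler>
class skeleton_parser {
public:
    skeleton_parser(const std::string& text, Handler& handler)
        : text_(text), pos_(0), handler_(handler) {}

    void parse() {
        value();
        skip_ws();
        if (pos_ != text_.size()) throw std::runtime_error("trailing characters");
    }

private:
    void skip_ws() {
        while (pos_ < text_.size() && std::strchr(" \t\r\n", text_[pos_])) ++pos_;
    }

    // Advance past literal if the input continues with it.
    bool consume(const char* literal) {
        const std::size_t len = std::strlen(literal);
        if (text_.compare(pos_, len, literal) != 0) return false;
        pos_ += len;
        return true;
    }

    void value() {
        skip_ws();
        if (consume("null")) handler_.on_null();
        else if (consume("true")) handler_.on_bool(true);
        else if (consume("false")) handler_.on_bool(false);
        else if (consume("[")) array();
        else throw std::runtime_error("unsupported or malformed value");
    }

    void array() {
        handler_.begin_array();
        skip_ws();
        if (!consume("]")) {
            do { value(); skip_ws(); } while (consume(","));
            if (!consume("]")) throw std::runtime_error("expected ']'");
        }
        handler_.end_array();
    }

    std::string text_;   // copy, so temporaries can be passed safely
    std::size_t pos_;
    Handler& handler_;
};

// Example handler that builds no tree at all and only gathers statistics.
struct counting_handler {
    int nulls = 0, bools = 0, depth = 0, max_depth = 0;
    void on_null() { ++nulls; }
    void on_bool(bool) { ++bools; }
    void begin_array() { if (++depth > max_depth) max_depth = depth; }
    void end_array() { --depth; }
};
```

A handler that builds a variant-based tree, or one that writes straight into some other internal representation, would plug into the same parser without changing it.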

> I did not make mine to be a 'real' library though, as stated, just an
> example code, but I did try to make it as accurate to the spec as I
> saw.
>
> As soon as karma is finished for Spirit2x I was planning to make a
> writer for my Value object as well, both as a condensed (efficient)
> printer and a pretty printer.

I have a pretty printer, which I'm not unhappy with, but I also think it
would be better in some ways to separate the pretty printing strategy
from the structure walking.
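
A sketch of what that separation might look like: one function walks the structure, and a layout policy decides what goes between the pieces. The node type is a hypothetical stand-in for a real JSON value (atoms are assumed to be already serialised, and only arrays nest), and the names compact_layout, indented_layout and write are invented for this example. It assumes C++11 and a standard library that accepts std::vector of an incomplete type, which the major implementations do.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a JSON value: an already-serialised atom or an array.
struct node {
    std::string atom;         // used when is_array is false
    std::vector<node> items;  // used when is_array is true
    bool is_array;
};

inline node make_atom(const std::string& text) {
    node n; n.atom = text; n.is_array = false; return n;
}
inline node make_array(const std::vector<node>& items) {
    node n; n.items = items; n.is_array = true; return n;
}

// Layout policy: no whitespace at all.
struct compact_layout {
    void begin(std::string& out) const { out += '['; }
    void item_prefix(std::string& out, int, bool first) const { if (!first) out += ','; }
    void end(std::string& out, int, bool) const { out += ']'; }
};

// Layout policy: one item per line, two spaces per nesting level.
struct indented_layout {
    static void newline(std::string& out, int depth) {
        out += '\n';
        out.append(static_cast<std::size_t>(depth) * 2, ' ');
    }
    void begin(std::string& out) const { out += '['; }
    void item_prefix(std::string& out, int depth, bool first) const {
        if (!first) out += ',';
        newline(out, depth + 1);
    }
    void end(std::string& out, int depth, bool empty) const {
        if (!empty) newline(out, depth);
        out += ']';
    }
};

// Structure walker: knows the shape of the data, nothing about formatting.
template <typename Layout>
void write(const node& n, const Layout& layout, std::string& out, int depth = 0) {
    if (!n.is_array) { out += n.atom; return; }
    layout.begin(out);
    for (std::size_t i = 0; i < n.items.size(); ++i) {
        layout.item_prefix(out, depth, i == 0);
        write(n.items[i], layout, out, depth + 1);
    }
    layout.end(out, depth, n.items.empty());
}
```

The same tree can then be written through either policy without touching the walker.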

> As for comments, in my version it would be simple to change, the
> whitespace skipping parser could easily be extended to catch other
> things, such as comments, which would always be saved out. As stated,
> was just making it as an example of the magic of Spirit2x.

I'm not sure that I can see where the comments would be stored in my
structure in a way that made any sense. To have the parser skip them as
whitespace is certainly doable. For that I guess the JavaScript grammar
is the place to look for a specification.
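
For what it's worth, here is a minimal sketch of treating comments as whitespace, assuming the two JavaScript comment forms (// to end of line and /* ... */). The function name skip_ws_and_comments is hypothetical, and the sketch simplifies the JavaScript rules by treating only '\n' as the end of a line comment.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical skipper: return the index of the first character at or after
// pos that is neither JSON whitespace nor part of a comment.
inline std::size_t skip_ws_and_comments(const std::string& s, std::size_t pos) {
    for (;;) {
        // Plain JSON whitespace: space, tab, line feed, carriage return.
        while (pos < s.size() &&
               (s[pos] == ' ' || s[pos] == '\t' || s[pos] == '\n' || s[pos] == '\r'))
            ++pos;

        if (pos + 1 < s.size() && s[pos] == '/' && s[pos + 1] == '/') {
            // Line comment: runs to the next '\n' or to the end of input.
            pos = s.find('\n', pos);
            if (pos == std::string::npos) return s.size();
        } else if (pos + 1 < s.size() && s[pos] == '/' && s[pos + 1] == '*') {
            // Block comment: must be closed by "*/".
            const std::size_t end = s.find("*/", pos + 2);
            if (end == std::string::npos)
                throw std::runtime_error("unterminated block comment");
            pos = end + 2;
        } else {
            return pos;  // neither whitespace nor a comment starts here
        }
    }
}
```

The comments are simply discarded here, which sidesteps the question of where they would live in the parsed structure.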

K

