Boost logo

Boost :

From: Peter Dimov (pdimov_at_[hidden])
Date: 2020-09-19 14:11:45


This is my review of Boost.JSON by Vinnie Falco and Krystian Stasiowski.

First, a disclaimer: I have contributed to the library, mostly with design
advice, occasionally to the implementation.

Second, let's start with the end. I think that Boost.JSON should be
ACCEPTED. All of its aspects are of Boost quality, its maintainers are
qualified and responsive, the lack of a JSON library in Boost is frankly
embarrassing, and the json::value type serves as a standard variant for
scripting and transferring language-independent scalar and structured
values, which opens many possibilities for further integration with future
parts of the Boost ecosystem (I'll show an example of that later on.)

To evaluate the library (I was already broadly, but not intimately, familiar
with the structure and the interface), I started with the documentation,
specifically with the Quick Look and Usage sections.

It's very well written, and meets and exceeds the usual Boost level. I only
have a few remarks to make.

In Using Numbers, the documentation should explain more clearly and
prominently that v.as_double() does not succeed when v contains an integer,
but throws an exception. This is a common user expectation.

Instead of focusing on number_cast, It should advertise value_to<double>(v)
as the way to obtain a double from v, converting if necessary; and
similarly, value_to<int>(v) as the way to obtain an int, range-checking if
necessary.

number_cast should be considered legacy at this point, and not featured. I
understand that it offers an error_code& overload for those who don't want
exceptions, but the throwing version is entirely superseded by value_to.

In Parsing, the examples using parse_options look like this:

    parse_options opt;
    opt.allow_comments = true;
    opt.allow_trailing_commas = true;
    opt.allow_invalid_utf8 = true;

    value jv = parse( "[1,2,3,] // comment, extra comma ", storage_ptr(),
opt );

That's fine and is indeed the C++11 standard-compatible way to do things.
However, when designated initializers are available, such as on g++ or
clang++, or on VS2019 with /std:c++latest, the above can be written as

    value jv = parse( "[1,2,3,] // comment, extra comma ", {},
        { .allow_comments = true, .allow_invalid_commas = true,
          .allow_invalid_utf8 = true } );

and it might be useful to show this style too.

In Value Conversion, Converting to Foreign Types, it's shown that defining a
constructor

    customer::customer( value const& jv );

enables value_to<customer>(jv) to work. This is, in my opinion, a legacy
mechanism made obsolete by tag_invoke and should not be supported. This
example should be removed from the documentation, and only the tag_invoke
way should be featured.

And since we're on this page, it's odd that the StringLike requirement wants
construction from (char*, std::size_t) - pointer to non-const characters. I
don't think that any string type has such a constructor, they all take char
const*.

With the documentation carefully read (as in, quickly scanned), I proceeded
to try to use the library. A previous reviewer (Reiner Deyke) had brought up
the subject of CBOR parsing and serialization, and I had looked into CBOR as
a format. It seemed regular and easy to parse and serialize, so I decided to
write a CBOR serializer and parser for boost::json::value. How hard can that
be, really?

Not that hard. The serializer was quickly up and running, and it seemed
quite performant:

Serializing canada.json to CBOR: 1056200 bytes, 4840 us
Serializing canada.json to JSON: 2306988 bytes, 13036 us

(You can find the code at
https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor.cpp,
the serializer is the first 140 or so lines.)

As far as using the library went, it was completely painless and
straightforward.

Next up, the parser. I decided to write it using the json::value interface,
like a reference implementation. This, too, went well, and the parser was
soon parsing, and even producing the same value as the one serialized. The
performance, though...

Parsing canada.json from JSON: 10548 us
Parsing canada.json from CBOR: 33941 us, successful roundtrip

This wasn't good. Switching to a binary format such as CBOR is supposed to
be faster, not 3.4 times slower!

Before you say "well your parser is written like you were a college student
who was taught Java", be aware that I actually refactored the parser a
number of times; sometimes it was written in one style, sometimes in
another, functions went from being hand-inlined to extracted and back a few
times, but none of that made a dent on those 34ms.

To find out where the time was going, I looked into way more detail I needed
to. For instance, I saw that assigning f.ex. 1.0 to a default-constructed
json::value executes much more code than it needs to, and makes non-inlined
calls into libboost_json. One would expect this to take one check and branch
(do we need to deallocate anything? no), and two assignments (kind =
kind_double; dbl_ = 1.0).

Turns out that the efficient way to assign 1.0 is not `v = 1.0;`, but
`v.emplace_double() = 1.0;`. I of course replaced the assignments with this
supposedly efficient form, and it of course made no difference at all.

I then in desperation actually removed all the assignments. And it barely
made any difference.

At this point I decided that the json::value interface is not the right fit
for a performant parser, and switched to using boost::json::value_stack.
This is what Boost.JSON's parser itself uses, so if that's not fast, well, I
don't know what will be.

Updating the parser to use value_stack was basically trivial:

https://github.com/pdimov/boost_json_review/commit/7aa8d2dc8c706c3f4204d6fe70e2adfc49951cab

and it worked. The performance was what I expected:

Parsing canada.json from JSON: 10044 us
Parsing canada.json from CBOR: 6332 us, successful roundtrip

(You can find the value_stack parser at
https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor_st.cpp
and the results at
https://github.com/pdimov/boost_json_review/blob/master/boost_json_cbor_st.md.)

(Since now changes to the parser code actually affected performance, I added
two optimizations to the array parsing - a fast path for arrays containing
only doubles, and a fast path for arrays containing only integers.)

Is value hopeless for parsing, then? I went back to the other parser.

This is how the part that parses objects looks like, with the irrelevant
parts omitted:

boost::json::object & o = v.emplace_object();
o.reserve( n );

for( std::size_t i = 0; i < n; ++i )
{
    // key string
    boost::json::string_view sv( ... );

    // value
    boost::json::value w( v.storage() );
    first = parse_cbor_value( first, last, w );

    o.insert( boost::json::key_value_pair( sv, std::move( w ) ) );
}

The reason I use o.insert here is that I wanted the CBOR parser to match the
behavior of the JSON parser with respect to key order and duplicate keys.
The JSON parser preserves key order, which is both user-friendly and better
from a security perspective (observing key order may leak the seed of the
hash function.)

It also handles duplicates in an unspecified manner. Well, whatever that
unspecified manner was, I wanted to do the same, and using `insert` looked
like the way to do it.

There however was no `insert` taking a string_view and a value, so I had to
create a key_value_pair. There was `insert_or_assign` that did take a
string_view and a value, but it overwrites duplicates, and I didn't want
that.

I consulted Vinnie Falco about the lack of `insert(string_view, value)`, and
he pointed me at `emplace`. This sounded exactly like what my hypothetical
`insert` would do. However, hearing `emplace` gave me an idea. In the same
way `v.emplace_object()` gives me `object&`, `o.emplace(sv)` could have
given me `value&`, which I could then pass directly to `parse_cbor_value`:

    first = parse_cbor_value( first, last, o.emplace(sv) );

I shared this brilliant observation with Vinnie, and he replied "obj[sv]".

obj[sv] indeed. Sometimes we forget that C++98 did have a useful feature or
two.

This doesn't exactly work as I wanted it to, because it neither guarantees
insertion order, nor has the "right" duplicate handling. Nevertheless, I did
make the change:

https://github.com/pdimov/boost_json_review/commit/e6baf8f1ce81966534ac93c50641da87a177aa29

and guess what?

Parsing canada.json from JSON: 10261 us
Parsing canada.json from CBOR: 6761 us, successful roundtrip

So the problem the entire time was that one single line:

    o.insert( boost::json::key_value_pair( sv, std::move( w ) ) );

I haven't looked into it, but I suspect that it for some reason makes a copy
of `w`, even though it's passed using `std::move` and uses the right
`storage_ptr`. Oh well. At least it works now.

So, there is nothing that much wrong with the `value` interface, and it's
possible to build values the straightforward way, instead of using a
value_stack, without significant loss in performance.

These CBOR parsers are entirely functional, and only have the following
limitations:

* Binary strings are not supported (as they're not representable in
json::value);
* Infinite sequences are not supported (because I was too lazy);
* 32 bit platforms need to check the string/array/object sizes for possible
size_t overflow (because I was way too lazy.)

I had also looked at, and tried, the value conversion interface, in the
course of developing the examples of an unreleased library of mine,
Describe. The value_from example is

https://pdimov.github.io/describe/doc/html/describe.html#example_to_json

value_to is at

https://pdimov.github.io/describe/doc/html/describe.html#example_from_json

and a simple JSON-RPC example is at

https://pdimov.github.io/describe/doc/html/describe.html#example_json_rpc

These all worked without any trouble, except I had to link with
Boost.Container for one of them, which didn't seem necessary. They are also
a good illustration of the possibilities that having a standard json::value
type presents, in terms of further library development. (One could f.ex.
imagine a Boost.Script interpreter that uses json::value as its data type,
and can be used to script C++ code.)

In conclusion, the library works, is well documented, and should be
ACCEPTED. I have no acceptance conditions. Obviously, the performance
problem concerning object::insert needs to be addressed, and more generally,
the value interface needs a bit more performance scrutiny because it's not
currently exercised by the benchmarks, but I trust that this will be done.

Solid work, thanks to the authors for their submission, and thank you for
reading this far.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk