
From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2019-09-23 15:54:35


On Mon, Sep 23, 2019 at 8:17 AM Phil Endecott via Boost
<boost_at_[hidden]> wrote:
> but maybe there are common bits of functionality that
> can be shared?

Perhaps, but my preference is for monolithic parsers with no external
dependencies and little to no configurability. They are easier to
audit (and cheaper), and I do plan to commission a security review for
Boost.JSON as I have done with Beast:

<https://vinniefalco.github.io/BeastAssets/Beast%20-%20Hybrid%20Application%20Assessment%202017%20-%20Assessment%20Report%20-%2020171114.pdf>

It is also easier to maintain, and less likely to require changes
(which bring the risk of new vulnerabilities) if the scope of
functionality is strictly limited. It is true that this results in a
parser which is less flexible. A survey of parsers in existing JSON
libraries shows great diversity, so there is no shortage of
flexibility there. I think there is room for one more strict parser
with a static set of features.

> My preference has always been for parsing by memory-mapping the entire
> file, or equivalently reading the entire document into memory as a blob
> of text

While parsing is important, as with HTTP it is the least
interesting aspect of JSON: parsing happens only once, while
inspection and modification of a JSON document (the
`boost::json::value` type) happen continually, including across
library API boundaries where JSON value types appear in function
signatures or data members.

> ...be clear about whether performance is a design goal.

Performance is a design goal, but with respect to the larger
context of a network application. This library is less
concerned about parsing a large chunk of in-memory serialized JSON
over and over again inside a tight loop to hit a meaningless number in
a contrived benchmark, and more concerned about ensuring that network
programs have control over how and when memory allocation takes place,
latency, and resource fairness when handling a large number of
connections. That is why the library is built from the ground up to
support allocators and to support incremental operation for parsing
and serialization (bounded work in each I/O cycle reduces latency and
increases fairness).

Since the parser is presented with one or more buffers of memory
containing the JSON document, and there is an API to inform the parser
when these memory buffers represent the complete document, it should
be possible to apply most of the optimizations currently used in other
libraries, including SIMD algorithms when the complete document is
presented. That said, if the experience with HTTP in Beast is
representative of network applications which use JSON (a reasonable
assumption), relatively little time is spent parsing a JSON RPC
command coming from a connection compared to the time required to
process the command, so the gains to be had from an optimized parser
may not be so impressive. I will still eventually apply optimizations
to it of course, for bragging rights. But I am in no hurry.

Regards


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk