|
Boost : |
From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2020-09-22 17:55:42
On Tue, Sep 22, 2020 at 1:13 AM Gavin Lambert via Boost
<boost_at_[hidden]> wrote:
> Regarding the SAX parser specifically, I dislike the *_part methods in
> the callback interface. I feel that these should be removed and only
> the final form retained, called only when the complete text of the value
> has been determined.
The parser (and serializer) support incremental / streaming operation.
To provide the "complete" text would require allocating memory and
saving the partial result until enough input has been supplied to
finish the text. basic_parser does not currently allocate memory,
which was a design goal, and this would change that.
> I don't feel it is ever sensible to call back with
> a partially-parsed value (unless I am misunderstanding what these are
> for, but there is little explanation in the docs, and I wasn't able to
> get them called at all in my test).
In order to see partial results, you have to invoke the parser with
partial JSON. In other words, break up the input JSON into two or more
consecutive buffers of characters, and supply them in order.
> I also don't see the point of supplying a size argument to on_key* and
> on_string, as this can be read from the string_view argument instead.
Right, so this size argument is actually the total number of
characters, while the string_view::size is the number of characters in
the current piece.
> having a bool return as well as an error_code output argument seems
> peculiar; one or the other should be sufficient, as is it just invites
> inconsistency.
Peculiar, but fast :) this was profiled and optimized. The bool return
plus ec is superior to having only ec.
> parser.write() is incremental but basic_parser.write() is not, which
> seems odd. I assume this means that parser is storing an incrementally
> populated buffer of the full document rather than _really_ incrementally
> parsing.
Everything is incremental. parser::write() has the condition that if
there are any leftover characters after receiving a complete JSON, an
error is returned. Contrast this with parser::write_some which does
not produce an error in this case. basic_parser::write is more similar
to parser::write_some. We will be renaming it to
basic_parser::write_some, as another reviewer suggested, to make this
more clear. There is also a synergy with asio's naming with that
change.
> parse/parser do not appear to support consuming only part of the
> supplied input as a complete JSON value (returning how much was
> consumed), which is useful for streams where multiple objects are
> concatenated without explicit separation.
parse() the free function does not, but parser does - use parser::write_some.
> The "Using Numbers" section of the docs appears to be entirely blank.
This is fixed in develop.
> (On a related note, an example use case for serializer is listed as
> "ensure a fixed amount of work is performed in each cycle", but the only
> means of control is the length of the provided buffer. I guess this is
> equating work to number of characters output, which seems dubious.
Well, yeah, that's exactly what it is. Consider a network server that
delivers JSON to clients. We don't want a completion handler to be
stuck serializing 100MB of JSON while another client is waiting who
only needs 1MB serialized. The way to ensure fairness here is to
serialize incrementally with some reasonable buffer size, say 32kb.
The light client will finish in ~33 i/o cycles while the heavy client
still makes progress.
> It seems less suited for JSON as configuration file -- while it can
> parse such files (via common extensions such as ignoring comments and
> trailing commas) it lacks means to pretty-print and to round-trip the
> comments, which would be essential if writing back a config file
> intended for human use.
To store the comments would mean storing them in the DOM, which would
have a significant impact on performance as the space for everything
is tightly controlled.
> The basic_parser examples do not compile out of the box -- with the
> suggested json.hpp and src.hpp includes there are link errors with
> basic_parser symbols. Additionally including basic_parser.hpp resolves
> these but starts complaining about missing
> max_{key,string,array,object}_size members of the handler type, which
> are not documented. After these are supplied then it works as expected.
> I don't see why these should be needed, however.
Those max sizes should be documented in the handler exemplar:
The reason they are required is for optimization. Handling the limit
in the basic_parser leads to performance gains.
Thanks
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk