
From: Krystian Stasiowski (sdkrystian_at_[hidden])
Date: 2020-09-23 03:45:54


On Tue, Sep 22, 2020 at 7:48 PM Gavin Lambert via Boost <boost_at_[hidden]> wrote:

> On 23/09/2020 05:55, Vinnie Falco wrote:
> >> I also don't see the point of supplying a size argument to on_key* and
> >> on_string, as this can be read from the string_view argument instead.
> >
> > Right, so this size argument is actually the total number of
> > characters, while the string_view::size is the number of characters in
> > the current piece.
>
> Which is a poor design choice, as I said.
>

How? The goal is to provide as much information as possible to the handler.
Forcing users to track the total size themselves would introduce redundancy.
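
For illustration, a handler can use the running total to reserve storage up
front instead of growing a buffer piecewise. A minimal sketch, assuming the
on_string_part/on_string signatures from the handler exemplar:

#include <boost/json.hpp>
#include <cstddef>
#include <string>

struct handler
{
    std::string buf;

    bool on_string_part(boost::json::string_view s, std::size_t n,
        boost::json::error_code&)
    {
        buf.reserve(n);                  // n: total characters seen so far
        buf.append(s.data(), s.size());  // s: only the current piece
        return true;
    }

    bool on_string(boost::json::string_view s, std::size_t,
        boost::json::error_code&)
    {
        buf.append(s.data(), s.size());
        // buf now holds the complete string
        return true;
    }

    // ...remaining handler members elided...
};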

> I don't really understand why. basic_parser doesn't actually store any
> of the keys/values (especially as it doesn't merge value parts), so it
> should have no reason to impose any kind of limit. This just seems like
> it's doing an extra count/comparison that could be removed entirely, and
> removing it would naturally improve performance.
>

basic_parser does not choose the limit; it only checks against it. The
handler is responsible for determining what that limit is. I suppose that we
could add a little TMP that elides the check statically, but I don't see
much point in doing so.
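
For reference, these limits are supplied by the handler through static
members, so a handler that wants no effective limit can make the check
vacuous. A sketch following the documented handler exemplar:

#include <boost/json/basic_parser.hpp>
#include <cstddef>

struct handler
{
    // basic_parser checks against these; the handler decides the values.
    // The maximum size_t effectively disables the checks.
    static constexpr std::size_t max_object_size = std::size_t(-1);
    static constexpr std::size_t max_array_size  = std::size_t(-1);
    static constexpr std::size_t max_key_size    = std::size_t(-1);
    static constexpr std::size_t max_string_size = std::size_t(-1);

    // ...event members (on_document_begin, on_key_part, ...) elided...
};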

On Tue, Sep 22, 2020 at 7:48 PM Gavin Lambert via Boost <boost_at_[hidden]> wrote:

> On 23/09/2020 05:55, Vinnie Falco wrote:
> > On Tue, Sep 22, 2020 at 1:13 AM Gavin Lambert wrote:
> >> Regarding the SAX parser specifically, I dislike the *_part methods in
> >> the callback interface. I feel that these should be removed and only
> >> the final form retained, called only when the complete text of the value
> >> has been determined.
> >
> > The parser (and serializer) support incremental / streaming operation.
> > To provide the "complete" text would require allocating memory and
> > saving the partial result until enough input has been supplied to
> > finish the text. basic_parser does not currently allocate memory,
> > which was a design goal, and this would change that.
>
> Not necessarily. A small buffer can be statically allocated for a
> single value-in-progress. It's trivial for all numeric values (except
> perhaps pathological precision, but that can be ignored). It's harder
> for string values; that might have to support dynamic allocation, it's
> true.
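>
> For illustration, a hypothetical sketch of such a buffer (not part of
> the library):
>
> #include <boost/json.hpp>
> #include <cstddef>
> #include <cstring>
>
> struct number_buffer
> {
>     char data[64];        // ample for any non-pathological number
>     std::size_t size = 0;
>
>     bool append(boost::json::string_view part)
>     {
>         if(size + part.size() > sizeof(data))
>             return false; // pathological length; reject or fall back
>         std::memcpy(data + size, part.data(), part.size());
>         size += part.size();
>         return true;
>     }
> };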
>
> I just don't see the point in having an interface that can return half
> of a value -- this just moves the problem of allocation and keeping
> track of the full value to the parser handler, which seems like an SRP
> violation.
>
> >> I don't feel it is ever sensible to call back with
> >> a partially-parsed value (unless I am misunderstanding what these are
> >> for, but there is little explanation in the docs, and I wasn't able to
> >> get them called at all in my test).
> >
> > In order to see partial results, you have to invoke the parser with
> > partial JSON. In other words, break up the input JSON into two or more
> > consecutive buffers of characters, and supply them in order.
>
> I tried that, but the parse simply failed at the end of the first
> partial buffer and it completely ignored the second buffer. Hence my
> statement that basic_parser did not operate incrementally. Though see
> below; this appears to be a failure of the examples rather than of the
> implementation.
>
> >> I also don't see the point of supplying a size argument to on_key* and
> >> on_string, as this can be read from the string_view argument instead.
> >
> > Right, so this size argument is actually the total number of
> > characters, while the string_view::size is the number of characters in
> > the current piece.
>
> Which is a poor design choice, as I said.
>
> >> parser.write() is incremental but basic_parser.write() is not, which
> >> seems odd. I assume this means that parser is storing an incrementally
> >> populated buffer of the full document rather than _really_ incrementally
> >> parsing.
> >
> > Everything is incremental. parser::write() has the condition that if
> > there are any leftover characters after receiving a complete JSON, an
> > error is returned. Contrast this with parser::write_some which does
> > not produce an error in this case. basic_parser::write is more similar
> > to parser::write_some. We will be renaming it to
> > basic_parser::write_some, as another reviewer suggested, to make this
> > more clear. That change also brings a synergy with Asio's naming.
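> >
> > For example, a sketch assuming the string_view overloads:
> >
> > json::parser p;
> > json::error_code ec;
> > json::string_view buf = R"({"a":1} {"b":2})";
> >
> > p.write(buf, ec);      // error: characters remain after a complete JSON
> >
> > p.reset();
> > auto n = p.write_some(buf, ec); // ok: consumes only the first JSON and
> >                                 // returns the number of characters used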
>
> I wrote a print_parser (which is just like the null_parser example but
> prints the method calls and parameters) and invoked it like so:
>
> json::error_code ec;
> print_parser parser;
> std::string data1 = R"j({"foo":14,"bar":1)j";
> std::string data2 = R"j(8,"baz":"bo)j";
> std::string data3 = R"j(b"})j";
> parser.write(&data1[0], data1.size(), ec);
> parser.write(&data2[0], data2.size(), ec);
> parser.write(&data3[0], data3.size(), ec);
>
> It stopped after calling on_int64(1, "1") and did not continue parsing,
> completely ignoring the second buffer, never calling any _part methods
> and reporting the wrong integer value. It parses correctly only if the
> entire document is supplied in one buffer. Hence, as I said, it is not
> behaving incrementally.
>
> However I have since noticed that the missing link is that it must pass
> 'true' as the 'more' parameter to basic_parser.write, which is not
> demonstrated in any example; it may be helpful to show that. (In fact
> this resolves my other concern about multi-document parsing as well,
> which also could use an explicit example.)
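>
> For instance, the example above starts behaving incrementally once the
> flag is forwarded (a sketch, assuming print_parser passes a leading
> 'more' argument through to basic_parser's write):
>
> json::error_code ec;
> print_parser parser;
> std::string data1 = R"j({"foo":14,"bar":1)j";
> std::string data2 = R"j(8,"baz":"bo)j";
> std::string data3 = R"j(b"})j";
> parser.write(true, &data1[0], data1.size(), ec);  // more input follows
> parser.write(true, &data2[0], data2.size(), ec);  // more input follows
> parser.write(false, &data3[0], data3.size(), ec); // final buffer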
>
> Once this is done, then (as expected by the docs) it reports these calls
> (in part):
>
> on_number_part("1")
> on_int64(18, "8")
>
> on_string_part("bo", 2)
> on_string("b", 3)
>
> So I reiterate that the number and string handling is inconsistent and
> these part methods are more confusing than helpful -- numbers are parsed
> into the complete value but strings are not, and neither gives you the
> full text of the value (which may be important if the goal is to parse
> into an arbitrary-precision value type).
>
> I do regard this as a very poor design choice, but not sufficiently so
> to reject the library.
>
> (I still also regard SAX-style parsing itself as a poor design choice
> [vs pull parsing], due to poor composability, but that is not a failure
> of the library itself, except perhaps in offering it at all. But I can
> understand why.)
>
> > Well, yeah, that's exactly what it is. Consider a network server that
> > delivers JSON to clients. We don't want a completion handler to be
> > stuck serializing 100MB of JSON while another client is waiting who
> > only needs 1MB serialized. The way to ensure fairness here is to
> > serialize incrementally with some reasonable buffer size, say 32kb.
> > The light client will finish in ~33 i/o cycles while the heavy client
> > still makes progress.
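> >
> > For example, the serving loop is roughly (a sketch; send_to_client is
> > a stand-in for the actual network write):
> >
> > json::serializer sr;
> > sr.reset(&jv);             // jv: the value to serialize
> > while(! sr.done())
> > {
> >     char buf[32768];       // 32kb per i/o cycle
> >     json::string_view sv = sr.read(buf, sizeof(buf));
> >     send_to_client(sv);    // hypothetical network write
> > }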
>
> I was assuming that a "deep" JSON (many nested objects) may have a
> different cost per output character than a wide but shallow one. But
> it's not a big deal.
>
> > To store the comments would mean storing them in the DOM, which would
> > have a significant impact on performance as the space for everything
> > is tightly controlled.
>
> I don't see why. It's just an extra string associated with any value.
>
> FWIW, I don't think it needs to *precisely* round-trip the comments --
> it's ok to convert comments to /**/ style (that's mandatory if the
> output isn't pretty-printed) and to consolidate all comments associated
> with one value to a single comment that's always output after that value
> (or perhaps have 'before' and 'after' slots). This should be much more
> manageable.
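>
> To illustrate, something as small as this would do (hypothetical, not a
> proposal for the library's exact layout):
>
> struct commented_value
> {
>     json::value jv;
>     std::string comment_before; // emitted before the value
>     std::string comment_after;  // emitted after the value
> };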
>
> As it stands, the library is completely useless for the case of writing
> a JSON config file intended for human use. It's ok if you don't want
> that to be a goal, but this should be made clear up front so that people
> don't waste time considering the library for use cases it's not intended
> for. (Though it would be preferable if you did support it.)
>
> >> The basic_parser examples do not compile out of the box -- with the
> >> suggested json.hpp and src.hpp includes there are link errors with
> >> basic_parser symbols. Additionally including basic_parser.hpp resolves
> >> these but starts complaining about missing
> >> max_{key,string,array,object}_size members of the handler type, which
> >> are not documented. After these are supplied then it works as expected.
> >> I don't see why these should be needed, however.
> >
> > Those max sizes should be documented in the handler exemplar:
> >
> > <https://master.json.cpp.al/json/ref/boost__json__basic_parser.html#json.ref.boost__json__basic_parser.handler0>
>
> The docs I am referring to are the ones referenced in the original
> pre-announcement post (although now I notice that the main announcement
> post has a different link):
>
> http://vinniefalco.github.io/doc/json/json/ref/boost__json__basic_parser.html
>
> This is also where the "using numbers" section is entirely blank.
>
> > The reason they are required is for optimization. Handling the limit
> > in the basic_parser leads to performance gains.
>
> I don't really understand why. basic_parser doesn't actually store any
> of the keys/values (especially as it doesn't merge value parts), so it
> should have no reason to impose any kind of limit. This just seems like
> it's doing an extra count/comparison that could be removed entirely, and
> removing it would naturally improve performance.
>
> Unless you're saying that moving the check to basic_parser improves the
> performance of parser, in which case: you're doing it wrong. Imposing
> restrictions on all users of basic_parser for something that only parser
> cares about is the wrong design choice.
>

