Boost logo

Boost :

From: Andrzej Krzemienski (akrzemi1_at_[hidden])
Date: 2020-09-23 10:31:00


śr., 23 wrz 2020 o 02:53 Vinnie Falco via Boost <boost_at_[hidden]>
napisał(a):

> On Tue, Sep 22, 2020 at 4:47 PM Gavin Lambert via Boost
> <boost_at_[hidden]> wrote:
> > I just don't see the point in having an interface that can return half
> > of a value -- this just moves the problem of allocation and keeping
> > track of the full value to the parser handler, which seems like an SRP
> > violation.
>
> It is the opposite of SRP violation. basic_parser is concerned only
> with parsing and not keeping track of the full value. The
> responsibility of tracking full values is moved to the caller. That
> way, basic_parser does not impose its opinions on how the full value
> should be stored. This was litigated in earlier posts if I recall
> (Peter?).
>
> > >> I don't feel it is ever sensible to call back with
> > >> a partially-parsed value (unless I am misunderstanding what these are
> > >> for, but there is little explanation in the docs, and I wasn't able to
> > >> get them called at all in my test).
>
> The interfaces are built up in layers. At the lowest layer you have
> basic_parser which concerns itself only with processing the input
> buffer and turning it into semantic components. It does _not_ have the
> responsibility of buffering input to present it as a larger component,
> that's the job of the handler. basic_parser is exposed to users so
> they can use their own schemes for dealing with the semantic
> components.
>
> Above basic_parser, you have parser which uses value_stack. The
> value_stack does take responsibility for buffering input, and
> producing DOM values. As with basic_parser, both of these components
> are exposed to users so they can write their own layers on top.
>
> On top of parser we have the parse() free functions, which address the
> most popular use-case of parsing. These functions will satisfy most
> needs. And for the needs that are not satisfied, users can easily
> implement their own parse functions with different behaviors, as the
> library exposes all the intermediate lower levels.
>

Maybe this explanation is worth putting in the documentation.

Regards,
&rzej;

> > > Right, so this size argument is actually the total number of
> > > characters, while the string_view::size is the number of characters in
> > > the current piece.
> >
> > Which is a poor design choice, as I said.
>
> No, the information is already there so we are just passing it to the
> handler which can use it, or not. If the handler doesn't use it, then
> it is optimized away.
>
> > parser.write(&data1[0], data1.size(), ec);
> > parser.write(&data2[0], data2.size(), ec);
> > parser.write(&data3[0], data3.size(), ec);
>
> You have to call basic_parser::reset in order to start parsing a new
> "document."
>
> > So I reiterate that the number and string handling is inconsistent and
> > these part methods are more confusing than helpful -- numbers are parsed
> > into the complete value but strings are not, and neither give you the
> > full text of the value (which may be important if the goal is to parse
> > into an arbitrary-precision value type).
> >
> > I do regard this as a very poor design choice, but not sufficiently so
> > to reject the library.
>
> No it is not a poor design choice, it is the correct design choice. If
> basic_parser were to buffer the input, then users who have no need of
> buffered input (such as value_stack) would be paying for it for
> nothing. This interface is driven first and foremost for performance
> and the incremental/streaming functionality. If callers want to see
> full text, they can append the partial strings into a data member
> until the final on_string or on_int64 / on_uint64 / on_double and then
> interact with it that way. But they are not obliged to do so.
>
> Again I repeat, this is not a poor design choice. json::string for
> example is perfectly capable of being constructed from two separate
> buffers of characters (an implementation detail):
>
> <
> https://github.com/CPPAlliance/json/blob/6ddddfb16f9b54407d153abdda11a29f74642854/include/boost/json/impl/value_stack.ipp#L428
> >
>
> If we were to follow your "correct design" to make basic_parser buffer
> the input, the implementation would be needlessly allocating memory
> and copying characters.
>
> > I was assuming that a "deep" JSON (many nested objects) may have a
> > different cost per output character than a wide but shallow one. But
> > it's not a big deal.
>
> Yes, it isn't perfect of course but over time and with a reasonable
> distribution of possible JSON inputs it averages out to be good
> enough. Certainly better than not having streaming at all (which would
> be terrible).
>
> > > To store the comments would mean storing them in the DOM, which would
> > > have a significant impact on performance as the space for everything
> > > is tightly controlled.
> >
> > I don't see why. It's just an extra string associated with any value.
>
> Yes exactly. sizeof(value)==16 on 32bit platforms, and
> sizeof(value)==24 on 64-bit platforms. When arrays and objects are
> constructed during parsing, the implementation performs a memcpy after
> allocation. Every additional byte in the sizeof(value) slows down the
> creation of arrays and objects. Adding a pointer to value would cost
> 25% on 32-bit and 33% on 64-bit, and make Boost.JSON unable to ever
> beat RapidJSON in performance.
>
> > This just seems like
> > it's doing an extra count/comparison that could be removed entirely, and
> > removing it would naturally improve performance.
>
> If you don't want a limit then just set the max to `size_t(-1)` and
> the compiler will elide the check completely.
>
> > Unless you're saying that moving the check to basic_parser improves the
> > performance of parser, in which case: you're doing it wrong. Imposing
> > restrictions on all users of basic_parser for something that only parser
> > cares about is the wrong design choice.
>
> The design of these components balances usability with performance. If
> you don't care about limits you can just set the max to size_t(-1).
> And I think this is the right choice. The library makes common things
> easy, e.g. call parse(). If you need more complexity, you instantiate
> a `parser` whose API requires a few more steps. If you really want to
> go low level, you use `basic_parser` which being low level can seem
> esoteric.
>
> And I think this is okay, because the percentage of users who will use
> basic_parser is incredibly tiny. The percentage of `parser` users much
> greater. While the percentage of users of the parse() free function
> will be closer to 100% than anything else. In other words the library
> favors performance and utility at the lowest levels over theoretical
> interface correctness, where the number of users is naturally smaller
> (and their domain-expertise naturally higher).
>
> Regards
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk