Boost logo

Boost :

From: Gavin Lambert (boost_at_[hidden])
Date: 2020-09-23 01:36:15


On 23/09/2020 12:52, Vinnie Falco wrote:
> It is the opposite of SRP violation. basic_parser is concerned only
> with parsing and not keeping track of the full value. The
> responsibility of tracking full values is moved to the caller. That
> way, basic_parser does not impose its opinions on how the full value
> should be stored. This was litigated in earlier posts if I recall
> (Peter?).

Except it does, because it parses into int/uint/double, which is
inherently a storage decision. You might have a valid argument if it
left that step to the handler to determine as well. (i.e. only
providing on_number_part and on_number, just the text without any
attempt to parse into a numeric type.)

Making that change would probably make the arbitrary-precision-parsing
people happier as well.

> The interfaces are built up in layers. At the lowest layer you have
> basic_parser which concerns itself only with processing the input
> buffer and turning it into semantic components. It does _not_ have the
> responsibility of buffering input to present it as a larger component,
> that's the job of the handler. basic_parser is exposed to users so
> they can use their own schemes for dealing with the semantic
> components.
>
> Above basic_parser, you have parser which uses value_stack. The
> value_stack does take responsibility for buffering input, and
> producing DOM values. As with basic_parser, both of these components
> are exposed to users so they can write their own layers on top.

My point is that this seems like the wrong choice of layers. As it
stands, consumers should never care about the basic_parser layer, and
instead will actually want a different layer (not currently exposed)
that is between the two, which exposes complete values that are not in
any DOM.

The current form of basic_parser could still exist, if you like, but it
should be made an implementation detail and the complete-value
basic_parser actually exposed to consumers.

>> parser.write(&data1[0], data1.size(), ec);
>> parser.write(&data2[0], data2.size(), ec);
>> parser.write(&data3[0], data3.size(), ec);
>
> You have to call basic_parser::reset in order to start parsing a new "document."

This is not parsing new documents, but new document fragments. As such
it must not reset.

> No it is not a poor design choice, it is the correct design choice. If
> basic_parser were to buffer the input, then users who have no need of
> buffered input (such as value_stack) would be paying for it for
> nothing. This interface is driven first and foremost for performance
> and the incremental/streaming functionality. If callers want to see
> full text, they can append the partial strings into a data member
> until the final on_string or on_int64 / on_uint64 / on_double and then
> interact with it that way. But they are not obliged to do so.

There should never be any such consumers. Partial values are not useful
to anyone. If they exist in the library, that is because things are
structured improperly.

> If we were to follow your "correct design" to make basic_parser buffer
> the input, the implementation would be needlessly allocating memory
> and copying characters.

It has to happen somewhere, regardless. Why are you pushing that
reponsibility onto the consumer?

> And I think this is the right choice. The library makes common things
> easy, e.g. call parse(). If you need more complexity, you instantiate
> a `parser` whose API requires a few more steps. If you really want to
> go low level, you use `basic_parser` which being low level can seem
> esoteric.
>
> And I think this is okay, because the percentage of users who will use
> basic_parser is incredibly tiny. The percentage of `parser` users much
> greater. While the percentage of users of the parse() free function
> will be closer to 100% than anything else. In other words the library
> favors performance and utility at the lowest levels over theoretical
> interface correctness, where the number of users is naturally smaller
> (and their domain-expertise naturally higher).

So why provide basic_parser at all, if its interface is deliberately
incorrect?


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk