From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2022-08-23 15:44:11


On Mon, Aug 22, 2022 at 10:40 AM Vinnie Falco <vinnie.falco_at_[hidden]> wrote:
>
> On Mon, Aug 22, 2022 at 7:47 AM Zach Laine <whatwasthataddress_at_[hidden]> wrote:
> > > > But this leaves us with a problem - how do you modify individual path
> > > > segments and query params?
> > >
> > > Easy -- you just copy them into a flat_map, vector, or whatever.
> > > ...
> > > Parsing -> lazy_containers -> user mutations outside of Boost.URL ->
> > > writing URLs
>
> It sounds like what you are proposing is that Boost.URL should remove
> its algorithms for performing modifications to the URL. Or at least,
> remove the ability to modify the path segments and query parameters in
> terms of their equivalent sequences (BidirectionalRange).
>
> The modifiable containers in Boost.URL maintain an important
> invariant: modifications to the container will always leave the URL in
> a valid state. The implication is that the stored string is always in
> its "serialized" form.
>
> > > That last step only needs to understand the different kinds of data it
> > > might write, abstractly. That abstraction might be a params_view, a
> > > flat_map, or a std::vector<std::pair<std::string, std::string>>.
>
> This isn't workable at all, for several reasons. First of all when the
> user passes an encoded parameter to a modification function (named
> with the suffix "_encoded") the library performs validation on the
> input to preserve the invariant. Second, the library remembers the
> decoded size so that it doesn't have to re-parse that portion of the
> URL again. Third, when the user passes a decoded parameter to a
> modification function the library applies whatever percent-escapes are
> necessary to make the string valid for the part being changed, and
> stores that. And finally, there are certain modifications to the URL
> which require that the library adjusts the result in order to both
> satisfy the user's request and preserve the invariant that all stored
> URLs are valid. For example, consider these statements:
>
> url u( "ldap:local:userdb/8675309" );
> u.remove_scheme();
>
> What should the resulting URL be? Well, it can't be
> "local:userdb/8675309" because that still has a scheme. The library
> has to produce:
>
> u == "local%3auserdb/8675309";
>
> In other words we go from an absolute-URI to a relative-ref. In
> addition to preserving the invariant that all modifications produce
> valid URLs, the library also ensures that it can satisfy every
> possible user-requested modification (presuming that no
> incorrectly-encoded strings are passed to functions which accept
> encoded parameters). There is a fair bit of cleverness going on behind
> the scenes to bring this convenience to the user. That would all be
> lost if we outsourced mutation operations to std containers.
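
For anyone following the remove_scheme example, here is a rough sketch of the kind of adjustment being described. It is not Boost.URL's implementation. `remove_scheme_sketch` is a hypothetical helper that works on plain strings, assumes its input is already a valid URL, and writes the escape as lowercase `%3a` only to match the spelling used in this thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Boost.URL.
// Drops a leading "scheme:" and percent-encodes any ':' left in the first
// path segment, so the result cannot be re-parsed as having a scheme.
// Assumes `url` is already a syntactically valid URL.
std::string remove_scheme_sketch(std::string const& url)
{
    // A scheme is present only if ':' appears before any '/', '?' or '#'.
    std::size_t const stop = url.find_first_of(":/?#");
    if (stop == std::string::npos || url[stop] != ':')
        return url; // no scheme, nothing to remove

    std::string const rest = url.substr(stop + 1);

    // If what follows starts with '/', it is an authority ("//host") or an
    // absolute path; a later ':' cannot be mistaken for a scheme delimiter.
    if (!rest.empty() && rest[0] == '/')
        return rest;

    // Otherwise the first segment runs up to the first '/', '?' or '#'.
    std::size_t seg_end = rest.find_first_of("/?#");
    if (seg_end == std::string::npos)
        seg_end = rest.size();

    std::string out;
    for (std::size_t i = 0; i < rest.size(); ++i)
    {
        if (i < seg_end && rest[i] == ':')
            out += "%3a"; // escape ':' inside the first segment only
        else
            out += rest[i];
    }
    return out;
}
```

With this sketch, "ldap:local:userdb/8675309" becomes "local%3auserdb/8675309", while "http://example.com:8080/a" becomes "//example.com:8080/a" with the port colon left alone. A std container of segments has no way to know that the first segment needs this treatment, which is the point being made in the quoted text.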

Ok, I'm convinced. I am still not convinced that the containers that
maintain these invariants should be lazy. That still seems weird to
me. If they own the data, and are regular types, they should probably
be eager. The larger issue to me is that they have a subset of the
expected STL API.

> > > The diagnostics go to some optional output channel, which might be a
> > > stream or file or logger. If you don't set the output, the diagnostic
> > > is never generated. You set this output in the call to parse*(), as
> > > an optional param -- usually a std::function<void(std::string const
> > > &)>.
>
> huh...idk about all that. This library was designed for program-to-program
> communication and not really for user input and user display.
> Although it could be used for that. If someone wants to print the URL
> and show a squiggly line near the part that failed, they can always do
> that by calling the parse function that uses an iterator as an
> out-param, indicating where the error occurred. I don't know that I
> want the URL library to go out of its way to offer this functionality,
> especially adding a `std::function` to an API.

That's kind of my point. You have a parsing minilib that is useful
for URL parsing, but not for *general use*. If that's the case, I think
you should present it as that, and not as a general-use parsing lib.

(The use of std::function is incidental; it could be a template
parameter instead.)
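
To make the template-parameter variant concrete, here is a minimal sketch. Everything in it is hypothetical and unrelated to Boost.URL's actual interface: `parse_port` is a toy parser, `no_diagnostics` is a made-up default sink type, and `report` is a made-up helper. The idea it illustrates is that when no sink is passed, the diagnostic string is never built.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tag type meaning "the caller did not ask for diagnostics".
struct no_diagnostics {};

// Chosen when no sink was supplied: the message is never constructed.
inline void report(no_diagnostics&, std::size_t, char const*) {}

// Chosen for any other sink: build the message and hand it over.
template <class Sink>
void report(Sink& sink, std::size_t pos, char const* what)
{
    sink(std::string(what) + " at offset " + std::to_string(pos));
}

// Toy parser for a decimal port number in [0, 65535].
// The sink is an optional trailing argument; its type is a template
// parameter, so no std::function appears in the signature.
template <class Sink = no_diagnostics>
bool parse_port(std::string const& in, unsigned& out, Sink sink = Sink())
{
    if (in.empty())
    {
        report(sink, 0, "expected digit");
        return false;
    }
    unsigned value = 0;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char const c = in[i];
        if (c < '0' || c > '9')
        {
            report(sink, i, "expected digit");
            return false;
        }
        value = value * 10 + static_cast<unsigned>(c - '0');
        if (value > 65535)
        {
            report(sink, i, "port out of range");
            return false;
        }
    }
    out = value;
    return true;
}
```

A caller that wants diagnostics passes any callable taking a `std::string const&`, for example a lambda that appends to a log. A caller that does not simply omits the argument and gets only the bool result.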

Zach

