From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2022-08-23 21:30:01


On Mon, Aug 22, 2022 at 11:48 PM Andrzej Krzemienski <akrzemi1_at_[hidden]> wrote:
> Like in this example from the docs:
> https://www.boost.org/doc/libs/1_80_0/libs/beast/doc/html/beast/quick_start/http_client.html
>
> http::request<http::string_body> req{http::verb::get, target, version};
> req.set(http::field::host, host);
> req.set(http::field::user_agent, BOOST_BEAST_VERSION_STRING);
>
> At no point do I have to see or provide a full URL.

"target" is the URL there, it came from the command line. It is a relative-ref.

>> * modifiers which take un-encoded inputs have a wide contract: all
>> input strings are valid
>> - however the url might need to reallocate memory to encode the result
>
> Does set_port() taking a string_view fall into that category?
> Unlike other parts, port has special requirements on the string
> contents that cannot be satisfied by pct-encoding.

Yeah, we have a problem here. It sounds like we need to design an
extra set of functions. Maybe:

    url_base& url_base::set_port( string_view ); // throws

    result<void> url_base::try_set_port( string_view ); // returns result

What do you think about that?

> If you used result<>, you would lose the ability to chain the setters.

Yeah... well, I think I'm OK with that.
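
To make the tradeoff concrete, here is a rough sketch of how the two
forms could sit side by side. Everything in it is an assumption for
illustration: mini_url is a made-up stand-in that models only the port,
it is not Boost.URL, and std::error_code stands in for result<void>.
The validation accepts any run of digits (RFC 3986 defines port as
*DIGIT), which may be looser or stricter than what the library ends up
doing.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for a URL container; only the port is modeled.
class mini_url
{
    std::string port_;

public:
    // Non-throwing form: the error is returned and the object is left
    // unchanged on failure. The return value is not the object itself,
    // so calls to this function cannot be chained.
    std::error_code try_set_port(std::string_view s)
    {
        for (char c : s)
            if (c < '0' || c > '9')
                return std::make_error_code(std::errc::invalid_argument);
        port_.assign(s.data(), s.size());
        return {};
    }

    // Throwing form: same validation, but returns *this so that
    // setters can still be chained.
    mini_url& set_port(std::string_view s)
    {
        if (auto ec = try_set_port(s))
            throw std::system_error(ec);
        return *this;
    }

    std::string_view port() const noexcept { return port_; }
};

// Example use; the asserts document the behavior of the sketch.
inline void port_demo()
{
    mini_url u;
    assert(!u.try_set_port("8080"));   // success: empty error_code
    assert(u.try_set_port("80a"));     // failure: port is unchanged
    assert(u.port() == "8080");
    assert(u.set_port("443").port() == "443"); // chaining still works
}
```

Both forms validate the input, so the invariant holds either way; the
only difference is how the failure is reported and whether the call can
be chained.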

> As an alternative, you could say that function set_port() has a precondition:
> the input string has to represent a number, the caller is responsible for that,
> and set_port() performs no validation, and therefore throws nothing. But I
> guess that would violate one of the design goals of the library: "securely,
> including the case where the inputs come from untrusted sources".

My original thinking was that untrusted sources would only be
presented to parse functions. But our dialog has convinced me that we
should treat the parameters to modification functions as untrusted as
well. True, in some situations we will be performing unavoidable,
needless re-validation. But I think it is the right tradeoff, as
offering the stronger invariant has more value for users. Besides,
there are workarounds for assembling a URL which trade back
performance in exchange for weaker invariants.

> Just to clarify what I mean: https://godbolt.org/z/qY3Y76fd9
>
> urls::string_view s = "https://path?id=42&id=43";
> urls::url r = urls::parse_uri( s ).value();
>
> Should the above input string not cause the invariant to be broken? (because param 'id' appears twice.)

No, because the query is just a string. The interpretation of the
query as "params" (an ampersand delimited list of name=value pairs) is
an HTTP thing (and it has also spread to some other domains). When the
query is used this way, duplicates are allowed. So the invariant is
preserved in this case. It is always a valid URL. It might not be
valid for custom schemes though. I could invent the "boost" scheme
which requires that keys used in query parameters are unique. But
there is no way for Boost.URL to enforce this (see my previous message
regarding "mailto"). Users are provided with the tools to further
specialize the generic URLs into custom schemes in a way that can
preserve scheme-specific invariants.
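
As an illustration of that layering, here is a minimal sketch that does
not use Boost.URL at all. The names split_params and has_unique_keys are
made up for this example, and no pct-decoding is attempted. The first
function applies the HTTP-style reading of a query string, where
duplicates are fine; the second is the kind of extra check a
hypothetical "boost" scheme could apply on top.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string_view>
#include <utility>
#include <vector>

using param = std::pair<std::string_view, std::string_view>;

// Split a query into name=value pairs on '&' (HTTP-style convention).
// Duplicate names are kept; an item without '=' gets an empty value.
std::vector<param> split_params(std::string_view query)
{
    std::vector<param> out;
    if (query.empty())
        return out;
    std::size_t pos = 0;
    for (;;)
    {
        std::size_t const amp = query.find('&', pos);
        std::string_view const item =
            amp == std::string_view::npos
                ? query.substr(pos)
                : query.substr(pos, amp - pos);
        std::size_t const eq = item.find('=');
        if (eq == std::string_view::npos)
            out.emplace_back(item, std::string_view{});
        else
            out.emplace_back(item.substr(0, eq), item.substr(eq + 1));
        if (amp == std::string_view::npos)
            break;
        pos = amp + 1;
    }
    return out;
}

// Scheme-specific rule for a hypothetical "boost" scheme:
// every key in the query must be unique.
bool has_unique_keys(std::vector<param> const& params)
{
    std::set<std::string_view> seen;
    for (auto const& p : params)
        if (!seen.insert(p.first).second)
            return false;
    return true;
}
```

With the query from the godbolt example, split_params("id=42&id=43")
yields two pairs, both named "id", and has_unique_keys reports false for
them. The generic URL is still valid; only the scheme-specific rule
rejects it.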

Thanks

