From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2021-10-13 15:14:03


On Tue, Oct 12, 2021 at 11:46 PM Gavin Lambert via Boost
<boost_at_[hidden]> wrote:
> it is not possible to take an arbitrary URL and round-trip it
> between encoded and decoded forms as a whole (even ignoring equivalent
> variants like %20 vs + or %-encoding more characters than strictly
> required).

That's right.

> Where a / occurs inside a single path-component, it must be escaped so
> as to not be seen as a path-component-separator. And as such, it's not
> possible to (reliably) pass multiple unencoded path-components as a
> single string.

Not exactly. If you call set_path() it treats slash as a separator,
since it works with the entire path portion of the URL. If you call
segments().push_back( "my/slash" ) then you get the slash
percent-encoded. The library APIs are biased towards interpreting the
path hierarchically but you can still treat the path as a monolithic
string using set_path and set_encoded_path.
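
To make the distinction concrete, here is a minimal standalone sketch of
the two views of a path. It does not use Boost.URL; pct_encode,
encode_whole_path and encode_segments are hypothetical helpers written
only for illustration, and the set of characters left unescaped is a
simplifying assumption (RFC 3986 "unreserved" only).

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of Boost.URL: percent-encode every byte
// that is not an RFC 3986 "unreserved" character, except bytes in `keep`.
std::string pct_encode(const std::string& s, const std::string& keep = "")
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved || keep.find(static_cast<char>(c)) != std::string::npos) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 15];
        }
    }
    return out;
}

// Whole-path view: '/' is a separator, so it is left alone.
std::string encode_whole_path(const std::string& path)
{
    return pct_encode(path, "/");
}

// Segment view: each segment is encoded on its own, so a '/' inside a
// segment is escaped and cannot be mistaken for a separator.
std::string encode_segments(const std::vector<std::string>& segments)
{
    std::string out;
    for (const auto& seg : segments) {
        out += '/';
        out += pct_encode(seg);
    }
    return out;
}
```

With these helpers, encode_whole_path("/a/my/slash") yields "/a/my/slash"
(three segments), while encode_segments({"a", "my/slash"}) yields
"/a/my%2Fslash" (two segments).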

> The same problems occur with query parameters and &= characters, or with
> ?# characters appearing anywhere.

There isn't any actual ambiguity here. set_query() treats '&' and '='
as separators. If that is not your intent you can use
params().insert() or params().push_back() which will percent-encode
these symbols. If you call set_query() with a literal string you will
get that literal string as a percent-encoded query in the final URL.
If you then call query() you will get back the original unencoded
string, modulo the wrinkle where "+" becomes a space on decoding. Even
that can be controlled by the user:

<https://master.url.cpp.al/url/ref/boost__urls__pct_decode_opts.html>
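
As a rough standalone illustration of that round trip (again not
Boost.URL code; encode_component, decode_query and the plus_to_space
flag are hypothetical names chosen for this sketch):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if invalid.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: escape everything except RFC 3986 "unreserved"
// characters, so '&', '=', '+', '?' and '#' in a key or value are escaped.
std::string encode_component(const std::string& s)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 15];
        }
    }
    return out;
}

// Decode %XX escapes; optionally treat '+' as a space, which is the
// wrinkle mentioned above. Malformed escapes are copied through unchanged.
std::string decode_query(const std::string& s, bool plus_to_space)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '%' && i + 2 < s.size()
            && hex_value(s[i + 1]) >= 0 && hex_value(s[i + 2]) >= 0) {
            out += static_cast<char>(hex_value(s[i + 1]) * 16 + hex_value(s[i + 2]));
            i += 2;
        } else if (s[i] == '+' && plus_to_space) {
            out += ' ';
        } else {
            out += s[i];
        }
    }
    return out;
}
```

Here encode_component("a&b=c") gives "a%26b%3Dc", and decoding that gives
back "a&b=c" with either setting of the flag. A literal "+" in already
encoded input is where the two settings differ: decode_query("x+y", true)
gives "x y", while decode_query("x+y", false) gives "x+y".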

> (Where the authority component is not a DNS hostname there may be issues
> with :@/ characters appearing there too.)

Well, there is no set_authority() function yet, only
set_encoded_authority(). I haven't looked closely at it but it would
work similarly to the others. However, I'm not convinced it is necessary.
If you call set_host() you can put any string you want in there and it
will be correctly percent-encoded. If there is no user, password, or
port, then that percent-encoded string is effectively the entire
"authority" - so there is already a way to set the authority to an
arbitrary string.

Thanks

