From: Gavin Lambert (boost_at_[hidden])
Date: 2021-10-18 23:41:57


On 19/10/2021 05:10, Vinnie Falco wrote:
>> - "/.//foo/bar" => { ".", "", "foo", "bar" }
>
> The list looks fine except for the above, which I think has to be {
> "", "foo", "bar" } for the reason that assigning the path should give
> you back the same results when iterated:
>
> url u = parse_uri( "http:" ).value();
>
> u.segments() = { "", "foo", "bar" };
>
> assert( u.encoded_url() == "http:/.//foo/bar" );
> assert( u.segments() == { "", "foo", "bar" } ); // same list

Again, what about the case where the original input URL contained that
leading dot? You can't argue "we must report it unchanged" when, by
definition, there are conditions under which you are changing it.

The only mechanism that seems sane to me is that encoded_url() and
friends are documented to normalise (or at least to partially normalise,
limited to adding/removing the path prefix) the URL before returning a
string, at which point segments() may change content. (But it's
important that it doesn't break if you push_back each segment
individually instead of assigning it all at once.)
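
To make that mechanism concrete, here is a minimal sketch in standalone
C++. It does not use Boost.URL: toy_url, its members, and
example_prefix_on_output are hypothetical names invented for
illustration, and the sketch assumes a URL with no authority and an
absolute path. Segments are stored exactly as given and the "/." guard
is added only when the string is built, so assigning the list at once
and pushing back each segment individually produce the same output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a URL with no authority and an absolute path.
struct toy_url {
    std::string scheme;                 // empty means "no scheme"
    std::vector<std::string> segments;  // stored exactly as supplied

    // Build the string form; the "/." guard is added only at this point.
    std::string encoded_url() const {
        std::string out;
        if (!scheme.empty()) {
            out += scheme;
            out += ':';
        }
        if (segments.empty()) return out;  // empty path
        // A leading empty segment would serialise as "//...", which a
        // parser would read as the start of an authority, so guard it.
        if (segments.front().empty()) out += "/.";
        for (auto const& s : segments) {
            out += '/';
            out += s;
        }
        return out;
    }
};

// Assigning all at once and pushing back one by one agree.
inline void example_prefix_on_output() {
    toy_url all_at_once{"http", {"", "foo", "bar"}};
    toy_url one_by_one{"http", {}};
    one_by_one.segments.push_back("");
    one_by_one.segments.push_back("foo");
    one_by_one.segments.push_back("bar");
    assert(all_at_once.encoded_url() == "http:/.//foo/bar");
    assert(one_by_one.encoded_url() == all_at_once.encoded_url());
}
```

Note that in this sketch the lists { "", "foo", "bar" } and
{ ".", "", "foo", "bar" } both serialise to "http:/.//foo/bar", which is
the ambiguity raised above: the string alone cannot say whether the
leading dot came from the user or from the library.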

> If we then remove the scheme, I think the library needs to remove the
> prefix that it added. Not a full "normalization" (that's a separate
> member function that the user calls explicitly). The rationale is that
> if the library can add something extra to make things consistent, it
> should strive to remove that something extra when it can do so. Yes
> this means that if the user adds the prefix themselves and then
> performs a mutation, the library could end up removing the thing that
> the user added. I think that's ok though, because the (up to 2
> character) path prefixes that the library can add are all semantic
> no-ops even in the filesystem cases.

I don't disagree with this, but I do disagree with the iteration methods
trying to "hide" elements that are actually present in the URL.

On 19/10/2021 08:37, Peter Dimov wrote:
> Segment iteration is not going to be compatible. In addition to
> adding an initial "/" segment for absolute paths, Filesystem also
> collapses consecutive / separators. So iterating "/foo//bar//baz///"
> produces
>
> "/" │ "foo" │ "bar" │ "baz" │ ""

Fair point; I hadn't considered that one. That's unfortunate. I agree
that URL cannot collapse adjacent separators.
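
For contrast with the Filesystem iteration quoted above, the following
standalone C++ sketch splits a path on every "/" and keeps the empty
segments. split_segments and example_no_collapse are hypothetical names,
not part of Boost.URL or Boost.Filesystem, and treating a lone "/" as
having no segments is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a path into segments without collapsing
// adjacent separators, so empty segments are preserved.
std::vector<std::string> split_segments(std::string const& path) {
    std::vector<std::string> out;
    if (path.empty()) return out;
    // Skip the leading "/" of an absolute path; it is not a segment here.
    std::size_t pos = (path.front() == '/') ? 1 : 0;
    // Assumption: a lone "/" is an absolute path with no segments.
    if (pos == path.size()) return out;
    for (;;) {
        std::size_t next = path.find('/', pos);
        if (next == std::string::npos) {
            out.push_back(path.substr(pos));  // last segment, may be empty
            break;
        }
        out.push_back(path.substr(pos, next - pos));
        pos = next + 1;
    }
    return out;
}

// Every separator in the input yields a boundary, even adjacent ones.
inline void example_no_collapse() {
    std::vector<std::string> segs = split_segments("/foo//bar//baz///");
    assert(segs.size() == 8);
    assert(segs[0] == "foo" && segs[1].empty() && segs[2] == "bar");
}
```

With this splitting, "/foo//bar//baz///" yields eight segments: "foo",
"", "bar", "", "baz", "", "", "". Joining them back with "/" reproduces
the original path exactly, which would not be possible if adjacent
separators had been collapsed.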