From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2021-10-18 00:01:55


On Sun, Oct 17, 2021 at 2:52 PM Gavin Lambert via Boost
<boost_at_[hidden]> wrote:
> > assert( u.encoded_url() == "https:/.//index.htm" );
>
> I assume this was intended to be "https://./index.htm"?

Nope, it was correct as I wrote it. Your version has an authority
consisting of a single dot :)

> > abs("/././/", { ".", "", "" });
> >
> > We treat a leading "/." as not appearing in the segments, to make the
> > behavior of the library doing these syntactic adjustments transparent
> > and satisfy the rule that assignments from segments produce the same
> > result when iterated.
>
> If you're stripping leading ./ then shouldn't the result just be "/" alone?
> Same reason that "/../../foo/../bar/" should become "/bar/".

Well no, there's a difference between the value returned by
url_view::encoded_path() and what you get when you iterate the
segments. A leading "/." or "./" stays in the encoded path but is
not returned when iterating the segments, because it is considered
"metadata" about the path: it keeps the path regular without
changing its meaning.

I think you are confusing this with "normalization", which is a
different thing entirely. Given the following URL:

    https:/.//index.htm

Normalization would leave it as-is. Given:

    https:/././/index.htm

normalization would return

    https:/.//index.htm

If you start with the URL above and add an authority:

    url u = parse_uri( "https:/.//index.htm" ).value();

    u.set_encoded_authority( "example.com" );

The result is

    assert( u.encoded_url() == "https://example.com//index.htm" );

So there are three things at play here:

    1. Modifying the path for normalization
    2. Tweaking the path to match the grammar
    3. Tweaking the path to provide segments() container invariants

Number 1 above is what people are mostly familiar with, for example
collapsing double dotted segments ".." safely.
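
As a rough illustration of number 1, here is a minimal sketch of the
remove_dot_segments procedure from RFC 3986 section 5.2.4, followed by
the "/." fix-up needed when the URL has no authority. It is one
plausible way to arrive at the normalization results quoted earlier,
not Boost.URL's implementation; every function name below is
hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not Boost.URL API): true if s begins with p.
inline bool starts_with(std::string_view s, std::string_view p) {
    return s.substr(0, p.size()) == p;
}

// Drop the last segment of out together with its preceding '/'.
inline void pop_last_segment(std::string& out) {
    auto const pos = out.rfind('/');
    out.erase(pos == std::string::npos ? 0 : pos);
}

// remove_dot_segments, following the steps of RFC 3986 section 5.2.4.
inline std::string remove_dot_segments(std::string_view in) {
    std::string out;
    while (!in.empty()) {
        if (starts_with(in, "../")) {
            in.remove_prefix(3);
        } else if (starts_with(in, "./")) {
            in.remove_prefix(2);
        } else if (starts_with(in, "/./")) {
            in.remove_prefix(2);          // "/./" becomes "/"
        } else if (in == "/.") {
            in = "/";
        } else if (starts_with(in, "/../")) {
            in.remove_prefix(3);          // "/../" becomes "/"
            pop_last_segment(out);
        } else if (in == "/..") {
            in = "/";
            pop_last_segment(out);
        } else if (in == "." || in == "..") {
            in = {};
        } else {
            // Move the first segment, including any leading '/', to out.
            auto const n = in.find('/', 1);
            auto const seg = in.substr(0, n);
            out.append(seg.data(), seg.size());
            in.remove_prefix(seg.size());
        }
    }
    return out;
}

// Assumption: for a URL without an authority, a result starting with
// "//" gets "/." prepended so it is not mistaken for an authority.
inline std::string normalize_path_no_authority(std::string_view path) {
    std::string out = remove_dot_segments(path);
    if (starts_with(out, "//"))
        out.insert(0, "/.");
    return out;
}
```

With these definitions, "/././/index.htm" maps to "/.//index.htm", and
"/.//index.htm" maps to itself.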

Number 2 is understood by fewer people but is a consequence of the
grammar in the RFC. For example, if you have an authority and a path
that starts with a double slash "//", then if you remove the authority
you have to prepend "/." to the path. Another one: if you have a
scheme and a relative path whose first segment contains a colon, and
you remove the scheme, then you have to prepend "./" to the path. These
tweaks let the library guarantee that all mutation operations leave
the URL in a syntactically valid state without having to do weird
things like throw exceptions, return error codes, ignore the request,
or worse, impose additional semantic changes on the URL (for example,
turning an absolute path into a relative one in a case where the user
didn't explicitly request it).
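
The two grammar tweaks, plus the reverse adjustment when an authority
is added, can be sketched as plain string functions. This is only an
illustration under the rules stated above; the function names are made
up for this example and are not part of the library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers; names are illustrative, not Boost.URL API.

// Path to store once the authority has been removed: a path that
// starts with "//" would otherwise be parsed as a new authority.
inline std::string path_without_authority(std::string path) {
    if (path.compare(0, 2, "//") == 0)
        path.insert(0, "/.");
    return path;
}

// Path to store once an authority has been added: the "/." guard in
// front of "//" is no longer needed.
inline std::string path_with_authority(std::string path) {
    if (path.compare(0, 4, "/.//") == 0)
        path.erase(0, 2);
    return path;
}

// Path to store once the scheme has been removed: a relative path
// whose first segment contains ':' would otherwise read as a scheme.
inline std::string path_without_scheme(std::string path) {
    if (!path.empty() && path.front() != '/') {
        auto const first = path.substr(0, path.find('/'));
        if (first.find(':') != std::string::npos)
            path.insert(0, "./");
    }
    return path;
}
```

For instance, path_without_authority( "//index.htm" ) gives
"/.//index.htm", and path_without_scheme( "my:file.txt" ) gives
"./my:file.txt". An absolute path is never turned into a relative one.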

Number 3 is the most controversial and unintuitive. It falls out as a
consequence of making the segments and encoded_segments containers
behave exactly like vector<string>. For example, if you call clear()
on the container, then the container should be empty afterwards:

    url u = parse_relative_ref( "path/to/file.txt" ).value();

    u.segments().clear();

    assert( u.segments().begin() == u.segments().end() );

However, what if you have an absolute path?

    url u = parse_relative_ref( "/path/to/file.txt" ).value();

    assert( u.segments() == { "path", "to", "file.txt" } );

Okay, so far so good, but what if you clear?

    u.segments().clear();

    assert( u.encoded_url() == "/" );

Wait, that's not clear, there's still a path segment! Well of course
there is, if you clear an absolute path you should get back an
absolute path. But the segments container should be empty:

    assert( u.segments().begin() == u.segments().end() ); // has to pass

See what's happening here? Let's start with a relative path:

    url u = parse_relative_ref( "index.htm" ).value();

Now let's reassign the path:

    u.segments() = { "my:file.txt" };

Well, we can't leave the URL as "my:file.txt" because that would be
interpreted as a scheme. So the library prepends "./":

    assert( u.encoded_path() == "./my:file.txt" );

But we have an invariant: after you assign the segments, you have to
get the same thing back:

    assert( u.segments() == { "my:file.txt" } );

To enforce this invariant we have to treat some path prefixes as if
they weren't there: "/" by itself, "./", and "/.".
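
To make the prefix rule concrete, here is a minimal sketch of how
iteration could skip those prefixes. It assumes that "/." counts as a
prefix only when another "/" follows it; that assumption, and the name
segments_of, are mine and are not taken from the library.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not Boost.URL API: the segments a caller would
// see when iterating an encoded path, ignoring the "metadata" prefixes.
inline std::vector<std::string> segments_of(std::string_view path) {
    if (path.substr(0, 3) == "/./")
        path.remove_prefix(3);      // "/." guard plus the root slash
    else if (path.substr(0, 2) == "./")
        path.remove_prefix(2);      // "./" guard on a relative path
    else if (path.substr(0, 1) == "/")
        path.remove_prefix(1);      // root slash of an absolute path

    std::vector<std::string> out;
    if (path.empty())
        return out;                 // "", "/" and "./" have no segments

    // Split what remains on '/', keeping empty segments.
    for (;;) {
        auto const n = path.find('/');
        out.emplace_back(path.substr(0, n));
        if (n == std::string_view::npos)
            break;
        path.remove_prefix(n + 1);
    }
    return out;
}
```

Under this sketch "/" yields no segments, "./my:file.txt" yields
{ "my:file.txt" }, and "/././/" yields { ".", "", "" }, matching the
examples earlier in the thread.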

And that's how it's done.

Thanks

