
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2021-10-19 13:12:12


Peter Dimov wrote:
>> On Sun, Oct 17, 2021 at 8:56 PM Gavin Lambert via Boost
>> > It's worthwhile considering these things from the start, as they can
>> > inform design of your baseline (such as compatibility of path segment
>> > iteration).
>
> Segment iteration is not going to be compatible. In addition to adding
> an initial "/" segment for absolute paths, Filesystem also collapses
> consecutive / separators. So iterating "/foo//bar//baz///" produces
>
> "/", "foo", "bar", "baz", ""
>
> (https://godbolt.org/z/EsjKzc5f1)
>
> A design goal of URL seems to be that the information that the accessors
> give accurately reflects the contents of the string (and that there's no
> hidden metadata that the string doesn't reflect.)
>
> So the segments of the above path are
>
> { "foo", "", "bar", "", "baz", "", "", "" }
>
> because otherwise the segments of the above and "/foo/bar/baz/" will
> be the same, which means that it won't be possible to reconstruct the string
> from the information the URL accessors give.

Right. But why has it chosen that goal, rather than the alternatives?
What's the rationale?
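
For concreteness, the lossless view described in the quote above can be
sketched roughly as below. This is only an illustration under assumptions:
exact_segments and join_absolute are hypothetical helper names, not the
Boost.URL API, and the sketch assumes the path's absoluteness is tracked
separately from its segments, as the quoted text suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not the Boost.URL API): split a path on every '/',
// keeping empty segments so that no information from the string is lost.
std::vector<std::string> exact_segments(std::string_view path)
{
    // Assumption: absoluteness is reported elsewhere, so one leading '/'
    // is dropped rather than turned into a segment.
    if (!path.empty() && path.front() == '/')
        path.remove_prefix(1);

    std::vector<std::string> out;
    if (path.empty())
        return out;

    std::size_t start = 0;
    for (;;)
    {
        std::size_t slash = path.find('/', start);
        if (slash == std::string_view::npos)
        {
            // Last segment; it is empty when the path ends in '/'.
            out.emplace_back(path.substr(start));
            break;
        }
        out.emplace_back(path.substr(start, slash - start));
        start = slash + 1;
    }
    return out;
}

// Hypothetical helper: rebuild an absolute path from its segments.
std::string join_absolute(const std::vector<std::string>& segs)
{
    std::string s;
    for (const auto& seg : segs)
    {
        s += '/';
        s += seg;
    }
    return s;
}
```

With this sketch, "/foo//bar//baz///" yields eight segments and
"/foo/bar/baz/" yields four, and joining either list gives back the
original string, which is the round-trip property the quoted argument
relies on.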

It seems to me that a URL with redundant /s (e.g. http://foo.com/path/////to/file)
is either (a) malicious or erroneous input, or (b) equivalent to the
versions without the redundant /s. So a user might want to (a) get an
exception or error, or (b) ignore the redundant segments. Under what
circumstances would a user want to see the empty segments between those
/s?

Here's an alternative:

- Skip over duplicate adjacent '/' when iterating segments.

- Return "/" as the first segment for absolute paths.

- Return "" as the last segment for paths with a trailing "/".

- Give p.push_back(s) a precondition that s must not be empty if
p.back() is empty.

I think this gives pretty sane behaviour. It preserves the invariant that
push_back()ing a series of segments and then iterating over the path
returns the same strings.
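
A minimal sketch of the iteration rules in the list above, to make the
proposal concrete. The name collapsed_segments is hypothetical and not part
of any library; the handling of a bare "/" (yielding only "/", with no
trailing empty segment) is my assumption, since the list does not cover
that case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: iterate a path using the alternative rules
// proposed above (collapse separators, "/" first, "" last).
std::vector<std::string> collapsed_segments(std::string_view path)
{
    std::vector<std::string> out;

    // An absolute path yields "/" as its first segment.
    if (!path.empty() && path.front() == '/')
        out.emplace_back("/");

    bool saw_name = false;
    std::size_t i = 0;
    while (i < path.size())
    {
        // Skip a whole run of adjacent '/' separators.
        while (i < path.size() && path[i] == '/')
            ++i;

        // Collect the next non-empty name, if any.
        std::size_t start = i;
        while (i < path.size() && path[i] != '/')
            ++i;
        if (i > start)
        {
            out.emplace_back(path.substr(start, i - start));
            saw_name = true;
        }
    }

    // A trailing '/' yields a final empty segment.
    // Assumption: a bare "/" (no names at all) yields only "/".
    if (saw_name && path.back() == '/')
        out.emplace_back("");

    return out;
}
```

Under these rules "/foo//bar//baz///" and "/foo/bar/baz/" both iterate as
"/", "foo", "bar", "baz", "", so the two spellings become
indistinguishable through the segment view.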

Vinnie Falco wrote:
> note that the "absoluteness" of the path is a property of the
> URL which is reflected in the url API and not the segments:

You're saying this because that's what the BNF says. Your URL API
doesn't have to exactly mirror the BNF. If it would make sense for
"absoluteness" to be a property of the path rather than the URL from
the point of view of the library user, you can do that.

Two other things to consider:

- What is your operator== going to do about redundant /s?
- How does this all work with data: and mailto: URLs?

Regards, Phil.

