

From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2022-08-25 01:01:21


On Wed, Aug 24, 2022 at 4:44 PM Gavin Lambert via Boost
<boost_at_[hidden]> wrote:
> > On Wed, Aug 24, 2022 at 01:37 Vinnie Falco wrote:
> >> Because this library is capable of representing ALL URLs, it is
> >> necessary for the interface to allow the caller to interact with the
> >> port as a string which is valid according to the grammar.
>
> While URLs do occasionally get used for non-Internet
> protocols, I can't think of a case where a larger port number would be used.

I can't either, but when building this library we have done our best
to refrain from subjective interpretations of the grammar, which
states unambiguously:

    port = *DIGIT

This means zero or more digits. TCP/IP had already been well
established for decades before RFC 3986 was written, so I have to think
that if the authors had wanted the port limited to what is possible
with TCP/IP, they would have said so explicitly. Port zero, for
example, is an invalid port number, yet it is allowed by this grammar.
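Read literally, the production accepts any run of ASCII digits, including the empty string, with no upper bound. A minimal C++ sketch of a check that follows the grammar to the letter (the function name `matches_port_grammar` is hypothetical and not part of Boost.URL):

```cpp
#include <string_view>

// Returns true if `s` matches the RFC 3986 production  port = *DIGIT,
// i.e. zero or more ASCII digits. No numeric range is enforced.
bool matches_port_grammar(std::string_view s)
{
    for (char c : s)
        if (c < '0' || c > '9')
            return false; // any non-digit breaks the production
    return true;          // includes the empty string
}
```

Under this reading "", "0", "00" and "99999999" are all valid ports, while "80a" is not.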

Some of the grammars in the spec are explicit when it comes to numeric
limits, for example a dec-octet is limited to 0-255:

      dec-octet = DIGIT ; 0-9
                  / %x31-39 DIGIT ; 10-99
                  / "1" 2DIGIT ; 100-199
                  / "2" %x30-34 DIGIT ; 200-249
                  / "25" %x30-35 ; 250-255
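For contrast, the dec-octet rule encodes its range directly in the grammar, so a literal matcher rejects leading zeros and values above 255 without any separate numeric check. A hedged sketch that mirrors each alternative of the ABNF above (the name `matches_dec_octet` is hypothetical):

```cpp
#include <string_view>

// Returns true if `s` matches the RFC 3986 dec-octet production exactly.
// Each branch corresponds to one alternative of the ABNF.
bool matches_dec_octet(std::string_view s)
{
    auto digit = [](char c) { return c >= '0' && c <= '9'; };
    switch (s.size())
    {
    case 1: // DIGIT ; 0-9
        return digit(s[0]);
    case 2: // %x31-39 DIGIT ; 10-99 (no leading zero)
        return s[0] >= '1' && s[0] <= '9' && digit(s[1]);
    case 3:
        if (s[0] == '1') // "1" 2DIGIT ; 100-199
            return digit(s[1]) && digit(s[2]);
        if (s[0] == '2' && s[1] >= '0' && s[1] <= '4') // "2" %x30-34 DIGIT ; 200-249
            return digit(s[2]);
        if (s[0] == '2' && s[1] == '5') // "25" %x30-35 ; 250-255
            return s[2] >= '0' && s[2] <= '5';
        return false;
    default: // empty, or more than three characters
        return false;
    }
}
```

So "255" matches while "256" and "01" do not, which is the kind of restriction the port rule conspicuously lacks.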

Had the authors intended to restrict the port, they would have written
it this way. I have been reluctant to put my own spin on interpreting
the RFC, if for no other reason than that I do not have sufficient field
experience with the countless published and unpublished schemes
currently in use. A conservative design choice follows the
specification to the letter.

> As such, I do think it's reasonable for it to fail parsing if someone
> tries to use an out-of-range port number

This seems odd: are you saying that we should reject productions
that are valid according to the grammar?
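To make the disagreement concrete, here is a sketch of the stricter rule being proposed: accept a port only if it is a non-empty run of digits whose value fits in 16 bits. Treating the empty port as a failure is an assumption of this sketch, and `parse_port_16bit` is a hypothetical name.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical stricter rule: a port must be a non-empty run of digits whose
// value is 0-65535. Leading zeros are tolerated ("00080" yields 80).
std::optional<std::uint16_t> parse_port_16bit(std::string_view s)
{
    if (s.empty())
        return std::nullopt; // assumption: an empty port is rejected
    std::uint32_t value = 0;
    for (char c : s)
    {
        if (c < '0' || c > '9')
            return std::nullopt;
        value = value * 10 + static_cast<std::uint32_t>(c - '0');
        if (value > 65535) // stop early so long inputs cannot overflow
            return std::nullopt;
    }
    return static_cast<std::uint16_t>(value);
}
```

A string such as "99999" matches `port = *DIGIT` but is rejected here; that gap between the two predicates is exactly the set of grammatically valid URLs that would fail to parse.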

> So I don't think it would cost anything to remove the string accessors
> from the public API, even if it keeps the internal string representation
> and only parses to int on-demand. And it would avoid some complications
> with .set_port(s) and invalid input.

There's a problem here. What if we get

    url_view u( "//example.com:00" );

Is this a valid URL? According to the RFC it is. We parse it, and
port_number() returns 0. But you want to take away the port string
modifiers. One of the principles of this library is that the API
allows the user to create any possible valid URL. That is, given a
valid URL string, there exists a finite sequence of calls to the
library that will produce that string. Taking away the port string
modifiers leaves the library in the questionable position that users
cannot create a URL which the parser considers valid.
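The round-trip problem can be shown with a toy model, assuming the port is stored as its raw characters and converted to a number on demand. The type `toy_authority` and its members are hypothetical illustrations, not the Boost.URL API.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Toy model (hypothetical, not Boost.URL): the port is kept as the exact
// characters that followed ':' and converted to a number on demand.
struct toy_authority
{
    std::string host;
    std::string port; // raw port text, e.g. "00"

    // Numeric view of the port. Assumes the text matches *DIGIT and fits in
    // 64 bits; an empty port yields 0.
    std::uint64_t port_number() const
    {
        std::uint64_t n = 0;
        for (char c : port)
            n = n * 10 + static_cast<std::uint64_t>(c - '0');
        return n;
    }

    // Integer-only setter: the only way to change the port if the string
    // modifiers were removed. It can never emit leading zeros.
    void set_port_number(std::uint16_t n) { port = std::to_string(n); }

    // String setter: reproduces any *DIGIT production exactly.
    void set_port(std::string_view s) { port = std::string(s); }

    std::string str() const { return "//" + host + ":" + port; }
};
```

Parsing "//example.com:00" gives port_number() == 0, but set_port_number(0) produces "//example.com:0", a different string. Only the string setter, set_port("00"), recreates the original URL.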

Regards


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk