Boost logo

Boost :

From: Gavin Lambert (boost_at_[hidden])
Date: 2022-08-24 23:44:03


On 24/08/2022 20:03, Andrzej Krzemienski wrote:
> śr., 24 sie 2022 o 01:37 Vinnie Falco napisał(a):
>> Because this library is capable of representing ALL URLs, it is
>> necessary for the interface to allow the caller to interact with the
>> port as a string which is valid according to the grammar.

I don't really think that's a good reason. The grammar specifically
states that the value must be numeric, and while it doesn't explicitly
place a limit on its range, it's not really reasonable to use it for
anything other than a TCP/UDP port number, which *is* explicitly limited
to 16-bit values. While URLs do occasionally get used for non-Internet
protocols, I can't think of a case where a larger port number would be used.

As such, I do think it's reasonable for it to fail parsing if someone
tries to use an out-of-range port number, at least until someone
demonstrates a (non-arbitrary) case where a larger range is needed.

Regarding 0 being potentially ambiguous, I don't think that's an issue
either, since it's not a valid TCP/UDP port number. And you do have
.has_port() if someone really wants to disambiguate that case. You
could potentially change it to optional<uint16_t> if you really want,
but I don't think that's needed or helpful.

> Is this the only reason? I would think that the primary reason is
> performance: usually you obtain a URL as a string, so the port-string is
> just sitting there. Then you may want to set it in another URL, so it will
> be a string again.

This is perhaps a better reason, and there are some other network
libraries that do take the port as a string (or as a full authority
string including hostname and colon). And if you're just disassembling,
modifying, and reassembling the URL to pass on as a string then you may
have no need of converting to integer. But then, in those cases you'd
never call .port() anyway.

So I don't think it would cost anything to remove the string accessors
from the public API, even if it keeps the internal string representation
and only parses to int on-demand. And it would avoid some complications
with .set_port(s) and invalid input.

Or perhaps make .port() and .set_port(n) use uint16_t but retain a
string getter (but not setter) .port_string() for the corner cases.

> 2. The query part doesn't have duplicate keys.

I definitely disagree with this one. Repeated keys in the query are
quite commonplace to represent array values or other multi-selects.

Having said that, it may be useful to present (both for read and write)
the params as a multi-map-like structure with unique keys and multiple
values. This can't be the primary interface (since it can only be used
when the order of the keys is not important, which is commonly the case
but not *always* the case) but it would be a more convenient one for
most usage.

Aside: at the bottom of
https://master.url.cpp.al/url/containers/query.html the examples 3 & 4
are both labelled "a key with no value", which is not what #4 shows.
There is also no example for "no key and no value", which was stated to
be a possible result in the preceding paragraphs. The example query
string suggests that perhaps that should have appeared in between 3 and 4.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk