From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2022-06-04 22:29:27


On Sat, Jun 4, 2022 at 2:23 PM Andrzej Krzemienski via Boost
<boost_at_[hidden]> wrote:
> I am trying to do a review of Boost.URI library

Great! But umm.... err..., well - it's called Boost.URL ;)

> I tried to use it with the latest MinGW Distro on Windows (
> https://nuwen.net/mingw.html), which uses GCC 11.2 and Boost 1.77.

Yeah you need the latest Boost. Until the library is actually
accepted, it is written for the tip of the develop branch of the
superproject. Note that this goes for all our in-development
libraries.

> Second, I recommend that Boost.URL docs say that it requires Boost 1.78 or higher.

That's not unreasonable. We develop the documentation as if the
library is already accepted, to minimize the changes that must be made
post-acceptance. You should open an issue, as that is the best way to
motivate change:

<https://github.com/CPPAlliance/url/issues>

> > Aliases for standard types, such as string_view
> > <https://master.url.cpp.al/url/ref/boost__urls__string_view.html>, use
> > their Boost equivalents.
>
> After reading this, I expected that Boost.URL would use boost::string_view
> from Boost.Utility library:
> https://www.boost.org/doc/libs/1_79_0/libs/utility/doc/html/utility/utilities/string_view.html
>
> But instead, it uses boost::core::string_view, which is an implementation
> detail from Boost.Core library:
> https://github.com/CPPAlliance/url/blob/master/include/boost/url/string_view.hpp

Yeah, this documentation was written before we started using Core's
string_view. It will need to be updated in Boost.URL, Boost.JSON,
Boost.Beast, and Boost.HTTP.Proto. Newly opened issues are the best
way to motivate change:

<https://github.com/boostorg/beast/issues>
<https://github.com/boostorg/json/issues>
<https://github.com/CPPAlliance/url/issues>
<https://github.com/CPPAlliance/http_proto>

> Again, this is news for me that Boost has two implementations of
> string_view. Why?

Yeah, so Peter has convinced me that offering two versions of every
one of our libraries is not a great idea. By that I mean that
offering a macro that lets the user configure the library for either
std::string_view or boost::string_view is detrimental, because it
produces two distinct linkable libraries that each have their own
diverging ABIs (or is it APIs?). This unnecessary friction is a
constant source of complaints.
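
To make the friction concrete, here is a minimal sketch of the
macro-switch pattern. Everything in it is hypothetical and only for
illustration: the namespace mylib, the macro MYLIB_USE_STD_STRING_VIEW,
the stand-in type alt::string_view, and the function scheme_length are
not taken from Boost. It assumes a C++17 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for a non-std string view type (not Boost's).
namespace alt {
struct string_view {
    const char* data_;
    std::size_t size_;
    constexpr string_view(const char* s, std::size_t n) : data_(s), size_(n) {}
    constexpr std::size_t size() const { return size_; }
};
} // namespace alt

namespace mylib {

// Hypothetical configuration macro: the vocabulary type is chosen at build time.
#ifdef MYLIB_USE_STD_STRING_VIEW
using string_view = std::string_view;
#else
using string_view = alt::string_view;
#endif

// The parameter type depends on the macro, so the function's signature
// (and the symbol a non-inline version would export) differs per build.
inline std::size_t scheme_length(string_view s) { return s.size(); }

// True only when the library was configured for std::string_view.
constexpr bool uses_std_view =
    std::is_same<string_view, std::string_view>::value;

// Works in either configuration, but the two builds cannot be mixed.
inline void example_usage() {
    string_view s("http", 4);
    assert(scheme_length(s) == 4);
}

} // namespace mylib
```

A translation unit compiled with the macro defined and one compiled
without it disagree about what mylib::scheme_length takes, which is
the "two libraries" problem in miniature.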

Peter's vision is that Boost evolves so that its types are more
compatible with their std equivalents. For example,
boost::core::string_view will be more easily converted implicitly in
places where the user expects such conversions to take place. We
couldn't do this in Boost.Utility's string_view because the author is
philosophically opposed to making this change. There's some discussion
here:

<https://github.com/boostorg/utility/issues/40>
<https://github.com/boostorg/utility/pull/51>
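
As a rough illustration of what "converted implicitly" means here, the
sketch below defines a toy view type that converts to and from
std::string_view without a cast. The namespace sketch and the functions
takes_std and takes_sketch are hypothetical; this is not the
Boost.Core implementation, only the general shape of the idea, assuming
C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

namespace sketch {

// Toy view type, written only to show two-way implicit conversion.
class string_view {
    const char* p_ = "";
    std::size_t n_ = 0;

public:
    constexpr string_view() = default;
    constexpr string_view(const char* p, std::size_t n) : p_(p), n_(n) {}

    // Implicit conversion FROM the std type.
    constexpr string_view(std::string_view s) : p_(s.data()), n_(s.size()) {}

    // Implicit conversion TO the std type.
    constexpr operator std::string_view() const {
        return std::string_view(p_, n_);
    }

    constexpr std::size_t size() const { return n_; }
};

// One API written against the std type, one against the toy type.
inline std::size_t takes_std(std::string_view s) { return s.size(); }
inline std::size_t takes_sketch(string_view s) { return s.size(); }

inline void example_usage() {
    string_view a("boost", 5);
    std::string_view b = "url";
    assert(takes_std(a) == 5);    // toy view passed where std is expected
    assert(takes_sketch(b) == 3); // std view passed where toy is expected
}

} // namespace sketch
```

With conversions in both directions, user code written against
std::string_view can call into the library and back without the
library having to pick the std type at build time.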

> Next, the section on the parsers (
> https://master.url.cpp.al/url/parsing/url.html) describes the function
> parse_uri() which returns result<url_view>. What strikes me is this
> difference: URI (Identifier) in the function name, and URL (Locator) in the
> return type. I always used the terms URL and URI interchangeably.

About that. So, the library uses the term "URL" to mean any of the
provided containers, e.g. url_view, url, static_url. The term "URI"
always refers to the specific BNF syntax found in the relevant RFC.
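
The naming split can be sketched with a toy parser. This is not
Boost.URL code: the namespace toy and everything in it are
hypothetical, it returns std::optional rather than the library's
result type, and it checks only the RFC 3986 scheme rule instead of
the full URI grammar. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

namespace toy {

// "URL" names the container: a non-owning view of a validated string.
struct url_view {
    std::string_view buffer;
    std::size_t scheme_len;

    std::string_view scheme() const { return buffer.substr(0, scheme_len); }
};

// ASCII-only character classes, as the RFC grammar is defined on ASCII.
constexpr bool is_alpha(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
constexpr bool is_digit(char c) { return c >= '0' && c <= '9'; }

// "URI" names the grammar. Only this part is checked in the toy:
//   scheme ":"   where
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
inline std::optional<url_view> parse_uri(std::string_view s) noexcept {
    if (s.empty() || !is_alpha(s[0]))
        return std::nullopt;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const char c = s[i];
        if (c == ':')
            return url_view{s, i}; // scheme ends at the first ':'
        if (!(is_alpha(c) || is_digit(c) || c == '+' || c == '-' || c == '.'))
            return std::nullopt;   // character not allowed in a scheme
    }
    return std::nullopt;           // no ':' found
}

inline void example_usage() {
    auto r = parse_uri("mailto:someone@example.com");
    assert(r.has_value());
    assert(r->scheme() == "mailto");
    assert(!parse_uri("no scheme here").has_value());
}

} // namespace toy
```

The function is named after the grammar it accepts, and what comes
back is named after the container that holds the result.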

> But now that I see them used in this way in a well designed library, it looks
> disturbing. The quoted rfc3986 (
> https://datatracker.ietf.org/doc/html/rfc3986#section-1.1.3) says that an
> URL is a subset of URI.

The decision that I have made is to just ignore the RFC's guidance on
what URL means, and instead use the term as it has become popularly
known. I believe that the distinction between URL and URI is just not
recognized by the general public and in particular the wide audience
to which Boost.URL applies. No one asks you for your URI, but everyone
asks you for your URL. People put URLs into the address bar. No one
says "type this URI into the address bar." The address bar accepts
non-http schemes such as mailto and file. These are technically URIs
(see: https://en.wikipedia.org/wiki/Mailto). But no one calls them
that.

A Google search for "URL" produces fifteen times more results than a
search for "URI", although you would think that URIs would be more
common since they are a superset of URLs. Go figure :) Therefore I
have chosen to use the less technically correct but more
marketable term "URL" in the key places where it matters: the name of
the library and the name of the container.

Or to put it a different way

    url u;

Looks a hell of a lot better than

    uri u;

> The synopsis for parse_uri (
> https://master.url.cpp.al/url/ref/boost__urls__parse_uri.html) says:
>
> > Exception safety: throws nothing.
>
> And the line below it says that the function throws std::length_error when
> the input is too long. It looks like a bug in specs. Later we read:
>
> > Return value: A result containing the view to the URL, or an error code if
> > the parsing was unsuccessful.

Yep this needs an open issue :)

<https://github.com/CPPAlliance/url/issues>

> Which is not precise enough to give me the answer to the URI-vs-URL
> question. When can a parsing be non-successful? Is it only because it was
> not conformant to the grammar? The synopsis says "This function parses a
> string according to the URI grammar below", but is it a URI grammar or a
> URL grammar actually?

Actually this is covered by the docs :) see table 1.1:

<https://master.url.cpp.al/url/parsing/url.html>

> Now, there is probably a good explanation to the URI vs URL discrepancy. I
> think it would be good if it was placed in the docs, so that the users
> don't get confused.

Yes we could use a blurb which explains that the library settles on
the name URL to refer to containers:

<https://github.com/CPPAlliance/url/issues>

> While this might look like a list of complaints, I really appreciate the
> efforts the authors put in creating this library and its documentation. The
> documentation is really high quality, way higher than the average you will
> find in GitHub. And this is actually because of this high quality that I am
> able to spot and report these issues.

Hey thanks!!! Yeah there's of course going to be the usual rogues
gallery of doc mistakes, missing explanations, etc... We appreciate
your investigation of the library and the accompanying reports as they
will help us provide the last bits of polish needed to make this
great!

Regards

