Boost logo

Boost :

From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2022-08-21 14:52:44


On Sun, Aug 21, 2022 at 6:21 AM Phil Endecott via Boost
<boost_at_[hidden]> wrote:
> My review of the proposed Boost.URL library follows.

Thank you for taking the time to have a look at this library.

> In previous email discussions a few months ago I complained about
> the dangerous use of non-owning view types in the library interface,

Yes but... this is how it is. The expectation for this library (and
indeed, all my libraries past, present, and future) is that users have
a grasp of C++ and are aware of the lifetime rules for both
references, and types which have the semantics of references. That's
the price of admission for the type-erasure and zero-cost abstraction.

> result<url_view> r = parse_uri( s );

You could instead write

    result< url > = rv parse_uri( s );

and then rv.value() would be owning.

> auto r = parse_uri( s );
>
> There's no indication here that r is a non-owning view and that
> the caller is responsible for keeping s alive for the lifetime of r.

On this point we agree. But...isn't this always a problem when using
`auto`? It deprives the reader of information. When you write "auto"
you are opting in for convenience over readability. If your complaint
here is that the use of `auto` can make it easy to forget about things
then I agree with that too. Still, I'm glad auto is in the language.
However if you will note, the documentation meticulously minimizes the
usage of auto in the example code, for this reason.

The lifetime requirements are documented:

>From <https://master.url.cpp.al/url/ref/boost__urls__parse_uri.html#url.ref.boost__urls__parse_uri.description>
"Ownership of the string is not transferred; the caller is responsible
for ensuring that the lifetime of the character buffer extends until
the view is no longer being accessed."

> A safer API design would, at a minimum, give the parsing function
> a name with "view" in it. The default, shorter-named function
> should return a value type.

Yes well, we considered that design and concluded that it wasn't
practical. The library has multiple value types for URLs, which one
should it return? And a pattern of use is to store the result of
parsing in an existing variable which already has storage allocated
(to support reuse and avoid unnecessary allocate/copy/free cycles).

> More fundamentally, I'm not convinced that this allocation-avoiding
> view-rich API style is best for a general purpose vocabulary type
> like URL.

A URL is literally a string which meets the prescribed syntactical
requirements of RFC3986. If I receive an HTTP GET request like this:

    GET /index.htm HTTP/1.1\r\n

you think that the library should not let the user parse the substring
and return a view to the URL? And instead it should allocate memory,
all to protect beginners?

> URL should be a "Keep It Simple" type, accessible to
> C++ beginners, with straightforward lifetime semantics.

std::string_view is C++17, are we saying now that Boost libraries
should avoid some C++ features to appease beginners?

> Compare with std::filesystem::path, which does not try
> to optimise by actually being a view.

Not really a good example for bolstering your case, because
filesystem::path has significant problems :)

Significant enough to motivate std::filesystem::path_view:

<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1030r4.pdf>

> Hmm, so url_view doesn't own the underlying storage, except when it
> does as a result of this mysterious method. What's the rationale for
> this? (Have I missed some documentation?)

`boost::urls::url` is for mutation and unique ownership. The function
persist() is provided for efficient shared ownership of read-only
URLs. Users can't write this function themselves, because it relies on
implementation details.

> What is sizeof(url) or sizeof(url_view)? I think it must be about
> 160 bytes, since you store offsets to each of the components. I'm
> not convinced that this is the best approach.
> ...
> Making the objects smaller would probably have
> measurable performance benefits.

This was the choice that we went with, but it can be changed in the
future with sufficient experience in the field, without breaking the
API. We would need measurements to know for sure.

> In your non-view url object, you seem to allocate storage using
> new char[]. Why not use a std::string? One benefit of a std::string
> is SBO. Another is that you could provide a move constructor from
> a string.

No particular reason, this is something we could change in the future
with sufficient experience in the field, without breaking the API.
Move construction from `std::string` does sound interesting :)

I have my doubts that very many URLs are going to be fitting into the
small buffer but you never know.

> In your query parameter API, you expose something that looks like a
> standard associative container except that you call the key and value
> "key" and "value" respectively, rather than "first" and "second".
> While I think many of us would agree that those are better names, it
> does make it more difficult to use this container with generic code
> than it would be otherwise.

This is something we could change, I guess you're proposing:

    using reference_type = `pair< pct_encoded_view, optional<
pct_encoded_view > >`

Which generic code uses first and second? Do you have an example you
might point us at?

> I see that you also have a bool called "has_value" in the value_type.
> I think this is to handle this:
>
> ?a=1 // has_value == true && value == "1"
> ?a= // has_value == true && value == ""
> ?a // has_value == false
>
> Is that correct?

Yes

> Again having this third member in the container's
> value_type makes generic code more difficult. I would be inclined to
> use a std::optional for the value instead.

We are already using boost::optional elsewhere, so we could use it
here as well. But, there is a downside. If the user only cares about
values which are not empty they could skip checking has_value and just
look at key.empty(). If we switch to optional then everyone has to
check optional::has_value(). I'm undecided on which is better,
hopefully we will get more feedback.

> In the past I have had situations where it was necessary to specify
> the case for hex letters in %-encoding; please consider adding this to
> pct_encode_opts.

Well percent-encoded escapes always use hexadecimal so I'm not sure
what you're asking for here.

> The Magnet Link example needs a paragraph at the start to say what a
> Magnet Link is.

Yes good idea.

> Because of these concerns about the basic API design, I believe that
> the library should be REJECTED.

I remember I was at CppCon, I think it was 2018 and I was at some sort
of official dinner - probably the speaker's dinner. I got sat next to
this loudmouth who at one point starting going off on string_view and
how bad it was, how dangerous it was, all the problems it caused in
the jillion-line codebase at his job, and how it never should have
been added to the standard. He did make some valid points of course
but I couldn't shake the feeling that perhaps a programming language
other than C++ would suit him better. That person was Nicolai
Josuttis.

Anyway the point I'm making here is that string_view in particular,
and types that behave like references in general, are a contentious
issue for C++. Some people hate them because "beginners" and
easy-to-make-mistakes. Some people love them because "muh performance"
(and beautiful type-erasure). It seems to me that one of the
distinguishing features of C++ is in fact its zero-cost abstractions.
How cool is it that the Boost.URL library lets you not only parse a
URL in place, which is some part of a larger character buffer, but
also lets you seamlessly access all of the parts? This I believe is
very much in the spirit of C++. If we start thinking that users need
to be coddled, have the useful but sharp edges filed down, pointy ends
blunted, and so forth - I feel like we will end up with Rust. For that
you can probably just save all the work and start using Rust sooner
rather than later.

Thanks


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk