Boost logo

Boost :

From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2022-08-22 19:41:18


On Mon, Aug 22, 2022 at 11:34 AM Andrzej Krzemienski via Boost
<boost_at_[hidden]> wrote:
> I use URLs indirectly by using Boost.Beast, but Boost.Beast
> offers a higher-level interface for providing parts of URL

huh.. that's news to me :) Beast only gives you the string in the
request message:

https://github.com/boostorg/beast/blob/76043dec2cf67a2ba33b32bdcc129f5f0027b8be/include/boost/beast/http/message.hpp#L173

it is this string which users of both Beast and URL would want to parse, e.g.

    void handle_request( beast::request const& req )
    {
       urls::url_view u( req.target() );
       ...

Just out of curiosity where do you see this higher level interface for
providing URL parts that Beast provides?

> There is one thing that I do not understand though. I read that:
> 1. The library is tailored for environments that do not use exceptions
> 2. url guarantees that it maintains its invariant of always storing a
> string that represents a valid url.

Yeah, I think that we (the library authors) and the documentation are
not communicating things correctly. So I will try to make some
statements that are definitely correct, and then perhaps we can sort
out what it means for exceptionless environments:

* Anything which allocates memory will throw on heap exhaustion (class
url mainly)
  - redefine BOOST_THROW_EXCEPTION to call std::terminate or something

* URL and authority parsing free functions and grammar rules do not
throw (they use error_code)
  - they return views, which do not allocate

* url_view, url, and static_url constructors will throw on syntax error
  - simply avoid these constructors and use the free functions instead

* modifiers which take un-encoded inputs have a wide contract: all
input strings are valid
  - however the url might need to reallocate memory to encode the result

* modifiers which take encoded inputs have a precondition: the input
is a valid encoding for the part
  - you can call the non-encoded function which will do the encoding for you

* serialization of a url, url_view, or static_url throws nothing
  - because the string is always stored serialized

* static_url avoids all allocation
  - but it will throw on overflow, and the caller is expected to size the
    variable appropriately, and prevent passing large inputs

My thinking originally was that parsing into a view, storing a
serialized string, and offering the static_url container could be
meaningfully called "friendly for environments without exceptions." We
should probably reframe this :)

> So given the following code executed on a platform with exceptions
> disabled, what happens?
>
> void test(url my_url)
> {
> my_url.set_port("twenty-one");
> }

Well the invariant that the container always holds a valid URL will no
longer be true.

> if I were using urls in no-exceptions environment, I think I
> would expect an interface like:
>
> error_code ec;
> my_url.set_port("twenty-one", ec);

Yes this is something we might formalize into the API, although we
would use `result<void>` as the return type. But I'm not sure how
helpful that is. This needs some thought.

> if there is N types
> storing the same logical data in N different ways, and I have a function
> returning this logical data, I do not want to offer N functions with N
> names. I may add a template, or I may use a universal type that can cheaply
> convert to the N types. How do I do that if I do not want to put templates
> in the interface of my library?

Ah yeah that's actually quite easy. `url_view_base const&` will
capture the read-only part of any URL, while `url_base&` will capture
both the read-only and the modifiable part of any URL. No templates
needed. You inspired this change when you reported the problem with
slicing. Example code:

    // can be called with url_view, url, or static_url
    void f( boost::urls::url_view_base const& );

    // can be called with url or static_url
    void f( boost::urls::url_base& );

We should probably document this usage.

> Next, classes url_base and url_view_base are part of the library's public
> API. Why? You can keep them as bases but not advertise them in the library
> documentation. Do you expect any use cases for them?

They are advertised in the reference:

<https://master.url.cpp.al/url/ref/boost__urls__url_view_base.html>

> Next. the design in url classes is that it is a mixture of the generic URL
> as defined in the RFC, and the HTTP-speciffic stuff, such as paths and
> parameters. It is never the case that I need both of this worlds at the
> same time: I either need a generic URL that has a password but doesn't have
> path or params, or I need a HTTP-like URL, where I do not need a password.
> So if we think about strong types with strong invariants, I never get as
> strong a type as I would need. For instance, I would never want to pass the
> password in URL when using Boost.Beast, so having member set_password would
> only be a potential for bugs.

Password can in fact appear in HTTP. This is a valid CONNECT request:

    CONNECT user:pass_at_[hidden] HTTP/1.1\r\n\r\n

Since that request-target is not a valid URL, you would need to call
`parse_authority` instead. The return value from that function is
result<authority_view>.

> Am I suggesting anything? The only use case that I have for this library is
> HTTP-like URLS. I would appreciate a http::url class that is tailored for
> http or even REST protocols.

Aside from offering the password, which could be restricted with
business logic instead of a new type - I feel that
boost::urls::url_view and boost::urls::url are pretty well suited for
HTTP and REST. The hierarchical and structured interpretations of path
and query and provided as separate containers with reference
semantics, so they do not burden users of opaque schemes. There's a
balance that is struck here - users are offered containers which works
for all URLs, and while some of the API may not be of use to your
particular use-case, there is some value having a vocabulary type
which can be trafficked universally regardless of its contents.

> Regarding the docs, while I agree with others that some examples of use
> cases, such as percent encoding, would be desired, I also note that the
> documentation stands up to the high standards of of Boost libraries:

Well thanks for the kind words! But... well.. we still have work to do
there. Thank you though!

> I have two recommendations, though. One, the picture that associates IQs
> with facial expressions: I strongly recommend that it is removed. It does
> not bring any technical value, but it seems to be sending a hostile message.

LMAO... I only put that in for the review, for some levity. It would
of course not be in the final library.

> Two, the colouring of different parts of URL syntax, as in
> https://master.url.cpp.al/url/containers.html#url.containers.generic_syntax:
> it help to quickly grasp things, but it also bring some confusion. For
> instance, I cannot tell if the double-slash is part of authority, or if the
> colon is part of the scheme. The colors suggest that they are, but the
> member functions tell otherwise.

Well it is using the ABNF syntax. We could state that explicitly, and
point out that the colon, double slash, question mark, and pound sign
are delimiters which separate the parts and are not considered to
belong to their respective element.

> I didn't have time to play with path and param views. They seem fine on the
> surface, but I have a feeling that they would be clumsy to use in practice.
> I observed that params allow duplicate keys? Is it legal?

Yeah, duplicates are allowed. I think the path views are pretty OK.
I'm not sure about params. I mean, we need the container and it should
be iterator based, and it has to provide the key and value. But I
don't know if "bool has_value" is best, or if we should use an
optional, or if we should use `pair<string_view,
optional<string_view>>` as was suggested. It would be nice if people
with actual need for processing query params could kick the tires and
weigh in on this.

> Finally, I want to thank the authors for writing and sharing this library.
> It is clear that a lot of effort and thought has been invested in it.

Thank you for taking the time to write the review!

Regards


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk