Boost logo

Boost :

From: Seth (bugs_at_[hidden])
Date: 2022-08-22 02:57:10


Disclosure: I've followed the evolution of the library with some interest on
cppslack. Hence I may be consciously/sub-consciously primed to see some of the
choices that went into the library as "sensible".

# What is your evaluation of the potential usefulness of the library?

It's useful especially for applications dealing with network (Asio, Beast, Mysql) but obviously beyond

# What is your evaluation of the design?

I won't call it user-friendly, but seems to achieve the goals that it was set
out to achieve well, which means focusing on versatility and control,
inevitably sacrificing some ease-of-use.

Like some of the earlier reviewers, I also have some concerns over introducing
non-owning vocabulary type. I do appreciate the design choice though, because I
think it ultimately fits the Boost ecosystem best - favoring
performance/versatility over ease of use.

In contrast to others, I think the choice of Nomenclature, namely URL
everywhere _except_ in the grammar domain (the parse function names) is a very
elegant one: it avoids the historical naming swamp (by choosing the URL camp)
while still making it abundantly clear what parts of RFC are referenced.

Other libraries don't always expose accurate terms (`authority`, `URI
reference`) and it leads to confusion, e.g. python3:

    In [1]: from urllib.parse import urlparse
    In [2]: urlparse('localhost:5555')
    Out[2]: ParseResult(scheme='', netloc='', path='localhost:5555', params='', query='', fragment='')

Whereas Boost URL allows the user to specify the intended semantics

 - parse_uri("localhost:5555") gives {scheme="localhost", path="5555"}
 - parse_relative_reference("localhost:5555") gives an error (leftover)
 - the RFC specifies[1] how to fix the input to allow `:` to appear in the
   first elements of a relative-path reference:
    parse_relative_reference("./localhost:5555") gives {path="./localhost:5555"}

I do feel that `pct_encoded_view` is an unfortunate choice for a view that
actively decodes URL-encoded characters. I'd prefer `pct_decode_view` or
`pct_decoding_view`. Views names should probably describe the semantics of the
view itself, not what kind of underlying sequence is being adapted/transformed.

I found `resolve` a bit mystifying. While I appreciate the docs not trying to
duplicate the RFC specifying what the operation means, I think it will be a big
help to have inspirational examples, like example one that I came up with a
while ago:

    boost::url make_url(boost::url_view base_api, boost::url_view method) {
        assert(!method.is_path_absolute());
        assert(base_api.data()[base_api.size() - 1] == '/');

        boost::urls::error_code ec;
        boost::url url;
        resolve(base_api, method, url, ec);
        if (ec)
            throw boost::system::system_error(ec);
        return url;
    }

    // example of use:
    static boost::url_view const base_api{"wss://api.binance.com/api/v3/"};

    boost::url method{"depth"};
    method.params().append("symbol", "BTCUSDT");

    std::cout << make_url(base_api, method); // "wss://api.binance.com/api/v3/depth?symbol=BTCUSDT"

On a sidenote, `resolve` is one of those API's where I appreciate that it is a
free function (so it applies equally well to all `url_view` compatible
containers) as well taking an output argument for the result.

Like with `encoded_view::assign_to` I believe it is the most elegant way to give the
caller control over allocation and reuse.

It would be welcome to _also_ have `resolve()` member functions for discoverability.

Regarding the parser interface, I feel this should mature into its own library.
I don't see myself using it currently, and would consider it a pure
implementation w.r.t. of Boost URL. I have not reviewed this section of the
library and only skimmed the docs.

# What is your evaluation of the implementation?

Didn't look at it too much, though I found some observations I'll list below.

# What is your evaluation of the documentation?

The documentation can use a little work in the narrative department. On the
plus side, some pages already incorporate some "real life" examples. The
brevity makes it easy to overlook. Perhaps the examples would be better when
fleshed out in long form, then linked to.

I like the diagrams and some example tables. I think I might be referring to
the diagram in the future when discussing/explaining the behaviour of URLs or
Boost URL in particular.

# Did you try to use the library? With what compiler? Did you have any problems?

I have used the library earlier e.g. when writing Beast code and answering StackOverflow

# How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I've been eyeing the library for some time while it was "baking".

I spent ~4 hours gathering these review comments today, using the library
building from the boost super-project using both CMake and b2 (linux).

In some departments I guess I had the advantage of prior knowledge, making my
review more in-depth in places.

# Are you knowledgeable about the problem domain?

I'm not a notable domain expert, but I have used my share of URL classes in C#,
Java and .NET. Also, have implemented parsers for query strings in the past.

# Do you think the library should be accepted as a Boost library?

I vote to ACCEPT the library. Although there are things to be improved, I
wouldn't consider any of them conditional.

Thanks Vinnie and Alan for putting time into creating and honing this capable
library.

Seth Heeren

-----

A list of observations encountered when trying out things and reading through the code:

 - The current implementation trips a fair number of deprecation warnings in
   Boost Filesystem - I trust those are probably known and easily addressed
 - In Containers docs: "URL reference. Missing scheme and host." - should say
   "scheme and authority"
 - "Although URL query strings are often used to represent key/value pairs,
   they are not a compound element" - Is confusing to me. It seems to say
   key/values are not supported, but it goes on to say `url_view::params`
   extracts "this view of key/value pairs".
 - Defining BOOST_URL_NO_SOURCE_LOCATION does not compile the tests (possibly more?)
 - message for `error::success` returns arbitrary value:
    
    #4372 error.cpp(53) failed: 'BOOST_URL_ERR(error::success).message() == "success"' ('bad alpha' == 'success')

 - Running the tests is very noisy with console litter, making it hard to find
   actual failure info

 - "Exceptions only thrown on excessive input length" - what is excessive? Is
   that restriction useful? Since parsing doesn't allocate in the first place,
   at the time of parsing any "excessive" allocation will already have existed.
   Any limit would usefully apply on conversion to `urls::

 - Throw location on parse error appears to always be the call site if
   `.value()` is allowed to throw. While technically accurate (the _throw
   location_ vs _error location_) it does diminish the value of the error
   information if people opt to handle errors by exceptions. This might be more
   of a Boost System concern.

 - That said, source location isn't always set - probably some spots missing BOOST_URL_RETURN_EC magic?

        auto rr = parse_relative_ref("foo:bar");

        if (rr.has_error()) {
            std::cout << "Error: " << rr.error().location() << "\n"; // "(unknown source location)"
        }

[1] https://datatracker.ietf.org/doc/html/rfc3986#section-4.2:~:text=Such%20a%20segment%20must%20be%0A%20%20%20preceded%20by%20a%20dot%2Dsegment


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk