Boost logo

Boost :

From: Alan de Freitas (alandefreitas_at_[hidden])
Date: 2022-08-22 05:43:09


Thank you for the review.

> auto r = parse_uri( s );
>
> There's no indication here that r is a non-owning view and that
> the caller is responsible for keeping s alive for the lifetime of r.
>

Yes. This is a relevant issue here.
This tends to arise with any function that returns a view and it's not
simple to fix. Reference types are contentious and some people still seem
to be against std::string_view.
But people do need them in practice. I imagine many wouldn't like
allocating more memory dynamically in an async server just to decide what
to do with an URL string for which you already have allocated storage.

On the other hand, it sounds reasonable to have a compromise for other
potential use cases.
It's not impossible to come up with other alternatives like parse_uri_view(
s ), urls::views::parse_uri( s ), etc...
I guess what std::ranges does, in practice to deal with that, is put all
functions that return views in the "views" namespace.
The biggest problem is still deciding which of the URL types to return from
parse_uri( s ).

https://github.com/CPPAlliance/url/issues/435

> More fundamentally, I'm not convinced that this allocation-avoiding
> view-rich API style is best for a general purpose vocabulary type
> like URL.

Some defaults where the shorter-named functions return value types are OK.
Not providing the views would be a mistake though.
Most "non-beginner" use cases we know of require views.
We would just be leaving this demand unattended.

> I see that there is a method called persist():
>
> std::shared_ptr< url_view const >
> persist() const;
>
> This function returns a read-only copy of the URL, with shared
> lifetime. The returned value owns (persists) the underlying string.
> The algorithm used to create the value minimizes the number of
> individual memory allocations, making it more efficient than when
> using direct standard library functions.
>
> Hmm, so url_view doesn't own the underlying storage, except when it
> does as a result of this mysterious method. What's the rationale for
> this? (Have I missed some documentation?)
>

Yes. I agree this function might seem confusing.
The types are coherent, but we end up with something that owns the storage
but is still called url_view.

It allocates the memory for the URL and a url_view that references it. The
rationale is to have only one dynamic memory allocation.
It's sort of an advanced function and we have reviewed the documentation on
that a few times.
There's still a lot to improve.

https://github.com/CPPAlliance/url/issues/436

> I would be inclined to
> determine the offsets each time they are needed by the accessor
> methods. In the constructor, parse to validate the syntax but don't
> store any offsets.
>

Exploring ideas to make the object smaller is worth it.
Implementation details can change and there might be better/lower upper
limits for some URL components.

I don't think parsing the URL again every time the user calls an accessor
function is a reasonable solution for the general case though.
Each access to an element would have a linear cost and URLs are allowed be
long in many applications.
Even for small URLs, reparsing it 9 times if the user asks for the
components sounds unreasonable.

> I believe that in the case of the path segments and the query
> parameters, you are not storing offsets (because the number is variable,
> and that would need allocations); why not adopt that approach for the
> other offsets?

The rationale is that these views are accessed through forward or
bidirectional iterators, so the cost is not linear for each access.

> In your non-view url object, you seem to allocate storage using
> new char[]. Why not use a std::string? One benefit of a std::string
> is SBO. Another is that you could provide a move constructor from
> a string.
>

Yes. That sounds interesting.

https://github.com/CPPAlliance/url/issues/437

> I find the names of the encoding/decoding views confusing. I would
> expect a "decoded_view" to refer to underlying encoded data and
> return equivalent decoded data when it is read from, but you seen to
> call that an "encoded_view".
>

I agree. Some uses might be confusing, like in

pct_encoded_view decoded_host = u.host();

Most people who have looked at this have proposed something like
decode_view, decoded_view, pct_decoded_view, or decoded_view.

If we follow the convention used in std::ranges, then it would be
pct_decode_view,
because it's usually a verb. For instance,

https://en.cppreference.com/w/cpp/ranges/transform_view
https://en.cppreference.com/w/cpp/ranges#Range_adaptors

https://github.com/CPPAlliance/url/issues/438

> Is that correct? Again having this third member in the container's
> value_type makes generic code more difficult. I would be inclined to
> use a std::optional for the value instead.

Yes. One option is to use std::optional for the value.
This looks more coherent.

On the other hand, there are applications where an empty string is a
reasonable replacement when the value is empty.
Another is to have a class instead of a struct and so we could encapsulate
the values and return them one way or the other.

https://github.com/CPPAlliance/url/issues/439

> In the past I have had situations where it was necessary to specify

the case for hex letters in %-encoding; please consider adding this to
> pct_encode_opts.
>

Yes. This could be useful.

Uppercase should be the default since this is what URL normalization
requires.

https://github.com/CPPAlliance/url/issues/440

-- 
Alan Freitas
https://github.com/alandefreitas

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk