Boost logo

Boost :

From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2022-08-21 20:08:48


On Sun, Aug 21, 2022 at 12:24 PM Zach Laine via Boost
<boost_at_[hidden]> wrote:
> Yes.

Thank you for taking the time to review the library!

> - Is the documentation helpful and clear?
> ...

Oh I very much like feedback about documentation, especially when it
is actionable (i.e. offers specific advice instead of just "should be
better"). I agree with everything you wrote. I do struggle with
documentation, it seems it is never as good as I want it to be no
matter how much time I put into it. We will copy this into an issue
and get to work on it!

> IMO, you should move the bits after the
> bullet list to a section called Compilers and Build Considerations or
> similar, and place it much later in the docs.

Well, the Overview page is designed to tell prospective users of the
library everything they need to know to make a determination on
whether or not the library will suit their needs. And it also serves
double-duty as a promotional page. We want the library to look as good
as possible - showing that we have taken the time to support all those
compilers and build configurations seems to me to be a signal of
quality. We can probably shuffle some things around and refine the
grammar though.

> The text at the top makes the relative/absolute distinction sound
> important, but the rest of the page does not define or show examples
> of the difference between the two.

Yes... the docs need work.

> ...
> Please show this as part of the example, or at least explain it.
> ...

Agree with everything there.

> It would be nice if these were on separate pages. Even nicer would be
> some context, besides just a bunch of code. I had to Google "magnet
> link", and I still don't really know what it is. A brief explanation
> would be helpful.

Yes, we agree with everything. And we need many more small
self-contained examples of doing common things.

> The utl::params API docs refer to it as a random-access view, but it
> is neither random-access, nor a view.

This is what we in the biz call a "bug" in the documentation :) It is
a mutable BidirectionalRange which references the original url's
character buffer.

> When I call parse_uri(), I get a result of type result<url_view>? Why
> isn't the function spelled parse_url(), or the return type spelled
> result<uri_view>? This seems needlessly inconsistent, given the
> assertion in the docs that URL is fine to use in place of URI.

Yes good question, and I think this is explained in the docs. When we
refer to the general family of strings we call it "URL." Containers
are all capable of holding the generic syntax (i.e. ALL valid
productions of the grammar). thus containers are named url, url_view,
and static_url.

However, when we refer to specific grammars from rfc3986 we always use
the exact name of the grammar. Names of parsing functions follow a
consistent pattern: "parse_" followed by the grammar which they
implement:

    // URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
    parse_uri();

    // absolute-URI = scheme ":" hier-part [ "?" query ]
    parse_absolute_uri();

    // relative-ref = relative-part [ "?" query ] [ "#" fragment ]
    parse_relative_ref();

    // URI-reference = URI / relative-ref
    parse_uri_reference();

    // origin-form = absolute-path [ "?" query ]
    parse_origin_form(); // rfc7230

However, they all return the same container, which is url_view,
because url_view is capable of holding all of these things. We briefly
explored baking different grammars into the type system (e.g.
absolute_uri_view) but it becomes totally unergonomic with no
corresponding benefit.

We do have a separate type, authority_view - because an authority by
itself does not satisfy the generic syntax. And of course the function
is suitably named:

    // authority = [ userinfo "@" ] host [ ":" port ]
    result< authority_view > rv = parse_authority( s );

> The assign_to() API seems awkward to me (as in std::string str;
> u.query().assign_to(str);). This would be better as a conversion
> function with a name, like string(), to_string(), or as_string().

I've resisted an implicit conversion to std::string but of course
Ramey Rule kicks in and steers you elsewhere. We are very likely going
to add it, because the API impedance is too high otherwise. So you
will just be able to assign it to your string. And we can trivially
add member to_string() or whatever name is in fashion these days.

> Assignment is fundamental, and the compiler knows what to do with it
> (how to optimize it away, for one thing). Assignment through an
> out-param is awkward at best, and is actually more likely to lead to
> more pessimal code, since RVO cannot be applied.
> The append_to() API is really very nice. Most libraries don't support
> appending as a first-order operation, but it is essential for string
> perf in many use patterns.

Hey thanks for noticing this!

Yes it was entirely our intent to create an API that is suitable for
working with allocators, without having to specify Allocators if you
know what I mean. assign_to and append_to are for reusing the capacity
in existing strings. We have assign_to because the MutableString
concept doesn't specify a way of clearing the string, and we want to
be able to express algorithms without a precondition "the mutable
string must have a zero size." I think the MutableString concept needs
some more work though, we are open to a design discussion on how to
improve it. The idea is to support algorithms that require temporary
buffers to operate.

> However, the containers are frankly a mess. What are they, exactly,
> at a high level of abstraction? For instance, is params a
> singly-linked list? It has a forward iterator, so it kinda looks like
> one.

It is a lazy flat_map without random access but with location-based
insertion and erase. Segments is a lazy list<string>.

> This is important to users who want to use your containers in
> generic code, including the standard algorithms (which often
> have a pessimal implementation specific to forward_iteretors).

Most users should be able to use these lazy containers as-is. For
example they might simply iterate over the params, ignore the ones
that they don't care about, and take action on the ones they want.

> Also, your containers have about 10% of the API
> of any of the kinds of containers listed above, meaning generic code
> will have to special-case use of your container if it's missing some
> of the SequenceContainer or AssociativeContainer API that the generic
> code relies on.

The containers are designed to model as closely as possible standard
containers, where it makes sense to do so. Because they are lazy and
not node based, some concessions have to be made of course. The
priority in all cases is to be lazy (avoid allocation). The escape
hatch is that users who need the exact APIs offered by a standard
container, are free to copy the lazy range into their standard
container and then do everything they need to do. And then they should
be able to assign the result back, since these lazy ranges support
assignment of the entire range.

> I recommend that you implement params in terms of a
> std::vector<std::pair<string_type, string_type>> (using whatever
> string_type you're already using), and then add the API that is
> URL-specific on top of it, and have two members:

Uhh... well... that's not lazy :) and it allocates memory. If users
want this, they have to opt-in to it. I would be reluctant to design
anything that doesn't give the user control over the allocator. But if
you think about it, these lazy ranges give you that. Users can use
standard algorithms like copy with a back_insert_iterator into any
container they want, regardless of allocator.

> std::vector<std::pair<string_type, string_type>> & storage();
> std::vector<std::pair<string_type, string_type>> const & storage() const;
>
> "Remove" is not really a term used in the STL. I think your
> "remove*()" container members should be "erase*()" instead.

I'm not sure where you're seeing this? There are no "remove" functions
in these containers.

>
> The replace() API is nice. I only wish it were range-capable (that
> is, I wish there were an overload taking two iterators to erase, and
> two iterators to insert).

That is something we could do, but there was no obvious use-case for
removing ranges of query params.

> Why did you define your own find_if() and find_if_not(), instead of
> using the std ones? And why the hell does CharSet have these two as
> members? That's crazy!

They are optional members. A particular charset might know how to
implement find_if much better than a generic algorithm. The free
function grammar::find_if will call the member find_if on the CharSet
if it exists, otherwise it uses a standard algorithm. For example
grammar::lut_chars::find_if uses an SSE2 implementation when
available. And it can be further optimized but we just haven't
bothered to do it yet.

> I find the creation of a mini-parsing library and the expectation that
> users will use this mini-library to write their own parsers to be
> somewhat problematic. No one is going to use your mini-library for
> serious parsing work (except perhaps for modifications to the way URL
> does its parsing).

I agree that this is not going to be for most users but if even 1 or 2
users find utility (for example to further validate custom
scheme-specific grammars) then I would be happy with it. I am using
this extensively in downstream libraries, for "serious parsing work"
so I disagree that it is not serious. This is a mini-library for now
so downstream libraries can use it ergonomically, but once it gets
some more practical usage in more varied settings (and becomes stable
API-wise) I would rather propose it as a new library.

> It does have proper error reporting, for one thing.

Did you mean that it "doesn't" have proper error reporting? Where do
you see this lack of reporting (if that's what you meant)?

> I vote to ACCEPT. I think I've raised legitimate concerns about the
> design here, but none of them rises to the level of a blocker.

Thanks!!

Regards


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk