Boost logo

Boost :

From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2022-08-22 16:31:03


On Mon, Aug 22, 2022 at 9:04 AM Phil Endecott via Boost
<boost_at_[hidden]> wrote:
> auto r1 = f1(); // r1 is a string, OK.
> auto r2 = f2(); // r2 is a string, not a string&, so it's safe.
> auto& r2a = f2(); // r2a is a string&, but the user knows that because they wrote a &.
> auto r3 = f3(); // r3 is a string_view, danger!

Ah, yes I see. To me the problem here is auto. In other words, the
issue is not specific to URLs; it is general for any user that uses
auto to capture the result of a function that returns a reference-like
type over its arguments. For example, I see no problems here:

    std::string f1();
    std::string& f2();
    std::string_view f3();

    std::string r1 = f1(); // r1 is a string, OK.
    std::string r2 = f2(); // r2 is a string, not a string&, so it's safe.
    std::string& r2a = f2(); // r2a is a string&, but the user knows
that because they wrote a &.
    std::string_view r3 = f3(); // r3 is a string_view, danger!

Instead of burdening every function that has ever been written, has
been written, and will ever be written with the responsibility of
communicating its return type twice - once in the actual return type
and again in the function name, it seems to me that a more logical
solution is that the author of the code chooses whether to opt-in to
brevity (using auto) or opt-in to clarity (naming the return type).

> I feel that maybe what I'm saying is not getting through:
>
> *I am not insisting that you remove these dangerous functions. You
> are welcome to keep them for those users who feel that they need the
> benefits that they offer. I would just like you to *also* offer
> safe-to-use functions, and to give the safe functions more concise
> names than the dangerous functions

Okay so, if I am understanding you right, you are saying that this
would be better:

    auto u = parse_uri_into_string_view( s ).value();

I'm not sure that I agree. The "name" of a function includes its
return type and parameter list types. `parse_uri` is really
"result<url_view> parse_uri(string_view)`. The user is expected to
know this. If someone calls a function that returns a value, I think
it is reasonable to expect that they know the return type, and the
corresponding semantics of that type.

When a user writes "auto" they are saying that they want to opt-in to
brevity, and that they understand the consequences of what they are
doing. They know what the function return type is, and they are ok
with erasing that information from view. In fact, I would make an even
stronger statement: when a user writes auto, they WANT to erase the
return type from view.

Your proposed renaming of the function (or adding of new functions) is
the equivalent of forcing the return type to be visible again, even
for users of auto, which entirely subverts their intention. As is
currently implemented, the library gives the user the choice of
whether or not to obscure the return type. They can make it explicit:

    url_view u = parse_uri( "http://www.example.com/index.htm" ).value();

or they can make it obscured:

    auto u = parse_uri( "http://www.example.com/index.htm" ).value();

If we rename the function, the user now loses the option of obscuring
the return type:

    auto u = parse_uri_into_url_view(
"http://www.example.com/index.htm" ).value();

Now you might say that you prefer it this way, and that this is a best
practice for you, but I'm not sure that imposing this viewpoint on all
users of the library is a good choice.

> > Yes well, we considered that design and concluded that it wasn't
> > practical. The library has multiple value types for URLs, which one
> > should it return?
>
> One of the safe ones. I'd suggest one that uses a std::string to store
> the URL.

Using std::string as the storage is an implementation detail. We could
do this, and it would enable the use-case where the URL can take
ownership of the string, which could be cool in some cases. However if
we do that then we close the door to also storing any additional
variable-length metadata such as offsets or sizes of the path segments
or query params. We need more field experience and real-world
measurements before we make this change.

> > you think that the library should not let the user parse the substring
> > and return a view to the URL? And instead it should allocate memory,
> > all to protect beginners?
>
> No, I'm NOT SAYING THAT IT SHOULD NOT LET THE USER DO THAT, I'm saying
> that a user who wants to do that should have to do a bit more typing to
> make it clear to the next person who reads their code that they are
> using the unsafe view-returning function.

I wonder if clang-tidy has a setting to warn when someone uses auto on
a reference-like type...

> I'm asking for an option to select either upper case or lower case letters
> in the hex encodings, i.e. %2D or %2d.

Hah.. I knew someone would ask for that. We have the framework in
place (pct_encode_opts struct).)

> *I DON'T WANT THE POINTY ENDS FILED DOWN, I JUST WANT THEM TO HAVE
> DANGER SIGNS ON THEM AND FOR THE TOOL TO ALSO BE USEABLE IN A SAFE WAY.*

I'm not sure how to warn people who use auto variables. Perhaps this
could be something added as a compiler warning or to clang-tidy if it
isn't already there? And perhaps we could propose an attribute for the
language to inform the compiler of reference-like types. Like this
maybe?

    class [[reference-value]] url_view;

and then if you write

    extern url_view get_url_view_from( string_view );
    ....
    auto u = get_url_view_from( s );

the compiler could show the danger sign that you clamour for, if you
specify -Wreference-value.

Now that I think about it Phil, this is quite a good idea because it
solves the problem for everyone, forever, instead of having to have
each individual author solve it over and over again for every piece of
code they write. And users who want to opt-in to brevity can still do
so.

If you want to write this up as a proposal I would be in favor of it
and you could take credit.

> Dave Sankel's talk has a lot about inclusiveness in Rust. Do please view it.

Yes that was quite a good video! Thanks

Regards


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk