Boost logo

Boost :

From: Ruben Perez (rubenperez038_at_[hidden])
Date: 2023-01-23 13:42:29


Forwarding to the whole ML, I hit the wrong button.

Thanks Marcelo for noting it.

---------- Forwarded message ---------
From: Ruben Perez <rubenperez038_at_[hidden]>
Date: Mon, 23 Jan 2023 at 14:04
Subject: Re: [boost] aedis review starts today
To: Marcelo Zimbres Silva <mzimbres_at_[hidden]>

Hi Marcelo,

I'll try to answer to the several points you made in a single email.

>
> To fix this properly I would need a tool similar to what protobuff is
> to grpc: generate requests and responses stub from a schema.
>
> Aedis however is not trying to do that, it provides the infrastructure
> with which such a tool could be created. Perhaps there is no place in
> Boost for such a library, but it is still necessary.

I do think there is a place in Boost for a library like this. I don't think
we need neither a 100% type-safe solution, nor to completely hide the general
runtime interface that request::push() gives us. But I think we do have some
usability barrier here, and you need to have a plan on how to evolve it to
make it better, before the library gets into Boost and you loose the ability to
introduce big breaking changes.

I don't think the type-safe solution should span *all* Redis commands, but
only the most common ones. If the command the user wants does not have
type-safety implemented, she can always use what it's in place now. Simple
interfaces for simple uses, complex ones for advanced use cases (that's
something I learnt in MySQL the bad way).

> > As an example of this, the library's own example cpp20_containers,
> > in the hgetall function, uses the type std::map<std::string,
> > std::string> for the hgetall response, which will be okay unless the
> > key is not set, in which case you will get an error. I think this
> > has an enormous potential for causing hidden bugs (and even
> > vulnerabilities). I've also noticed that if you mismatch the sizes
> > of the pipeline request and response, you get an assertion (so UB).
>
> To avoid this you can wrap the response type in std::optional, for example
>
> std::optional<std::map<std::string, std::string>>
>
> I don't lock users in this behaviour because they might not want it.
> Perhaps it is desirable to have an error instead of having to check
> whether the optional has a value. The user must know what he needs.

Hand't think of that. Although I don't have data to speak, I get the impression
that this feature will be used unintentionally more than intentionally.
But it might just be an opinion.

> As an alternative I could lock-in users on resp3 types: maps, sets,
> etc. But then how to represent data structures that have no
> corresponding C++ types. It would also be inefficient as it would
> require first copying and then converting.

That's something I didn't take into account when I wrote the review, and that
gets solving this harder. Is there a way, given a type T, to know whether
the response to a given command (say "GET") is compatible with T? Without
performing the actual parsing?

I can think of the following solutions (excuse me if any of these have flaws
I haven't taken into account):

1) A runtime checking option. For the most common commands, you provide a set
of helper functions like request::push_set or request::push_hgetall().
request remembers which commands have been added. The deserializing
function checks for type compatibility before proceeding to the actual
deserialization. This would pick the std::optional<T> vs T mismatch. Not
sure if it's worth paying the price, though. You could provide an option
to disable this at runtime.
2) A compile-time checking option. I envision this as something like
typed_request<commands::get, commands::hgetall, ...>. You can check for
compatibility at compile-time. You will probably want something like
commands::any to provide access to commands that don't have a type-safe
interface. Note that I don't think this should replace aedis::request,
but complement it.

Do you find any of these solutions appealing? Knowing that a strategy is
in-place to evolve the library towards type-safety would alleviate many
of my concerns.

> I am trying to understand how you avoid these risks in Boost.MySql but
> it seems it suffers from the same problems?
>
> I mean this: https://github.com/boostorg/mysql/blob/ac7285c62163c0de40a466ecaeca13f7de5af0c4/example/text_queries.cpp#L108
>
> conn.query("SELECT salary FROM employee WHERE first_name =
> 'Underpaid'", result);
> double salary = result.rows().at(0).at(0).as_double();
>
> The query is not type safe [1] and the result can mismatch the the
> query. Why would it be considered ok for Boost.MySql and not
> Boost.Aedis?
>
> Disclaimer: I am not trying to make a case or open a precedent, it is
> purely technical. I might be wrong but I would like to clarify anyway.

So we have two parts here. The "client -> server" interface has two flavours:
connection::query(string_view), and statement::execute(const ParamsTuple&).
query() should only be used for queries that don't have parameters provided
by the user (I should change that example, BTW). statement::execute takes a
tuple of parameters. The client will check for compatibility in terms of number
of parameters. Types are not checked in the client because the server is
*very* flexible with type conversions.

The second part is the "server -> client" interface (mysql::resultset), which
is the one I think you're referring to here. It's a purely variant-like
interface. Still, there is code to runtime-check that what the server sent
matches the query metadata, which is provided by the server.
I view this variant-like interface similar to Boost.JSON parsing into a
json::value.

I'd say the differences I see from the Aedis case are:

* There is no way for the MySQL client to know the resulting data structure
from a query given the contents of a query. "SELECT product FROM products"
can yield 10 different data types, and without the table definition, you
can't know.
I got the impression this is not the case in Redis - "HGETALL" always
returns the same data structure (please correct me if I'm wrong).
* From a user perspective, when I use a variant-like interface (like the one in
Boost.JSON or in Python database access), my head is set to "type checking is
my responsibility". I know I'm in runtime. When I use a compile-time
interface, I get much more relaxed. I got the "I'm at compile-time safety"
with Aedis, but that impression is not right. Again, this is just based on
what I felt while using the library for the first time, other users might
feel differently. I think if I had encountered an aedis::value, similar to
json::value, I would have switched my mind to "I'm at runtime".
I'm not saying this is the solution, I'm just trying to explain myself.
* I still got criticism for this in MySQL, and I've got a safer compile-time
interface planned (https://github.com/boostorg/mysql/issues/60).
The idea is that you define a row type T, so you can then read a typed
resultset<T> (or individual rows by calling connection::read_one_row(T&)).
My idea is to fully check for type compatibility as soon as I've got the query
metadata available - that would be similar to solution 1) I proposed you
earlier in this email.
* If you're referring to the ability to compose SQL queries in a type-safe way,
as sqlpp11 does, I did determine this as out of scope. As SQL is a standard,
you usually get this at the database compatibility layer, rather than in
individual drivers. I see it as more involved than providing type-safety
to Redis commands - please apologies if this statement is not correct.

>
> This is to make it safe by default and in part because Aedis is
> centered around the adapter. In the same sense that you can write an
> adapter to receive responses in the data-structure that makes most
> sense to the application e.g. std::map, std::unordered_map,
> boost::flat_map etc. You can also customize the way errors are
> handled. For example,
>
> * Errors like type mismatch are usually unrecoverable and you might
> want to fail fast and let the async_ operation itself fail instead of
> proceeding in an unknown state.
>
> * If you can tolerate errors like that you can write an adapter that
> does not communicate an error to the parser. I did not make this the
> default because it looks unsafe, people will usually expect
> async_exec(req, adapt(resp)) to throw when the response can't be
> stored in resp.
>
> I am perhaps at fault here for not documenting this feature. I thought
> however not many people would want this behaviour.

I think you're right and that this is the most common case. I had a look at
the Python redis driver, and they do the following:

* By default, fail the operation if any of the pipeline requests failed.
* If you pass a certain config option, individual errors are reported as
error response objects. So if you sent [C1, C2, C3] and C2 failed, you
will get [C1_response, C2_error_info, C3_response].

I think your default is right. Maybe you could provide the option to report
errors individually by having the user provide
a) std::tuple<leaf::result<T1>, leaf::result<T2>>
Instead of
b) std::tuple<T1, T2>
If the user did b), then do the same you're currently doing, otherwise report
errors individually.

I would like to know your thoughts on these two matters (type-safety and error
handling), as having a plan would alleviate many of my concerns.

Regards,
Ruben.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk