Boost logo

Boost :

From: Ruben Perez (rubenperez038_at_[hidden])
Date: 2022-05-18 23:23:33


Forwarding this to the Boost ML (you just replied to me).

On Mon, 16 May 2022 at 22:29, Alan de Freitas <alandefreitas_at_[hidden]> wrote:
>>
>> It is possible to write a Postgres connector with a similar approach.
>>
>> I indeed wrote a POC available here:
>> https://github.com/anarthal/postgres-asio
>> The protocols are different, though, so if you follow my approach
>> of providing primitives easy to use, yet close to the protocol,
>> the APIs will be a little different. The Postgres protocol allows
>> finer grain control over statement execution, AFAIR.
>
>
> Oh... That's great.
>
>> What actions do you think we can take here? Adding a section in the intro
>> (https://anarthal.github.io/mysql/mysql/intro.html) about "knowing Asio
>> is a requirement to use this library"? Do you think an overview on the protocol
>> (handshake, query, metadata and row packets) would really be useful
>> for the user at that point?
>
>
> Considering the other reviews I've seen, I think people would like something like "Here's how you can use the library (with synchronous examples, so the user knows they don't need Asio to do what other libraries do)" and only then move to something like "This is how you can use it asynchronously (and by the way you need Asio to do that)". I think the Boost.Process documentation does something like that.

I will move the examples on prepared statements forward, so all library
functionality is explained before going into the async world. I may also
refactor it to try to hide the ssl::context and io_context boilerplate.
Besides that, I've got the following example suggestions:
https://github.com/anarthal/mysql/issues/87
https://github.com/anarthal/mysql/issues/61
https://github.com/anarthal/mysql/issues/65

Any other ideas are welcome.

>
>>
>> I'm as stuck as you here. What would be the best path here?
>> For now, I've tried to contact a former user in the Boost ML who works
>> at Oracle, to see whether he can provide me some guidance here.
>> Any ideas are welcome.
>
>
> I don't know if being better safe than sorry counts as a solution, but maybe being defensive and trying to come up with new names? Besides the legal issues, which I don't understand, there's the advantage of reflecting it also supports MariaDB.

Our contact in Oracle has forwarded our query to the legal department,
and I'm waiting for the response. I will wait for that before taking any
action.

>
>>
>> At the end of the day, I'd say this is similar to HTTP (half-duplex).
>
>
> Yes. That's correct. I guess the main difference is only the use case.
>
> An HTTP server might spend its time serving multiple clients asynchronously because they're different connections.
> But this same server might also be a MySql client and these SQL requests will need to be serialized to write back to the HTTP clients.
> So the database becomes a shared resource being used by a number of asynchronous operations, even though HTTP and MySQL are individually similar.
> But this is just something to think about.

I guess this is a similar pattern to having the HTTP server calling
another HTTP API while serving a request. The user will need to
have a number of separate database connections to be able to scale.
Connection pooling (https://github.com/anarthal/mysql/issues/19) should
be important for this pattern. Using async functions for the database
enables the server to keep serving HTTP requests (or whatever
other processing it needs to do) while waiting for the MySQL server
to answer a query.

>
>>
>> Brought up by another user during review, tracked by
>> https://github.com/anarthal/mysql/issues/58. I think the best thing
>> here is just passing the vectors by lvalue reference.
>
>
> I would think something like asio::buffer as input and size_t as a result, like asio::read functions, would be better because it allows the preferred data structure to be used.

I'd say this pattern is effective for asio because it needs the byte
elements in the output buffer to be contiguous. We may be able to
use a concept like RowOutputIterator to achieve the same thing.

I'm also thinking on making these functions return a custom range
that doesn't own the rows (as proposed by Vinnie), but I need to
study it more before making a decision.

>
>>
>> Not sure if I follow you. What's the interface you propose here?
>> The connection object already holds an internal buffer for requests.
>> The library currently performs boost::asio::read call per packet,
>> but I'm working in optimizing that to use read_some instead.
>> With that approach, read_one may get the row from the buffer
>> without performing any read at all. Is that what you mean?
>
>
> Yes. If it read whatever it can to an internal buffer first, then read_some probably fixes that.

I will have read_one read from the buffer if possible, and
I'm thinking of implementing a read_some function for rows,
but as I mentioned before, I'm still designing.

>
>>
>> If we go down the compile-time parsing strategy, together with the buffering
>> I described above, making it owning may be the best option.
>
>
> Mmm... I hadn't considered that. That's correct.
> Although I do not dislike the idea of it being a "view", because it's the most common use case anyway.
> The only thing I didn't like was a "view" called "value".

I've been given points in both directions. Would naming it
sql_value_view or field_view
make things better? Updated https://github.com/anarthal/mysql/issues/85.

>
>>
>> First of all, I'd like to know your thoughts about the conversion
>> functionality in value
>> (https://anarthal.github.io/mysql/mysql/values.html#mysql.values.conversions).
>> Do you think it's helpful for the user?
>
>
> Yes. This looks good.
>
>>
>> If it wasn't because of conversions,
>> we could replace value::get_optional and value::get_std_optional for
>> something similar to variant2::get_if, returning a pointer to the
>> underlying value,
>> avoiding the problem altogether. This could make even more sense if we make
>> the value owning, as value::get_optional<std::string> would end up copying
>> the underlying string, which doesn't make a lot of sense.
>
>
> Yes. It depends on your next decisions. In any case, it would remove get_std_optional if get_optional is not removed.
>
>> Calling value::is_convertible_to, then value::get will perform the conversion
>> twice, while value::get_optional will do it just once. It isn't
>> horribly inefficient,
>> though - maybe this section is causing more confusion than aid.
>
>
> I'm not sure I'm following. So calling value::is_convertible_to actually needs to perform the conversion and then throw it away?

It does. However, the conversions should be really fast - some int comparisons
in the worst case. No memory allocations or expensive calls.

>
>>
>> 4) Make value::get return a reference instead of a value.
>
>
> You might have some problems keeping the implicit conversions here. I don't know what other libraries do.

That would only be possible if we removed conversions.

>
>> I'm thinking on calling it query_operation or query_result.
>
>
> I'm terrible at coming up with names. But query_operation or query_result look good.
>
>>
>> > - The names "tcp_ssl_prepared_statement", "tcp_ssl_<component>" are quite
>> > repetitive and verbose in the documentation. Maybe "tcp_ssl" could become a
>> > namespace? Isn't that what Asio does for the ssl related objects? Some
>> > names might be reduced if there's no alternative. For instance,
>> > "prepared_statement" might become "statement" since there's no
>> > "unprepared_statement".
>>
>> Would having type aliases under connection (i.e. connection::resultset_type)
>> alleviate your concerns?
>
>
> I think I would try to replicate whatever pattern Asio has. But it's really up to you.

I've updated https://github.com/anarthal/mysql/issues/73 to add
your suggestions.

>
>>
>> Definitely. https://github.com/anarthal/mysql/issues/74 tracks it.
>> How would variadic operator() work with async functions?
>
>
> Yes. I'm not sure it has to be operator(). An explicit name, like execute, could be better. execute might become ambiguous though.

To follow Asio's convention, I'd keep it a function like execute().
Should be possible to make it work, AFAIK.

>
>>
>> Not aware there was a standard format. Could you please point
>> me out to the relevant documentation?
>
>
> I think this is the common template: https://github.com/boostorg/boost-ci
> IRRC, some libraries, such as beast, have two paths: One path with find_package(Boost) when not in boost/libs and one path that considers it to be in boost/libs.
> The template only has the second path.

I haven't been able to locate the template - I can see a couple
CMakeLists.txt but they seem test-related. Could you please
point me to the specific file?

>
>> They are from the perspective that they require a real server to run.
>
>
> Yes. They require lots of things but testing integration it's not their purpose or something specific to them. If their purpose was to test integration (with what?), then the examples would be wasteful because you don't need them to test integration. For instance, https://github.com/boostorg/boost-ci/tree/master/test/cmake_test includes a Cmake integration test, with only what we need to test CMake integration. (This is a minor point anyway)

It could be worth changing the code to only *run* the examples
if BOOST_MYSQL_INTEGRATION_TESTS is defined (as a signal of
"hey, the user has a correctly configured server and wants to run all the tests
she can"), but always build them, regardless of BOOST_MYSQL_INTEGRATION_TESTS.

>
>>
>> > - The documentation refers to topics that haven't been explained yet. Maybe
>> > "value" could be explained after "row", and "row" could be explained after
>> > "resultset" and "resultset" after "queries".
>>
>> The basic concepts have been covered in the tutorial, which goes first.
>> So at this point the user should know the *basic* concept of what a value,
>> a row and a resultset is. The rest of the sections go deeper in these concepts.
>> Did you find section order counter-intuitive, even reading the tutorial first?
>
>
> I don't find it counter-intuitive. I think the mental model of (roughly) values < row < results < statements < DB is clear.
>
> The point here is probably that the snippets mysql/values.html represent stuff the user probably doesn't need and the following sections don't have many snippets.
> At the same time, for instance, you referred to the tutorial when you thought I didn't understand the documentation.
>
> So I have the impression that the model of "values -> row -> results -> statements -> DB" makes it difficult to provide useful examples, while "DB -> statements -> results -> row -> values" makes the documentation more pleasant to read.
> This is probably more important as the documentation grows with time. Technically, both options are fine.

I see your point. Raised https://github.com/anarthal/mysql/issues/90.

>
>>
>> No. In Asio, you always get two overloads for sync operations.
>
>
> Oh... of course. I forgot the subsection was only talking about synchronous functions when I was reading that sentence.
>
>>
>> I'd say this is personal preference. I will explore the tabular format
>> when I have the chance.
>
>
> The text format is not bad if you want to find the C++ type for a SQL type.
> If you're looking for something with a different "column" as a reference, then you have to scan a lot of text to find what you want.

Ah, do you mean adding a table that summarizes the mappings,
additionally to what is currently written? That could be useful.
Raised https://github.com/anarthal/mysql/issues/91.

>
>>
>> Could you please be a little more explicit on your second point?
>> Would adding a struct/tuple interface
>> (https://github.com/anarthal/mysql/issues/60)
>
>
> The tuple interface is something (really) nice to have but I don't think this should be a requirement for acceptance.
>
> On the other hand, the possibility of reusing memory (i.e. not depending on vector on the callbacks) seems urgent at this point.
> This is what asio already does for its functions with asio::buffer so you don't depend on a specific data structure, like vector.
> I think other people have mentioned the same issue in their reviews, and it seems problematic to bake this into the API at this point.
>
> For instance, if we replicate the asio interface, we would have something *similar* to:
>
> std::vector<boost::mysql::row> employees;
> std::size_t n = result.read_all(employees);
>
> I'm not sure this example has the *best* API possible, but Asio functions tend to have this format. For instance, consider Boost.Beast:
>
> std::size_t n = http::read(stream, buffer, res);
>
> https://www.boost.org/doc/libs/1_79_0/libs/beast/doc/html/beast/ref/boost__beast__http__read.html
>
> Same format. The buffers are input parameters and the function returns size_t. And the callback gives you error_code and size_t.
>
> Another small advantage of this format is that the callback function will match those of asio::read/asio::write, which is useful with asio::coroutine.
> https://www.boost.org/doc/libs/1_79_0/doc/html/boost_asio/reference/coroutine.html

Got it. I have to decide which API it will get, but it will allow for
memory re-use.

>
>>
>> and making it owning (https://github.com/anarthal/mysql/issues/85),
>
>
> I don't even think making it own the data is important. I understand the value of making it a view. But that depends on other reviews.
> I just care that something is done about it. Either making it owning or changing the name.

Got it too.

>
>>
>> Kind regards,
>
>
> Thanks again for the library. Great work.
>
>>
>> Ruben.
>
>
>
> --
> Alan Freitas
> https://github.com/alandefreitas

Thank you for your comments.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk