Boost logo

Boost :

From: Seth (bugs_at_[hidden])
Date: 2022-05-18 02:42:54


My review of Boost Mysql

I'll loosely go by the guidelines
[here](https://www.boost.org/community/reviews.html#Comments)

# What is your evaluation of the potential usefulness of the library?

Highly useful.

In fact, my feeling was mostly "where was this library back in 2016". Back then
we had to postpone necessary refactoring with big scalability benefits because
I wasn't ready to do the async initiation function myself, or figure out how
the mysql native client wants to do asynchrony.

# What is your evaluation of the design?

It is largely what I'd expect.

I might consider dropping the `*read_many` interfaces which has been been
subject of discussion (mainly for returning `vector<row>`). I guess they could
also be trivially improved by taking a reference to any mutable range of `row`
objects instead (obviating the need to specify a count).

In my use, I find a "cursor-style" approach to consuming results (reusing
the row buffer) the most natural way to async consume result-sets. Hence
`read_one()` and perhaps `read_all()` would be the only interfaces I like to
see.

I'll probably have some more notes below.

# What is your evaluation of the implementation?

Pleasantly surprised.

A good example is when I zoomed in on the "generic execute" algorithm
implementation. It looks completely readable, which is quite an accomplishment
considering the domain.

Little cues picked up elsewhere (e.g. in tests) confirmed the impression.

This will positively affect both technical the maintainability and chances of
community interest to support that.

Also, I think given a clean state of the implementation, the library could
serve as a inspiration for people wishing to implement other protocols.

# What is your evaluation of the documentation?

The documentation gave me a good idea of what the library does and doesn't do.

I'm not convinced of the "user-friendliness" angle. That's fine for me. It
seems obvious that a library that implements a third-party wire-protocol from
scratch is not focusing on high-level abstraction. It is good to manage the
expectations in the documentation, though.

In particular, a number of features seem more user-hostile.

This is felt e.g. in the (lack) of user-defined conversion support and
interoperability (see below for concrete examples with e.g. `mysql::value`).

I have to mention the statically parameterized class hierarchy. To be clear, I
wouldn't want it any other way, but to end-users it does *not* seem
user-friendly: they have to template a good part of their DB layer on the
chosen stream type if they just want to e.g. optionally support SSL and UNIX
domain socket access. Most ecosystems would consider it standard to have a
connection-string representation with ditto parser that can dynamically spawn
the right implementation types.

I also feel the situation can be improved by providing the conventional nested
type names (see below)

# Did you try to use the library? With what compiler? Did you have any problems?
# How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I looked at this library last year around March, and again this week. Both times
I spent roughly 4 hours exercising the interfaces and accomplishing typical
tasks. This time around I spent also some time reading the docs.

I used my own existing databases (MySql 5.7), mainly asynchronously, focusing
on the `read_one()` 'cursor` approach, several Stream protocols and
simultaneous connections.

I tried some DDL tasks and insertions vs. auto-increment fields.

I used GCC 10 on linux with CMake. I found I had to add the include path, which
doesn't seem mentioned in the documentation:

    INCLUDE_DIRECTORIES(${CMAKE_SOURCE_DIR}/deps/mysql/include/)

The docs (mysql.intro.requirements) gave me the impression that nothing should
have been required beyond:

    TARGET_LINK_LIBRARIES(reviewtest Boost::mysql)

# Are you knowledgeable about the problem domain?

I have experience with MySql, Asio and related libraries.

As alluded to before I've stopped short of creating an Asio-based async
interface for MySql some years ago, when we were migrating a distributed server
application to Asio (making the entirity of RPC paths non-blocking coroutines
except for the database operations). We stopped short lack of time of time and
experience.

We've assessed Amy (https://github.com/liancheng/amy) around that time - which
didn't seem ready for primetime.

Instead I've invested time using mysql client lib as well as MySQL++ wrapper to
get ORM-mapping to domain types from Mysql resultsets in that application. If
anything, I respect that this library doesn't try to provide these
functionalities (do one thing and do it well).

I have experience with using a variety of other databases (mainly Oracle,
MSSQL) in different tech stacks (.NET), which also adds some perspective.

# Do you think the library should be accepted as a Boost library?

I vote for acceptance. I think the library has a good foundation, focus and
implementation quality.

There are areas for improvements. Some of these will involve breaking interface
changes, which would be best addressed before acceptance.

Notes from review process:

 1. The docs/examples need to show common usage patterns like INSERT with
    auto-increment fields. E.g. nothing the use of `last_insert_id()`.

 2. The relational operations on mysql::value have technical meaning only. While
    the docs call this out, it forms a major pitfall.

         struct MyEntity {
             unsigned id;
         } entity{42};

         try { conn.query("CREATE TABLE MyEntity AS SELECT 42 as id"); } catch (...) { }
         auto actual = conn.query("SELECT id from MyEntity").read_all().front().values().front();

         M::value expected(entity.id);
         std::cout << actual << " == " << expected << "? " << std::boolalpha
                   << (actual == expected) << "\n";

    I'd say we need to remove the relational operations whole-sale OR implement them with Mysql semantics

         SELECT CAST(42 AS unsigned) = 42; -- MySQL: TRUE

    If the comparisons are required for internal purposes, they can be provided
    with custom comparator function objects.

 3. The documentation claim "Values do not define `std::hash` yet, as
    `boost::string_view` does not support it" is out of date. `hash_value` has
    been added in 1.69.0

 4. On that note, consider `boost::core::string_view` for better interop
    ("lingua franca" string_view). If you're interested I just did an exhaustive
    side-by-side comparison of features and observable behaviour` which I can
    share.

 5. Others have mentioned multi-line queries. I'm wondering whether they add
    much seeing that sessions are already stateful:

         conn.query("set @greeting = 0x68656c6c6f20776f726c64;");

         std::cout << "retained: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n';
         std::cout << "retained: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n';

         conn.query("set @greeting = 0x62796520776f726c64;");
         std::cout << "modified: " << conn.query("SELECT @greeting;").read_all()[0].values()[0] << '\n';

    prints

         retained: hello world
         retained: hello world
         modified: bye world

    In fact it could be enough when `channel<>` does any required queuing.

    Unless queries are tiny and fragmented, there may not be a significant
    benefit. In fact, coalescing queries into a valid multi-line text form leads
    to allocation/copying overhead - and potential new safety risks.

 6. The /one/ area where multi-line can shine is when sourcing hard-coded
    scripts, which begs the question: should a
    `connection::source(path/sqlscript)` feature be added instead?
    
 7. "Finally, note that `prepared_statement`'s destructor does not perform any
    server-side deallocation of the statement. This is because closing a
    statement involves a network operation that may block your code or fail"

    This tells me that probably connections should contain a list of owned
    statement implementations that `prepared_statements` are a handle to.

    This is important when connection pooling, so prepared statements can be
    cleared before returning a connection to the pool. Otherwise this is simply
    a resource leak, and that's a problem.

    (Though some applications might prefer to give each connection in the pool
    a fixed set of statements. This doesn't suit all applications -
    heterogenous connection pools)

 8. [mysql.other_streams.connection] If Windows named pipes is a common
    transport, I'd say providing first class connect/quit support is in order
    
 9. Working with the Stream-parameterized type hierarchy is harder than necessary.

    E.g. `connection` has no inner typedef for `prepared_statement`/`resultset`
    or even the `Stream` template argument. This leaves one with work-arounds
    like:

     template <typename Conn> void my_function(Conn& conn) {
         using Stream = typename std::remove_reference<decltype(conn.next_layer())>::type;
         using Stmt = M::prepared_statement<Stream>;

    In general interface types (arguments/return values of member functions)
    should be directly expressible in code, if not through nested typenames,
    then at least through traits.

 10. I did notice that in the integration tests there is a complete set of
     type-erased wrappers for resultset, connection, prepared_statement etc.
     Perhaps it is worth considering exposing this for users who are looking
     for convenience over raw speed.

    It could of course be that the type-erased wrappers have limitations
    (beyond efficiency) I'm not aware of.

 11. A loose observation: `initializer_list<mysql::value>` is not practical due
     to explicit constructors.

     Of course, `make_values` facilitates that.

     (I'd keep the constructors explicit due to the fact that `mysql::value`
     has some surprising properties that make it bad idea to use as incidental
     type (ref strings and non-obvious relational ops, to name two))

 12. Seeing that Boost Mysql is header-only, what happens with the text
     segments (e.g. the error descriptions)? Is it worth providing alternative
     linkage options - conditional header-only compilation?

     Note: I didn't measure anything here. It was just a thought that crossed
     my mind when I compared to the stock mysql client libs. (On the plus side,
     there's no per-thread `mysql_library_init` call anymore, yay!)

 13. tools/valgrind_suppressions.txt looks promising; I don't see where it is
     used. Is it up-to-date?

 14. I'm not going to repeat some of the observations made by others, re.
     specific interfaces. I'm confident that these are going to be addressed.

     I **do** want to provide counter-point for the recurring theme that
     `mysql::value`'s string element type should be owning. I disagree. To me
     the primary purpose of the `value` type is to be a lightweight view that
     has utility efficiently marshaling to/from the wire format.

     To me it is more important to keep the 99% of use patterns efficient.

     We can instead clearly document that `mysql::value` is not a vocabulary
     type (and why) and not well suited for use outside the context of IO
     operations.

Thanks Rubén for putting your considerable skill and time into this, and for
being very responsive.

Seth Heeren

On Sun, May 15, 2022, at 6:36 PM, Ruben Perez via Boost wrote:
> Thanks to all of you who have found the time to write a review to this point.
> I hope some more will be able to share your thoughts with us.
> The process has already provided me with a lot of useful information.
>
> I've done my best to answer all your messages and questions, but
> the threads are sometimes difficult to follow. So if I've missed any
> of your messages, please let me know, as it hasn't been intentional.
>
> Many thanks,
> Ruben.
>
> On Sat, 14 May 2022 at 10:58, Richard Hodges via Boost
> <boost_at_[hidden]> wrote:
>>
>> Dear All,
>>
>> We've had a good response for calls to review this much anticipated
>> library, and there is a lot of interest on Reddit:
>> https://www.reddit.com/r/cpp/comments/unik91/acceptance_review_for_boostmysql_has_just_started/
>> .
>>
>> In general the public perception of utility libraries like this in Boost
>> seems popular, and Boost's emphasis on quality is often mentioned.
>>
>> If you are able, I would be more than grateful if you could take a few
>> hours over this delightful weekend to offer a review. The more eyes on this
>> project the better, as if it is successful, it will no doubt prompt
>> submission of similar libraries targeting other popular database and
>> messaging systems.
>>
>> The Boost formal review of the MySQL library started on May 10th, 2022 and
>> will conclude on May 19th, 2022 (inclusive) - In fact, I am able to accept
>> submissions up until the (GMT) morning of May 20th as I'll be traveling on
>> May19th.
>>
>> The library is authored by Rubén Pérez Hidalgo (@anarthal in the CppLang
>> slack).
>>
>> Documentation: https://anarthal.github.io/mysql/index.html
>> Source: https://github.com/anarthal/mysql/
>>
>> The library is built on the bedrock of Boost.Asio and provides both
>> synchronous and asynchronous client connectors for the MySQL database
>> system.
>>
>> Boost.MySQL is written from the ground up, implementing the entire protocol
>> with no external dependencies beyond the Boost library.
>> It is compatible with MariaDB.
>>
>> Connectivity options include TCP, SSL and Unix Sockets.
>>
>> For async interfaces, examples in the documentation demonstrate full
>> compatibility with all Asio completion handler styles, including:
>>
>> Callbacks:-
>> https://anarthal.github.io/mysql/mysql/examples/query_async_callbacks.html
>>
>> Futures :-
>> https://anarthal.github.io/mysql/mysql/examples/query_async_futures.html
>>
>> Boost.Coroutine :-
>> https://anarthal.github.io/mysql/mysql/examples/query_async_coroutines.html
>>
>> C++20 Coroutines :-
>> https://anarthal.github.io/mysql/mysql/examples/query_async_coroutinescpp20.html
>>
>> Rubén has also implemented the Asio protocols for deducing default
>> completion token types :-
>> https://anarthal.github.io/mysql/mysql/examples/default_completion_tokens.html
>>
>> Reviewing a database connector in depth will require setting up an instance
>> of a MySQL database. Fortunately most (all?) Linux distributions carry a
>> MySQL and/or MariaDB package. MySQL community edition is available for
>> download on all platforms here:
>> https://dev.mysql.com/downloads/
>>
>> Rubén has spent quite some time in order to bring us this library
>> candidate. The development process has no doubt been a journey of discovery
>> into Asio, its concepts and inner workings. I am sure he has become a fount
>> of knowledge along the way.
>>
>> From a personal perspective, I was very happy to be asked to manage this
>> review. I hope it will be the first of many more reviews of libraries that
>> tackle business connectivity problems without further dependencies beyond
>> Boost, arguably one of the most trusted foundation libraries available.
>>
>> Please provide in your review information you think is valuable to
>> understand your choice to ACCEPT or REJECT including Describe as a
>> Boost library. Please be explicit about your decision (ACCEPT or REJECT).
>>
>> Some other questions you might want to consider answering:
>>
>> - Will the library bring additional out-of-the-box utility to Boost?
>> - What is your evaluation of the implementation?
>> - What is your evaluation of the documentation?
>> - Will the choice of API abstraction model ease the development of
>> software that must talk to a MySQL database?
>> - Are there any immediate improvements that could be made after
>> acceptance, if acceptance should happen?
>> - Did you try to use the library? With which compiler(s)? Did you
>> have any problems?
>> - How much effort did you put into your evaluation? A glance? A quick
>> reading? In-depth study?
>> - Are you knowledgeable about the problem domain?
>>
>> More information about the Boost Formal Review Process can be found
>> at: http://www.boost.org/community/reviews.html
>>
>> The review is open to anyone who is prepared to put in the work of
>> evaluating and reviewing the library. Prior experience in
>> contributing to Boost reviews is not a requirement.
>>
>> Thank you for your efforts in the Boost community. They are very much
>> appreciated.
>>
>> Richard Hodges
>> - review manager of the proposed Boost.MySQL library
>>
>> Rubén is often available on CppLang Slack and of course by email should you
>> require any clarification not covered by the documentation, as am I.
>>
>>
>> --
>> Richard Hodges
>> hodges.r_at_[hidden]
>> office: +44 2032 898 513
>> home: +376 861 195
>> mobile: +376 380 212
>>
>> _______________________________________________
>> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk