Boost logo

Boost :

From: Ruben Perez (rubenperez038_at_[hidden])
Date: 2022-05-19 21:10:06


On Thu, 19 May 2022 at 00:10, Christian Mazakas via Boost
<boost_at_[hidden]> wrote:
>
> Hello Boost Mailing List, this is my review of Boost.MySQL,

Hi Christian, I'm happy to see that you found time to write your review.
Thanks also for supporting the idea of proposing this library to Boost,
it wouldn't be happening without it.

>
> ------------
> Introduction
> ------------
>
> Full disclosure, Rubén initially approached me and asked if a MySQL Asio
> implementation would be wortwhile and I emphatically said, "Yes!" so in some
> sense, it's impossible for me to be unbiased here.
>
> I strongly vote in favor of accepting this library into Boost, for the
> following
> set of reasons:
> * it actually works
> * it's incredibly idiomatic Asio code
> * it enables developers to meet business needs
> * everything else can be built on top of it
>
> ----------
> Background
> ----------
>
> I was one of the earlier users of this library, using it going back probably
> about a year or so ago. I used it at my last job to store records in a
> local DB.
> There's probably still a tractor somewhere in a tomato field in California
> that's running a version of Boost.MySQL on it.
>
> Originally, our goal was to have replication working between our deployed
> machines and a remote server that stored everything "for real". But
> unfortunately, that requires a bit of DBA voodoo that I'm not capable of so
> I
> settled for implementing my own replication logic for the tables where we
> had
> to preserve the data.
>
> It should also be known that I guess I fall into the "advanced" category of
> Asio
> user though that is somewhat contentious. I can write an iniating function
> okay
> and preserve executor/allocator correctness.
>
> I've also written my fair share of SQL queries as well, working as a webdev
> for
> 5+ years so I'm no stranger to MySQL.
>
> ---------------------------
> Discussed Issues & Concerns
> ---------------------------
>
> I've seen some recurring themes during review. Namely, no connection pool,
> no
> direct parse-to-struct, no fail-over/healthcheck support.
>
> As far as a connection pool is concerned, this should, in my opinion, be
> considered out-of-scope for an initial acceptance candidate, especially
> considering the work that was already put into the candidate we have today.
>
> It is nice to have one but it's also an open-ended design space and if one
> was
> included in the implementation, we'd all be bikeshedding it to death.
> Instead, I
> think it's better to let users write what works for them and then gradually
> we
> see what a community-driven approach looks like and add that to Boost
> officially
> at a later time.

Some user input on use cases here would definitely help.
https://github.com/anarthal/mysql/issues/19 tracks the issue, though.

>
> It's also worth noting that if you're running against a localhost DB,
> creating a
> unix socket connection on something like `/var/run/mysqld/mysqld.sock` is
> essentially free and you won't even need to use TLS. This helps ameliorate
> some
> of the need for a connection pool.
>
> Now, in terms of a health-check, this is something the library could add to
> its
> `connection` class and have it function identically to how Beast has its
> websocket streams with their automatic ping/pong semantics.
>
> I also saw in a previous email the idea of `void connection::ping()`. I
> think
> this should be added before acceptance. If we don't add automatic
> health-checks
> for a connection, we should at least give users a proper tool for solving
> that
> problem themselves.

https://github.com/anarthal/mysql/issues/62 tracks this. Just as a note,
I have been playing with the official MySQL C API and with the mysql
executable and I've grepped its source code and they don't seem
to automatically use the ping command. They expose the protocol function
as I proposed, too. They add an option to automatically reconnect the
client when calling ping() (and any other function). That could be
interesting but again, I don't think it's a common Asio pattern.

>
> I will say, however, that it's actually common in Asio to handle
> reconnection
> and health-checking yourself. For example, I've delivered Asio-driven
> applications that communicated with serial ports and CANbus (using
> SocketCAN).
> These connection types had reconnection and timeout logic that I manually
> implemented. These things seem daunting at first but become second-nature
> once
> you're accustomed to Asio. Even moreso now that Asio's awaitable types
> implement
> `operator||`.
>
> As for fail-over, I can't comment too directly as it's something I've never
> had
> to deal with in my career. On first thought, I don't know if it'd be
> exactly the
> most difficult thing to get a connection to realize it's disconnected and
> then
> reconnect to another database from a known whitelist. I'm also under the
> impression that services like AWS' RDS will simply give you a static
> hostname
> or IP address to connect against which acts as a proxy and then the
> fail-over
> happens in the background. Maybe I misunderstand the intent of what
> "fail-over"
> is so I could simply be uninformed with an opinion here.

>From what I've read about this topic (I'm far from an expert here,
unfortunately), MySQL/MariaDB implement the following replication
options:

* Basic replication (MySQL and MariaDB). This is a source/replica
architecture, where a single source node updates multiple replica nodes.
This is the oldest form of replication, failover must be explicitly invoked
via a CHANGE REPLICATION SOURCE TO command, and implies
that clients must know each specific node and failover manually.
Docs: https://dev.mysql.com/doc/refman/8.0/en/replication.html
* Group replication (MySQL). This features a group of MySQL servers
and a membership service that manages the group automatically.
There is usually a single primary server (handling reads and writes)
and several replica servers, handling only reads (although other configurations
are possible). The membership service gives you automatic failover
(i.e. when a node fails, it's marked as failed and removed from the group.
If it was a primary, a new primary is automatically elected). However,
clients need to know the available nodes and switch manually to a new node
if the one they were connected to becomes unavailable.
Docs: https://dev.mysql.com/doc/refman/8.0/en/group-replication.html
* InnoDB cluster (MySQL). This is built on top of group replication,
and provides a higher level abstraction - it's easier to manage.
As a client, you still need to know which host to connect to and
fail over manually.
Docs: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-innodb-cluster.html
* InnoDB ReplicaSet (MySQL). This is similar to InnoDB cluster, but built
on top of basic replication. In general, documentation suggests that we
should prefer InnoDB cluster over InnoDB ReplicaSet.
Docs: https://dev.mysql.com/doc/refman/8.0/en/mysql-innodb-replicaset-introduction.html
* MySQL router (MySQL). This is essentially a load balancer that can sit
before InnoDB clusters or InnoDB ReplicaSets, to which clients can connect to.
The router exposes the same protocol interface as a regular MySQL server, and
forwards packets to the actual servers in the cluster. There are
usually two open
ports in the router: one for read/write connections (which should be forwarded
to primaries) and other for read-only connections (which should be forwarded to
secondaries). This can be achieved by this library without problems.
When a server fails over, the router closes the connection to the client.
The client should then attempt to re-open the client to the router, as if a
network disconnect took place. The router will then forward traffic to a
different MySQL server.
To sum up, users using MySQL Router need only implement reconnection
logic, and will get failover handling for free.
Docs on failover:
https://dev.mysql.com/doc/mysql-router/8.0/en/mysql-router-general-using-developing.html
Docs on router: https://dev.mysql.com/doc/mysql-router/8.0/en/
* NDB cluster (MySQL). This uses a different storage engine (NDB vs.
traditional InnoDB), and can be accessed either as a regular MySQL server
(i.e. using this or any other library) or a dedicated API (which should be
more performant). It doesn't seem possible to put a MySQL Router instance
before one of these clusters, so manual failover would be required.
* Galera cluster (MySQL and MariaDB). This is third-party software,
but seems to have been adopted as the default by MariaDB. Similar
to InnoDB cluster.
Docs in MariaDB: https://mariadb.com/kb/en/about-galera-replication/
Site: https://galeracluster.com/
* MariaDB MaxScale. This is similar to MySQL router, but for MariaDB.
I don't know what MaxScale clients see when the server they are connected
fails over, but it is likely that it works similarly as in MySQL Router.

If you're using the official C API, you've got two options:
* mysql_real_connect(), which is given a hostname and a port. This will
perform regular hostname resolution and then connect to each returned host
(same as boost::asio::connect() free function).
* mysql_real_connect_dns_srv(), which is given the name of a DNS SRV
record. It will retrieve a set of hostnames from that DNS query and then
call mysql_real_connect() for each of these hosts. This could be a nice-to-have
for replication, although I don't know if Asio supports DNS SRV queries.
This is a pretty recent addition to the API.

Disclaimer: I haven't tried these scenarios, only read the official docs.
A deeper investigation in this topic, as well as some documentation, would
be nice. I've raised https://github.com/anarthal/mysql/issues/93 to
address this.
I'd say an example on reconnection would be nice, and raised
https://github.com/anarthal/mysql/issues/92 to address it.

>
> For parsing directly into a user-defined struct, this is always something
> nice
> to have but it shouldn't be a requirement as C++ offers no standardized
> reflection interface so it's unreasonable to expect this library to
> accomodate
> that. This was similarly debated at length during the JSON review and we've
> learned that a library can still be good while not parsing directly to a
> structure.

My idea here would be adding support for std::tuple and structs using
Boost.Describe, as explained in https://github.com/anarthal/mysql/issues/60.

>
> -----------------
> Personal Opinions
> -----------------
>
> There's a school of thought that only the lowest-level components should be
> standardized on a first-pass. Higher-level components can always be built
> on top
> of solid primitives. I think Boost.MySQL creates this set of primitives and
> I
> think it succeeds.
>
> From a connection, you derive prepared statements or queries. From those,
> you
> derive a resultset which is an interface for reading what the DB writes
> back.
> Rows derive from the resultset and then values derive from the rows. The
> surface
> area of the library is delightfully small and well-contained.

I've had some discussions on the possibility of making prepared_statement
and resultset not be I/O objects, and moving their I/O functions to the
connection object. Doing this, you'd have simpler prepared_statement
and resultset types, and a type-erased connection object would be
simpler to implement. The interface would roughly go from the current:

prepared_statement::execute(const params<ValueIterator>&)
prepared_statement::close()
resultset::read_one(row&)

To these ones:

connection::execute_statement(const prepared_statement&, const
params<ValueIterator>&)
connection::close_statement(const prepared_statement&)
connection::read_row(resultset&, row&)

Do you think this interface would still be intuitive for the user?

>
> I think this library is "simple", for better or worse. I empathize with the
> side
> that expresses woes about this simplicity. I think perhaps I'm hardened by
> my
> own experiences with Asio so the pains of this library are nothing new to
> me. To
> this end, the simplicity of the library is a welcome breath of fresh air
> from my
> perspective. The for example, is actually small which is incredibly rare
> these
> days.

Thanks. That was one of my design goals.

>
> There is one other issue I'd still like to discuss and it's the infamous
> usage
> of `std::vector` in interfaces. This is not a problem as a convenience
> interface
> as 98% of users are going to use `std::vector` anyway. But it does inhibit
> Allocators which is unsuitable for acceptance as a Boost library. Everywhere
> `std::vector` appears, there should be an accompanying version that works
> with a
> `Vector&` out-param, i.e.
>
> my_company::vector<mysql::row> v;
> co_await resultset.async_read_all(v);
>
> I don't think requiring `vector`-like operations on the type is problematic
> but
> preventing control over the allocation is.

This will definitely be addressed before the library gets into Boost.
https://github.com/anarthal/mysql/issues/58 tracks it.

>
> `error_info&` as an out-param is slightly awkward but I do appreciate that
> it at
> least enables us to read the messages about why our queries are wrong. This
> was
> my only real complaint about the ergonomics of the library. I would've
> expected
> to somehow pull this information from the `error_code` so using an
> `error_info&`
> out-param is clunky here.

That's a good shout. I don't like it either, but haven't found a better way to
transmit a server-provided error message back to the user. Other than storing
it in the connection and then calling something like GetLastError(),
but it feels
too "plain-old-C", and may be problematic for stuff like pipeline mode.
Supporting an extra overload for every async function is also a pain.
Any ideas are welcome.

>
> I also don't particularly mind that the backing data of a `value` is tied
> to the
> lifetime of a `row` but I can see why this is confusing so perhaps a change
> in
> naming is required.

https://github.com/anarthal/mysql/issues/85 tracks it,
proposed names are field (for owning values) and
field_view (for non-owning ones).

>
> ---------
> Summation
> ---------
>
> In summary, the library is great, it's intuitive, it's idiomatic and I
> think it
> belongs in Boost aside other great Asio-based libraries like Beast. At some
> point, Boost will have its own LAMP/MEAN stack which is the dream.
>
> I also realize, I never critiqued the implementation and I think that's
> because
> parsing/serializing perf isn't a priority for me and simply working
> correctly
> and being clean under valgrind are all that matter so long as the library
> isn't
> horrendously slow.
>
> If anyone has different opinions on my take, I'd love to hear them.
>
> - Christian

Thank you.

Regards,
Ruben.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk