Boost logo

Boost :

From: Ruben Perez (rubenperez038_at_[hidden])
Date: 2022-05-19 22:10:42


On Thu, 19 May 2022 at 20:47, Julien Blanc via Boost
<boost_at_[hidden]> wrote:
>
> My (small) review of Boost MySQL.

Hi Julien, thanks for taking your time to write your review.

>
> First, i want to apologize for this not to be a real review. I just
> could not find the time to make a proper one, so this is just some
> thoughts that i want to share in the hope they are useful.
>
> ========
> Are you knowledgeable about the problem domain?
> ========
>
> Not really. I've used mySQL in the far past, not from C++, and more
> recently used postgreSQL (via libpq) and sqlite3. I'm not unfamiliar
> with asio, but never used it outside very small projects.
>
> ========
> How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?
> ========
>
> A quick reading. My review is solely based on reading the documentation
> (which, by the way, is pretty good).
>
> ========
> Will the library bring additional out-of-the-box utility to Boost?
> ========
>
> That's a great YES. Definitely useful. I lacked such a library a few
> years ago (was for postgreSQL, but the issue is the same).

Glad to hear.

>
> ========
> What is your evaluation of the implementation?
> ========
>
> Can't tell, i did not look at it.
>
> ========
> What is your evaluation of the documentation?
> ========
>
> Pretty good. I stumbled several times while navigating into a « This
> Page Intentionally Left Blank » page. I have no doubt that this will be
> fixed. Just a few issues i noted :

Could you please open an issue against the repository
pointing out where you found these?

>
> * the documentation on encodings/collation could probably be improved.
> From the current state of the documentation, if i have a table with
> three columns, one in utf8, the other one in utf-16, and the last one
> in iso8859-15, i'm not sure what the returned string_view contents will
> be. Is this the raw data ? And if i'm filtering in a WHERE clause, what
> is the expected encoding of the values ? The connection encoding or the
> field encoding ? Is this different between prepared statements and
> litteral queries ? (i have the intuition that it is, from a quick look
> at MySQL doc).

The MySQL to C++ type reference briefly talks about this
(https://anarthal.github.io/mysql/mysql/types.html). The string_view
contents will be the raw bytes in all cases. In your example,
the first string_view will be utf8 encoded, the second utf16 encoded,
and the third one will be in iso8859-15. The WHERE clause is part
of the SQL statement you're issuing, so it should be encoded using
the connection's encoding (as given by connection_params::collation).

I agree the documentation on this topic is scarce, and it's a common
source of confusion for users. I've raised
https://github.com/anarthal/mysql/issues/94 to tackle this.

>
> * the same goes for date storage and time zone handling. I can live
> with the fact that the library only handles UTC dates, if that's
> correctly handled. I have had very bad experiences with postgreSQL
> trying to be smart and causing more pain than good when handling
> timezones. I don't know how bad mysql is in this regard, but I'm pretty
> sure there are some pitfalls here as well. While i understand this may
> be a bit out of scope for the library, it does provides conversion
> between mysql raw data and a c++ time_point, so it should at least make
> sure that this time_point is an UTC one. Given that some mysql
> date/time types are local or encode a time zone, i expect some issues
> to raise here.

This is explained pretty thoroughly here:
https://anarthal.github.io/mysql/mysql/types.html.

MySQL DATETIME fields represent "naive" datetime objects (similar
to Python's datetimes with tzinfo=None). It's up to the user to interpret
which time zone they are in.

MySQL TIMESTAMP fields also represent datetime objects. However,
these are timezone aware, but the timezone is not stored with the value.
When you insert a value into a TIMESTAMP column, MySQL converts
the timestamp into UTC and stores it as UTC. When you query the field,
by default, it will convert the timestamp back to your local timezone.
"Your timezone", in this context, means the timezone given by the
session-local time_zone variable
(https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_time_zone).
The time_point object you get, in this case, is a local time (and NOT UTC),
and the time zone is given by the time_zone variable. I haven't tried to be
smart and convert to UTC because the time_zone variable can change
programmatically, by issuing a `SET` SQL command, so its contents
can't be tracked reliably.

Full MySQL docs here: https://dev.mysql.com/doc/refman/8.0/en/datetime.html.

It may be worth adding an example on how to work with these types.
https://github.com/anarthal/mysql/issues/95 tracks it.

>
> ========
> Will the choice of API abstraction model ease the development of
> software that must talk to a MySQL database?
> ========
>
> I believe yes, however, i have a few concerns with the API :
>
> * the value variant type has a get_optional member (and the
> get_std_optional variant). I would prefer that it follows the std
> variant get_if pattern. That would also solve the need to handle both
> std/boost optional.

These were chosen because of conversions. So the following works:

value v (uint64_t(200));
auto opt = v.get<int64_t>(); // This will perform a conversion because
200 fits in a int64_t

I need to reconsider this because many of you have raised it.
A get_if interface means that these conversions must go away
(or be reduced to just value::get).

> * i'm not at ease with the fact that value get and get_optional handles
> the same way a NULL value and a type mismatch. I would like to be able
> to write get<optional<int>>(), and get an empty optional if value is
> NULL, and an exception if value is a float. get_if<optional<int>> would
> cause implementation issues, but i think it's just not worth it and we
> can live without (a failure in that case means a database schema and
> program mistake, so an exception is fine here). Since the user would be
> in charge of providing the optional type, that may also solve the issue
> of handling both optional types.

Thanks for offering a possible solution to this problem, as many have flagged it
already. I've updated https://github.com/anarthal/mysql/issues/85
with your proposal.

>
> * for resultset, the async_read_all/many completion handlers takes a
> vector<row>. I would prefer if it were an opaque iterable type (whose
> iteration element is a row). This would probably gives far more room
> for further optimizations without breaking user code.
> * similarly, the row type exposes the fields via a vector of value. I
> would prefer if it would itself be iterable (iteration element being a
> value). IMHO it should also be "visitable", ie something like
> row.for_each(visitor), visitor being a class with overloaded either
> operator()(int field_index, int/float/string_view/...) or
> operator()(string_view field_name, int/float/string_view/...). That
> would also allow gives more room

This will be addressed as part of https://github.com/anarthal/mysql/issues/58,
which will be tackled before inclusion into Boost. I like your idea,
and other users have proposed that I use custom ranges here, too,
which I will likely do.

I'm not convinced of the row visiting mechanism - I would say this is
mixing concerns. I would say the rows should offer a mechanism to access
the values (e.g. by index or iteration), and then the values should be
visitable.

>
> * the library define its own « days » type, and explicitly states that
> it may not be the same as c++20 days. I may have missed the reason
> (they seems equally defined), but i would like it to use c++20 days
> when compiled under a c++20 compiler.

std::chrono::days doesn't specify which integer type is used to
represent the duration, it only states that it is signed. I used plain
int, which fits the purpose, but may be different to std's. The only
possibility here would be to detect if std::chrono::days is available
and define boost::mysql::days differently depending on whether it
is or it is not. But this can then cause ODR violation issues if
a user uses the library under two different C++ standard levels
(possibly without even knowing it, because of a dependency).
That's why this is defined like this.

>
> ========
> Are there any immediate improvements that could be made after
> acceptance, if acceptance should happen?
> ========
>
> I think the API can be improved. I think some API changes would allow
> some internal improvements. However, the library in its current state
> seems already useful. The overall impression from reading other reviews
> is that the quality of the implementation is very good.
>
> My opinion is thus that the library should be ACCEPTED, but it should
> be made quite clear that the library is still in a state where some API
> changes can occur, and any form of source compatibility may be broken
> by update.

Thanks. I will try to address as many of these changes before it is
available to the general public.

>
> And great thanks to Ruben for writing this library, and taking the time
> to get it into boost.

Thank you for your time.

>
> Regards,
>
> Julien
>

Regards,
Ruben.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk