Boost :
From: John Maddock (jz.maddock_at_[hidden])
Date: 2022-05-13 12:02:17
> I believe that modules are so important that you should drop everything and aim to release a modular version of Boost.
Never going to happen - I asked about modules around here recently, and
there was very little interest.
As it happens, I did experiment further and made Boost.Regex
available as a module: https://github.com/boostorg/regex/pull/174. It
compiles with MSVC and nothing else; see for example:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105320.
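For anyone curious, the usual shape of such a wrapper is to put the
classic headers in the global module fragment and re-export selected
names. A minimal illustrative sketch (not the actual contents of that
PR; the module name is hypothetical):

    // boost_regex.ixx - illustrative sketch only
    module;                      // global module fragment: classic includes go here
    #include <boost/regex.hpp>

    export module boost.regex;   // hypothetical module name

    export namespace boost {
        using boost::basic_regex;    // re-export selected names
        using boost::regex_match;
        using boost::regex_search;
    }
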
I also tried with part of Boost.Math
(https://github.com/boostorg/math/pull/783) and again ran into
intractable GCC and Clang issues.
I'm afraid my conclusion is that modules are simply not yet ready for
prime time.
Best, John.
> I believe this because I used Boost's serialization to write the General Theory of Databases in C++ (I already had that theory in C# and Java). While that succeeded, I upgraded I++ to use C++ Modules. I couldn't get Boost's header-based system working with the modular form of databases. So I had to drop Boost altogether.
>
> I believe that if Boost is to remain relevant it must be expressed in Modular Form immediately. You should target a release for a week or two in the future. I sent Boost a starting module of about 15,000 lines of code. It compiles OK but needs work to link. You can pack the entire Boost library into a single .ixx file. You should target Microsoft's compiler first, then expand support as other compilers implement modules.
>
> That is, you should aim to make a quantum leap to ISO C++ 20 standard immediately.
>
> Cheers,
> Benedict Bede McNamara,
> 1st Class Honours, Pure Mathematics.
>
> From: boost-request_at_[hidden]
> Sent: Friday, 13 May 2022 12:24 PM
> To: boost_at_[hidden]
> Subject: Boost Digest, Vol 6704, Issue 1
>
> Send Boost mailing list submissions to
> boost_at_[hidden]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.boost.org/mailman/listinfo.cgi/boost
> or, via email, send a message with subject or body 'help' to
> boost-request_at_[hidden]
>
> You can reach the person managing the list at
> boost-owner_at_[hidden]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Boost digest..."
>
>
> The boost archives may be found at: http://lists.boost.org/Archives/boost/
>
>
> Today's Topics:
>
> 1. Re: MySql review (Ruben Perez)
> 2. Re: Review of Boost.MySql (Ruben Perez)
> 3. Re: Future of C++ and Boost (Andrey Semashev)
> 4. Re: Future of C++ and Boost (Gavin Lambert)
> 5. Re: Boost MySQL Review Starts Today (Alan de Freitas)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 12 May 2022 20:50:22 +0200
> From: Ruben Perez <rubenperez038_at_[hidden]>
> To: boost_at_[hidden]
> Cc: Phil Endecott <spam_from_boost_dev_at_[hidden]>
> Subject: Re: [boost] MySql review
> Message-ID:
> <CACR-md+=90n+Eh1HNCnDQNHcBtfKgYrohg3O-HryL6Wyts8_pA_at_[hidden]>
> Content-Type: text/plain; charset="UTF-8"
>
> On Wed, 11 May 2022 at 20:23, Phil Endecott via Boost
> <boost_at_[hidden]> wrote:
>> Here is my review of Ruben Perez's proposed MySql library.
> Hi Phil, thank you for taking the time to write a review.
>
>> Background
>> ----------
>>
>> I have previously implemented C++ wrappers for PostgreSQL and
>> SQLite, so I have some experience of what an SQL API can look
>> like. I know little about ASIO.
>>
>> I have also recently used the AWS SDKs for C++ and Javascript to
>> talk to DynamoDB; this has async functionality, which is interesting
>> to compare.
>>
>> I confess some minor disappointment that MySql, rather than
>> PostgreSQL or SQLite, is the subject of this first Boost database
>> library review, since those others have liberal licences that
>> are closer to Boost's own licence than MySql (and MariaDB). But
>> I don't think that should be a factor in the review.
>>
>>
>> Trying the library
>> ------------------
>>
>> I have tried using the library with
>>
>> - g++ 10.2.1, Arm64, Debian Linux
>> - ASIO from Boost 1.74 (Debian packages)
>> - Amazon Aurora MySql-compatible edition
>>
>> I've written a handful of simple test programs. Everything works
>> as expected. Compilation times are a bit slow but not terrible.
>>
>>
>>
>> The remainder of this review approximately follows the structure
>> of the library documentation.
>>
>>
>> Introduction
>> ------------
>>
>> I note that "Ease of use" is claimed as the first design goal,
>> which is good.
> I think I failed to make the scope of the library clear enough in this
> aspect. The library is supposed to be pretty low level and close to the
> protocol, and not an ORM. I list ease of use here in the sense that
>
> * I have tried to abstract as much of the oddities of the protocol
> as possible (e.g. text and binary encodings).
> * The library takes care of SSL as part of the handshake, vs
> having the user have to take care of it.
> * The library provides helper connect() and close() functions
> to make things easier.
> * The object model is as semantic as I have been able to achieve,
> vs. having a connection object and standalone functions.
> * The value class offers stuff like conversions to make some use-cases simpler.
>
> I guess I listed that point in comparison
> to Beast or Asio, which are even lower level. Apologies if it caused
> confusion.
>
>> I feel that some mention should be made of the existing C / C++
>> APIs and their deficiencies. You should also indicate whether or
>> not the network protocol you are using to communicate with the
>> server is a "public" interface with some sort of stability
>> guarantee. (I guess maybe it is, if it is common to MySql and
>> MariaDB.)
> Updated https://github.com/anarthal/mysql/issues/50
> on comparison with other APIs.
>
> The network protocol is public and documented
> (although the documentation is pretty poor). It's indeed
> a pretty old protocol that is not being extended right now,
> and it's widely used by a lot of clients today, so there is
> very little risk there.
>
>>
>> Tutorial
>> --------
>>
>> The code fragments should start with the necessary #includes,
>> OR you should prominently link to the complete tutorial source
>> code at the start.
> Raised https://github.com/anarthal/mysql/issues/71
> to track it.
>
>> You say that "this tutorial assumes you have a basic familiarity
>> with Boost.Asio". I think that's unfortunate. It should be
>> possible for someone to use much of the library's functionality
>> knowing almost nothing about ASIO. Remember your design goal of
>> ease-of-use. In fact, it IS possible to follow the tutorial with
>> almost no knowledge of ASIO because I have just done so.
> If you are to really take advantage of the library (i.e. use
> the asynchronous API), you will need some Asio familiarity.
> I'd say a very basic understanding is enough (i.e. knowing
> what a io_context is). If you think this comment is misleading,
> I can remove it. But I don't think this is the right place to
> provide a basic Asio tutorial.
>
>
>> You have this boilerplate at the start of the tutorial:
>>
>> boost::asio::io_context ctx;
>> boost::asio::ssl::context ssl_ctx (boost::asio::ssl::context::tls_client);
>> boost::mysql::tcp_ssl_connection conn (ctx.get_executor(), ssl_ctx);
>> boost::asio::ip::tcp::resolver resolver (ctx.get_executor());
>> auto endpoints = resolver.resolve(argv[3], boost::mysql::default_port_string);
>> boost::mysql::connection_params params (
>> argv[1], // username
>> argv[2] // password
>> );
>> conn.connect(*endpoints.begin(), params);
>> // I guess that should really be doing something more
>> // intelligent than just trying the first endpoint, right?
> The way to go here is providing an extra overload for
> connection::connect. Raised https://github.com/anarthal/mysql/issues/72
> to track it.
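>
> To make the intent concrete, the overload would roughly do what this
> free helper does on top of the current API (an illustrative sketch only;
> includes omitted, and the final signature may well differ):
>
>     // Illustrative sketch: resolve once, then try each endpoint in turn.
>     void connect_to_host(
>         boost::mysql::tcp_ssl_connection& conn,
>         boost::asio::ip::tcp::resolver& resolver,
>         const char* hostname,
>         const boost::mysql::connection_params& params)
>     {
>         auto endpoints = resolver.resolve(hostname, boost::mysql::default_port_string);
>         boost::system::error_code last_error;
>         for (const auto& entry : endpoints)
>         {
>             try
>             {
>                 conn.connect(entry.endpoint(), params);
>                 return;                    // connected and handshaked
>             }
>             catch (const boost::system::system_error& e)
>             {
>                 last_error = e.code();     // remember and try the next endpoint
>             }
>         }
>         throw boost::system::system_error(last_error);
>     }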
>
>> I would like to see a convenience function that hides all of that:
>>
>> auto conn = boost::mysql::make_connection( ...params... );
>>
>> I guess this will need to manage a global, private, ctx object
>> or something.
> If you take a look at any other Asio-based program, the user is
> always in charge of creating the io_context, and usually in charge of
> creating the SSL context, too. If you take a look at this Boost.Beast example,
> you will see similar stuff:
> https://www.boost.org/doc/libs/1_79_0/libs/beast/example/http/client/sync-ssl/http_client_sync_ssl.cpp
>
> I'm not keen on creating a function that both resolves the hostname
> and connects the connection, as I think it encourages doing more
> name resolution than really required (you usually have one server
> but multiple connections). I may be wrong though, so I'd like to
> know what the rest of the community thinks on this.
>
>> .port = 3306, // why is that a string in yours?
> It is not "mine", it's just how Asio works. Please have a look at
> https://www.boost.org/doc/libs/1_79_0/doc/html/boost_asio/reference/ip__basic_resolver/resolve.html
>
>> make_connection("mysql://admin:12345_at_hostname:3306/dbname");
> I guess you're suggesting that make_connection also perform
> the name resolution, the physical connect and the MySQL handshake?
>
> I'm not against this kind of URL-based way of specifying parameters.
> I've used it extensively in other languages. May be worth
> reconsidering it once Vinnie's Boost.Url gets accepted.
>
>> Now... why the heck does your connection_params struct use
>> string_views? That ought to be a Regular Type, with Value
>> Semantics, using std::strings. Is this the cult of not using
>> strings because "avoid copying above all else"?
> I may have been a little too enthusiastic about optimization here.
>
>> Another point about the connection parameters: you should
>> provide a way to supply credentials without embedding them
>> in the source code. You should aim to make the secure option
>> the default and the simplest to use. I suggest that you
>> support the ~/.my.cnf and /etc/my.cnf files and read passwords
>> etc. from there, by default. You might also support getting
>> credentials from environment variables or by parsing the
>> command line. You could even have it prompt for a password.
> I don't know of any database access library that does this.
> The official Python connector gets the password from a string.
> I think this is mixing concerns. Having the password passed as
> a string has nothing to do with having it embedded in the source code.
> Just use std::getenv, std::stdin or whatever mechanism your application
> needs and get a string from there, then pass it to the library.
> All the examples read the password from argv.
> Additionally, having passwords in plain text files like
> ~/.my.cnf and /etc/my.cnf is considered bad practice in terms
> of security, I wouldn't encourage it.
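>
> For example, something along these lines (a sketch only; MYSQL_PASSWORD
> is just an illustrative variable name, and library includes are omitted):
>
>     #include <cstdlib>
>     #include <stdexcept>
>
>     // Read the password from the environment instead of hard-coding it.
>     boost::mysql::connection_params make_params(const char* username)
>     {
>         const char* password = std::getenv("MYSQL_PASSWORD");
>         if (!password)
>             throw std::runtime_error("MYSQL_PASSWORD is not set");
>         return boost::mysql::connection_params(username, password);
>     }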
>
>> Does MySQL support authentication using SSL client certs?
>> I try to use this for PostgreSQL when I can. If it does, you
>> should try to support that too.
> AFAIK you can make the server validate the client's certificate
> (that doesn't require extra library support), but you still have to
> pass a password.
>
>> About two thirds of the way through the tutorial, it goes from
>> "Hello World" to retrieving "employees". Please finish the hello
>> world example with code that gets the "Hello World" string
>> from the results and prints it.
> My bad, that's a naming mistake - it should be named
> hello_resultset, instead. It's doing the right thing with the wrong
> variable name.
> Updated https://github.com/anarthal/mysql/issues/71
>
>>
>> Queries
>> -------
>>
>> I encourage you to present prepared queries first in the
>> documentation and to use them almost exclusively in the tutorial
>> and examples.
> It can definitely make sense.
>
>> You say that "client side query composition is not available".
>> What do you mean by "query composition"? I think you mean
>> concatenating strings together'); drop table users; -- to
>> form queries, right? Is that standard MySql terminology? I
>> suggest that you replace the term with something like
>> "dangerous string concatenation".
> Yes, I mean that.
>
>> In any case, that functionality *is* available, isn't it!
>> It's trivial to concatenate strings and pass them to your
>> text query functions. You're not doing anything to block that.
>> So what you're really saying is that you have not provided any
>> features to help users do this *safely*. I think that's a serious
>> omission. It would not be difficult for you to provide an
>> escape_for_mysql_quoted_string() function, rather than having
>> every user roll their own slightly broken version.
> Definitely not trivial (please have a look at MySQL source code)
> but surely beneficial, see below.
> Tracked by https://github.com/anarthal/mysql/issues/69.
>
>> IIRC, in PostgreSQL you can only use prepared statements
>> for SELECT, UPDATE, INSERT and DELETE statements; if you
>> want to do something like
>>
>> ALTER TABLE a ALTER COLUMN c SET DEFAULT = ?
>> or CREATE VIEW v as SELECT * FROM t WHERE c = ?
> You are right, these cases aren't covered by prepared statements.
> https://github.com/anarthal/mysql/issues/69 tracks it.
>
>> tcp_ssl_prepared_statement is verbose. Why does the prepared
>> statement type depend on the underlying connection type?
> Because it implements I/O operations (execute() and close()),
> which means that it needs access to the connection object,
> thus becoming a proxy object.
>
>> I have to change it if I change the connection type?! If that's
>> unavoidable, I suggest putting a type alias in the connection
>> type:
>>
>> connection_t conn = ....;
>> connection_t::prepared_statement stmt(.....);
> Raised https://github.com/anarthal/mysql/issues/73
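>
> For reference, the alias would look roughly like this (an illustrative
> sketch, not the current API; the factory call in the usage comment is
> just a stand-in):
>
>     // Illustrative sketch of the nested alias idea.
>     template <class Stream>
>     class connection
>     {
>     public:
>         // Users never have to spell the statement type themselves.
>         using prepared_statement = boost::mysql::prepared_statement<Stream>;
>         // ...
>     };
>
>     // Usage (hypothetical):
>     //   boost::mysql::tcp_ssl_connection conn(ctx.get_executor(), ssl_ctx);
>     //   decltype(conn)::prepared_statement stmt = conn.prepare_statement("SELECT 1");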
>
>> Does MySql allow numbered or named parameters? SQLite supports
>> ?1 and :name; I think PostgreSQL uses $n. Queries with lots of
>> parameters are error-prone if you just have ?. If MySql does
>> support this, it would be good to see it used in some of the
>> examples.
> Not AFAIK, just regular positional placeholders.
>
>> Invoking the prepared statement seems unnecessarily verbose.
>> Why can't I just write
>>
>> auto result = my_query("hello", "world", 42);
> Because this invokes a network operation. By Asio convention,
> you need a pair of sync functions (error codes and exceptions)
> and at least an async function, which is named the same as the sync
> function but with the "async_" prefix.
>
> I'm not against this kind of signature, building on top of what
> there already is:
>
> statement.execute("hello", "world", 42);
> statement.async_execute("hello", "world", 42, use_future);
>
> Which saves you a function call. Raised
> https://github.com/anarthal/mysql/issues/74
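>
> A thin layer over that is enough; something like this free-function
> sketch (illustrative only, includes omitted):
>
>     // Illustrative sketch: forward variadic arguments to the existing
>     // execute(make_values(...)) form.
>     template <class Statement, class... Args>
>     auto execute(Statement& stmt, Args&&... args)
>     {
>         return stmt.execute(boost::mysql::make_values(std::forward<Args>(args)...));
>     }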
>
>> I also added Query variants where the result is expected to be
>>
>> - A single value, e.g. a SELECT COUNT(*) statement.
>> - Empty, e.g. INSERT or DELETE.
>> - A single row.
>> - Zero or one rows .
>> - A single column.
> I think this can be useful. I've updated
> https://github.com/anarthal/mysql/issues/22
> to track this.
>
>> I don't see anything about the result of an INSERT or UPDATE.
>> PostgreSQL tells me the number of rows affected, which I have
>> found useful to return to the application.
> Please have a look at
> https://anarthal.github.io/mysql/mysql/resultsets.html#mysql.resultsets.complete
>
>>
>> resultset, row and value
>> ------------------------
>>
>> I'm not enthusiastic about the behaviour nor the names of these
>> types:
> Resultset is what MySQL calls it; it's not my choice.
>
>> - resultset is not a set of results. It's more like a sequence of
>> rows. But more importantly, it's lazy; it's something like an
>> input stream, or an input range. So why not actually make it
>> an input range, i.e. make it "a model of the input_range concept".
>> Then we could write things like:
>>
>> auto result = ...execute query...
>> for (auto&& row: result) {
>> ...
>> }
> How does this translate to the async world?
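>
> For the synchronous API something like this is conceivable (an
> illustrative sketch only; it assumes read_one() yields a pointer that is
> null once the resultset is complete, which may not match the actual
> signature, and it says nothing about the async case):
>
>     // Illustrative sketch of driving a resultset with a callable,
>     // under the assumption stated above.
>     template <class Resultset, class F>
>     void for_each_row(Resultset& result, F&& f)
>     {
>         while (const auto* r = result.read_one())
>             f(*r);
>     }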
>
>> - row: not bad, it does actually represent a row; it's a shame
>> it's not a regular type though.
>>
>> - value: it's not a value! It doesn't have value semantics!
> If the library gets rejected I will likely make values
> owning (regular).
>
>> I'm also uncertain that a variant for the individual values
>> is the right solution here. All the values in a column should
>> have the same type, right? (Though some can be null.) So I
>> would make row a tuple. Rather than querying individual values
>> for their type, have users query the column.
> Are you talking about something like this?
> https://github.com/anarthal/mysql/issues/60
>
>
>> It seems odd that MySQL small integers all map to C++ 64-bit
>> types.
> It is done like this to prevent the variant from having
> too many alternatives - I don't think having that would add
> much value to the user. If I implement something like
> https://github.com/anarthal/mysql/issues/60,
> each int type will be mapped to its exact type.
>
>> I use NUMERIC quite a lot in PostgreSQL; I don't know if the
>> MySql type is similar. I would find it inconvenient that it
>> is treated as a string. Can it not be converted to a numeric
>> type, if that is what the user wants?
> MySQL treats NUMERIC and DECIMAL the same, as exact
> numeric types. What C++ type would you put this into?
> float and double are not exact, so they're not a good fit.
>
>
>> I seem to get an assertion if I fail to read the resultset
>> (resultset.hpp:70). Could this throw instead?
> This assertion is unrelated to that. It's just checking that
> the resultset has a valid connection behind it, and is not
> a default constructed (invalid) resultset.
>
>> Or, should the library read and discard the unread results
>> in this case?
> Looking at the Python implementation,
> it gives the option to do that. I think we can do a better job here.
> Tracked by https://github.com/anarthal/mysql/issues/14
>
>
>> But the lack of protocol support for multiple in-flight queries
>> immediately becomes apparent. It almost makes me question
>> the value of the library - what's the point of the async
>> support, if we then have to serialise the queries?
> As I pointed out in another email, it's a combination of
> lack of protocol support and lack of library support.
> Apologies if the documentation is not clear in this aspect.
> I think there is value in it, though, so you don't need to create
> 5000 threads to manage 5000 connections. The fact that the
> official MySQL client has added a "nonblocking" mode
> seems a good argument.
>
>> Should the library provide this serialisation for us? I.e.
>> if I async_execute a query while another is in progress, the
>> library could wait for the first to complete before starting
>> the second.
> I would go for providing that bulk interface I talk about in other emails.
>
>> Or, should the library provide a connection pool? (Does some
>> other part of ASIO provide connection pool functionality that
>> can be used here?)
> Asio doesn't provide that AFAIK. It is definitely useful
> functionality, tracked by https://github.com/anarthal/mysql/issues/19
>
>>
>> Transactions
>> ------------
>>
>> I have found it useful to have a Transaction class:
>>
>> {
>> Transaction t(conn); // Issues "BEGIN"
>> .... run queries ....
>> t.commit(); // Issues "COMMIT"
>> } // t's dtor issues "ROLLBACK" if we have not committed.
>>
> Again, how would this work in the async world?
> How does the destructor handle communication failures
> when issuing the ROLLBACK?
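>
> For the synchronous case I can imagine something like this (an
> illustrative sketch only, not part of the library; it assumes the
> error_code/error_info overload of query() and simply drops any ROLLBACK
> failure in the destructor, which is exactly the open question):
>
>     // Illustrative sketch, not library API.
>     class transaction_guard
>     {
>         boost::mysql::tcp_ssl_connection& conn_;
>         bool committed_ = false;
>     public:
>         explicit transaction_guard(boost::mysql::tcp_ssl_connection& conn)
>             : conn_(conn)
>         {
>             conn_.query("START TRANSACTION");   // throws on failure
>         }
>         void commit()
>         {
>             conn_.query("COMMIT");              // throws on failure
>             committed_ = true;
>         }
>         ~transaction_guard()
>         {
>             if (!committed_)
>             {
>                 // Best effort: never throw from a destructor. A failed
>                 // ROLLBACK (e.g. a dropped connection) is silently lost.
>                 boost::system::error_code ec;
>                 boost::mysql::error_info info;
>                 conn_.query("ROLLBACK", ec, info);
>             }
>         }
>     };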
>
>> Klemens Morgenstern makes the point that MySql is a trademark of
>> Oracle. Calling this "Boost.MySql" doesn't look great to me.
>> How can you write "The Boost MySql-compatible Database Library"
>> more concisely?
> I'm not very original at naming, as you may have already
> noticed. Using Boost.Delfin was proposed at some point,
> but Boost.Mysql definitely expresses its purpose better.
>
>> Overall, I think this proposal needs a fair amount of API
>> re-design and additional features to be accepted, and should
>> be rejected at this time. It does seem to be a good start
>> though!
>>
>>
>> Thanks to Ruben for the submission.
> Thank you for sharing your thoughts, I think there
> is a lot of useful information here.
>
>
> ------------------------------
>
> Message: 2
> Date: Thu, 12 May 2022 21:02:48 +0200
> From: Ruben Perez <rubenperez038_at_[hidden]>
> To: Dominique Devienne <ddevienne_at_[hidden]>, boost_at_[hidden]
> Subject: Re: [boost] Review of Boost.MySql
> Message-ID:
> <CACR-mdJntpKJ7x0cX8K4M4wasBWqTR5x9i-7FhygHxmCDL4=cA_at_[hidden]>
> Content-Type: text/plain; charset="UTF-8"
>
>> Indeed. But even your imagined batch interface only works well for
>> queries/selects,
>> while for inserts (or updates), the client does not need to just send
>> a small textual
>> SQL query, but potentially a bunch of data for the rows too. A true
>> pipeline allows
>> sending the rows for a second insert while the first insert is being processed.
> It would work for inserts too, as values are either part of the query string
> or part of the statement execute packet. In both cases, part of the request.
>
>> Such a mode may be useful for schema creation OTOH. We have large schemas,
>> with hundreds of tables, indexes, triggers, etc... Done from C++ code
>> client-side,
>> not via a DBA manually executing on the server using SQL files and the
>> native CLI.
>> For that use-case, the ability to send many DDLs in a single batch
>> would definitely
>> save on the round-trips to the server. We try hard to minimize roundtrips!
> Thanks for sharing this use case, I definitely wasn't aware of it,
> and it seems an argument in favor of implementing multi-statement support.
>
>> You don't need to necessarily use Google's protobufs library.
>> There's https://github.com/mapbox/protozero for example, and similarly, a
>> from-scratch implementation to just encode-decode a specific protocol
>> can also be written.
> I wasn't aware of this. Thanks.
>
>>> The server sends several resultsets after that. I haven't focused a lot on
>>> this because it sounded risky (in terms of security) for me.
>> Not sure what's risky here. Maybe I'm missing something.
> I was just imagining users concatenating queries. That may be a misconception;
> yours is a legitimate use case.
>
>> Note that PostgreSQL's COPY is not file-specific, and a first-class citizen
>> at the protocol level, depending on a pure binary format (with text mode too).
>> I use COPY with STDIN / STDOUT "files", i.e. I prepare memory buffers and
>> send them; and read memory buffers, and decode them. No files involved.
> Unfortunately, that does not seem to be the case for MySQL. You issue the
> LOAD DATA statement via a regular query packet, and the server returns
> you another packet with the file path it wants. You then read it in the client
> and send it to the server in another packet. I've made it work with CSV files,
> and I'd say it's the only format allowed, AFAIK.
>
>> Maybe our use-case of very large data (with large blobs), *and* very
>> numerous smaller data,
>> that's often loaded en-masse, by scientific desktop applications, and
>> increasingly mid-tier services
>> for web-apps, is different from the more common ecommerce web-app
>> use-case of many people.
> It is good to hear about use cases other than the regular web server
> we all have in mind. It will help me during further development.
>
>> When I evaluate DB performance, I tend to concentrate on "IO" performance,
>> in terms of throughput and latency, independent of the speed of the
>> SQL engine itself.
>> There's nothing I can do about the latter, while the way one uses the Client API
>> (or underlying protocol features) is under my control. So I do mostly
>> INSERTs and
>> SELECTs, with and without WHERE clauses (no complex JOINs, CTEs, etc...).
>>
>> Because we are in the scientific space, we care about both many small rows
>> (millions, of a few bytes to KBs each at most), and a few (hundreds / thousands)
>> much larger rows with blobs (with MBs to GBs sizes). The truly large
>> "blobs" (files) are
>> kept outside the DB, since mostly read only (in the GBs to TBs sizes each, that
>> can accumulate to 2.6 PB for all a client's data I heard just
>> yesterday for example).
>>
>> I'll also compare inserting rows 1-by-1, with and without prepared statements,
>> to inserting multi-rows per-statement (10, 100, 1000 at a time), to the "bulk"
>> interface (COPY for PostgreSQL, LOCAL for MySQL, Direct-Path load in OCI).
>> For example with SQLite:
>> https://sqlite.org/forum/forumpost/baf9c444d9a38ca6e59452c1c568044aaad50bbaadfff113492f7199c53ecfed
>> (SQLite has no "bulk" interface, doesn't need one, since "embedded"
>> thus "zero-latency")
>>
>> For PostgreSQL, we also compared text vs binary modes (for binds and
>> resultsets).
>>
>> For blobs, throughput of reading and writing large blobs, whole and in-part,
>> with different access patterns, like continuous ranges, or scattered inside).
>>
>> A very important use-case for us, for minimizing round-trips, is how
>> to load a subset of rows,
>> given a list of primary keys (typically a surrogate key, like an
>> integer or a uuid). For that,
>> we bind a single array-typed value for the WHERE clause's placeholder
>> to the query,
>> and read the resultset, selecting 1%, 5%, 10%, etc... of the rows from
>> the whole table.
>> (selecting each row individually, even with a prepared query, is terribly slow).
> Thanks for sharing. It will definitely help me during benchmarking.
>
> Regards,
> Ruben.
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 12 May 2022 23:39:41 +0300
> From: Andrey Semashev <andrey.semashev_at_[hidden]>
> To: boost_at_[hidden]
> Subject: Re: [boost] Future of C++ and Boost
> Message-ID: <db0e5cf4-12b6-cc2f-da78-d65f76f899d2_at_[hidden]>
> Content-Type: text/plain; charset=UTF-8
>
> On 5/12/22 21:48, Marshall Clow via Boost wrote:
>> On May 12, 2022, at 11:34 AM, Robert Ramey via Boost <boost_at_[hidden]> wrote:
>>> On 5/12/22 11:30 AM, Robert Ramey via Boost wrote:
>>>> On 5/12/22 9:55 AM, John Maddock via Boost wrote:
>>>> wow - that would be a big one. codecvt is the fundamental component to support wchar is it not? Does this mean that wchar is gone also? If so what replaced it? etc....
>>> FWIW - I don't see any notice of such deprecation here: https://en.cppreference.com/w/cpp/header/codecvt
>> It's in green - at the top:
>> codecvt_utf8
>> <https://en.cppreference.com/w/cpp/locale/codecvt_utf8>
>> (C++11)(deprecated in C++17)
> Also here:
>
> http://eel.is/c++draft/depr.locale.stdcvt#depr.codecvt.syn
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 13 May 2022 13:20:24 +1200
> From: Gavin Lambert <boost_at_[hidden]>
> To: boost_at_[hidden]
> Subject: Re: [boost] Future of C++ and Boost
> Message-ID: <4d5d8961-85f3-979a-f69d-be9a9f02c3fc_at_[hidden]>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 13/05/2022 06:30, Robert Ramey wrote:
>> On 5/12/22 9:55 AM, John Maddock wrote:
>>> Watch out - all of <codecvt> is deprecated in C++17, I think you're
>>> relying only on <locale> and may be OK though...
>> wow - that would be a big one. codecvt is the fundamental component to
>> support wchar is it not? Does this mean that wchar is gone also? If so
>> what replaced it? etc....
> wchar_t is still around (although char8_t and std::u8string are the new
> hotness), it's just the conversion functions that are deprecated. I
> guess you're just not supposed to convert anything any more.
>
> More seriously, AFAIK there are no plans to actually remove it
> until/unless an actual replacement gets standardised. But I think in
> the meantime they'd rather you use something like the ICU library for
> conversions instead.
>
> Although it wouldn't surprise me if, in not wanting to take a dependency
> on an external library (and not wanting to continue using an officially
> deprecated standard function), a lot of libraries/apps will write their
> own subtly-broken conversion routines instead...
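>
> For the avoidance of doubt, the deprecated machinery is std::wstring_convert
> together with the <codecvt> facets, e.g. (this still compiles, with
> deprecation warnings):
>
>     #include <codecvt>   // deprecated in C++17, but still present
>     #include <locale>
>     #include <string>
>
>     // UTF-8 -> UTF-16 via the deprecated facilities.
>     std::u16string widen(const std::string& utf8)
>     {
>         std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
>         return conv.from_bytes(utf8);
>     }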
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 12 May 2022 23:24:19 -0300
> From: Alan de Freitas <alandefreitas_at_[hidden]>
> To: boost_at_[hidden]
> Subject: Re: [boost] Boost MySQL Review Starts Today
> Message-ID:
> <CAHpLXUiEEA4zjeMVO_EKgoMTGYv3t49hG0MjOrFqXx93K9SVHg_at_[hidden]>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi! Here's my review.
>
> Boost.MySql is an impressive library. I would like to thank Rubén for that.
> It implements the complete protocol from the ground up with async
> functionalities and certainly involved a lot of work. The documentation is
> extensive, goal-oriented, and helpful. The async operations support
> cancellation and the examples with coroutines are beautiful. During this
> review, I was impressed by how easily it works, and I've noticed a few points
> in the API that have also been brought up in previous reviews and could be
> improved. Some other issues could also be better highlighted in the
> documentation, which would avoid many problems. The overall recommendation
> in this review is acceptance conditional on some fixes.
>
> ## Value
>
>> Will the library bring additional out-of-the-box utility to Boost?
> The library is very good news considering our recent discussions about the
> future of Boost, where providing more protocol implementations comes up
> frequently. I wish more people would make this kind of contribution.
>
>> What is your evaluation of the potential usefulness of the library?
> Others have questioned the benefit of the library when compared to sqlpp11
> or any wrapper around the C API. The main difference is that those libraries are
> high-level, but this is a discussion still worth having from the point of
> view of users. I was thinking about the transition cost from Boost.MySQL to
> any other SQL database, since many applications have the
> requirement/necessity of allowing different SQL databases. In sqlpp11, the
> user can just change the backend. The user could use Boost.MySQL as an
> sqlpp11 backend and that would have the same effect. However, I think Rubén
> mentioned this is not possible at some point. I'm not sure whether this is
> just for the async functionality. In the same line, I wonder if a library
> for Postgres or Sqlite would be possible with a similar API, which could
> also solve the problem, although I'm not sure anyone would be willing to
> implement that. If we did, we could have the convenience of sqlpp11 and the
> ASIO async functionalities of Boost.Mysql for other DBs.
>
> The library really provides ease-of-use, when we consider what it provides
> and how low-level it is. However, unlike in other libraries like
> Boost.Beast, Boost.MySql users might not be sold into the Asio way of doing
> things. Applications that require access to databases might be making
> sparse database requests where the Asio asynchronous model is not as
> useful. Highlighting these differences in the docs is important. Asio takes
> some time to learn, and I guess for a user not used to Asio, "already
> understanding Asio" does not sound like ease of use. The docs could
> focus on the protocol before moving on to the asynchronous functionality.
>
> I'm also a little worried about the maintainability of the library and
> protocol changes and how this could impact Boost as a whole. Should we
> really announce it as compatible with MariaDB? What direction would the
> library take if they diverge? How often does the protocol change or is
> extended? Is Ruben going to stick around if the protocol changes? How hard
> would it be for someone to understand the code and implement extensions?
> Can a user be sure it's always going to provide the same features and be as
> reliable as the C API? I don't have the answer to these questions, but it's
> something that got me wondering. I guess this kind of question is going to
> come up for any library that is related to a protocol.
>
> I don't know if the name "MySql" can be used for the library, as it belongs
> to Oracle. I'm not saying it can't. I'm really saying I don't know. I'm not
> a lawyer and I don't understand the implications here. But this should be
> considered, investigated, and evidence should be provided. The library is
> also compatible with MariaDB and the name "MySql" might not reflect that.
> Maybe there's a small probability it might be compatible with some other
> similar DB protocol derived from MySql in the future?
>
> As others have mentioned, the protocol is strictly sequential for a single
> connection, and this might have some implications for the asynchronous
> operations the library provides.
>
> - No two asynchronous MySql query reads can happen concurrently. While this
> still has value among other Asio operations, like a server that needs the
> DB eventually, the user needs to be careful about that. Maybe it would be
> safer if all MySql operations were on some special kind of strand. Or maybe
> the library could provide some "mysql::transaction_strand" functionality to
> help ensure this invariant for individual queries in the future.
> - A second implication is that some applications might find the
> asynchronous functionalities in Boost.Mysql not as useful as asynchronous
> functionalities in other protocols, like the ones in Boost.Beast. This
> depends on how their applications are structured. Since this is the main
> advantage over the C API, these users may question the value of the library
> and the documentation should discuss this more explicitly.
> - These implications could become irrelevant if the library provides some
> kind of functionality to enable a non-blocking mode. I have no idea how the
> MySql client achieves that.
>
> ## API
>
>> What is your evaluation of the design? Will the choice of API abstraction
> model ease the development of software that must talk to a MySQL database?
>
> I like how the API is very clean compared to the C API, even when including
> the asynchronous functionality. This would be a reason for using the
> library, even if I only used the synchronous functions.
>
> I'm worried about the lack of any way to reuse memory for the
> results, as the interface depends on vector. This is not the usual Asio
> pattern. These vectors look even weirder in the asynchronous callbacks:
>
> - People have their containers/buffers and I would assume reading into some
> kind of existing row buffer would be the default interface, as is the case
> with other Asio read functions. In other words, read_many and read_all
> should work more like read_one.
> - Not returning these vectors is the common pattern in Asio: the initiating
> function receives a buffer for storage and the callback returns how many
> elements were read. Note that the buffer size already delimits how many
> elements we should read.
> - If we return vectors, async operations would need to instantiate the
> vector with the custom allocator for the operation. The callback wouldn't
> use std::vector<T> then. It would be std::vector<T,
> very_long_allocator_name> for every query callback, which is inconvenient.
> Using only std::vector<T> in the callback is not only inconvenient but
> dangerous because it would be the wrong allocator and everything would need
> to be copied.
> - One problem I see without the vectors is that each row would still need
> to allocate memory internally because of their dynamic size. I don't know
> the best solution for that second problem unless we have some sort of
> static size row/tuple, which does make sense since the user should know the
> number of columns at compile time.
>
> The library provides read_one, read_all, and read_many, but no "read_some".
> Besides implying a buffer parameter in the initiating function, I would
> imagine reading as much as possible and parsing that to be the usual best
> strategy in this kind of operation, since we have a stream and we don't
> know how much data is available to read yet. Unfortunately, read_one,
> read_all, and read_many don't allow that. Unless read_many can already read
> less than the max number of rows in intermediary calls, which I don't think
> is the intention. It might also be useful to replicate the naming
> conventions Asio uses to read some or read all of the data.
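>
> To illustrate, a read_some along Asio lines might have roughly this shape
> (hypothetical, not a concrete signature proposal):
>
>     // Hypothetical shape: read whatever rows are currently available into
>     // caller-provided storage and report how many were written, mirroring
>     // Asio's read_some convention.
>     std::size_t read_some(boost::span<boost::mysql::row> storage,
>                           boost::system::error_code& ec);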
>
> If "value" is intended to encapsulate the variant type,
> "value::to_variant" seems dangerous. The variant supports lots of types,
> which indicates the probability the internal design might need to change to
> fit new user demands is high. This could also need to change according to
> protocol extensions, and changing any of these types would break the API.
> If "value" already represents what it needs to represent,
> "value::to_variant" may not need to be exposed unless there's a lot of user
> demand for that.
>
> There is also an existential value/reference problem with "value":
>
> - The internal variant type includes a string_view whose string is owned by
> the row. This seems kind of dangerous or at least misleading. We have a
> reference type that's called "value", while this is more like a const
> "field view" or something.
> - At the same time, it's not a view either because I assume this is
> mutable, and updating numbers would not update the row.
> - The other values in the variant are value types, which makes "value" both
> a value and a reference type depending on the column type.
> - A reference type seems appropriate here because we don't want to copy all
> the data when iterating the results, but we need some other pattern for
> "value". A const view is especially useful because we might be able to keep
> value types for numbers, etc... Encapsulating the variant is also helpful
> here.
> - Calling a type "value" is a bad indication anyway. Any value is a "value".
>
> As others have mentioned, "value::get_std_optional" is unusual. It may not
> look problematic at first but it has a number of problems:
>
> 1) We have a dangerous precedent because we could also say we need to
> support all other boost/std equivalents in the library, such as the
> string_view the library uses at many places.
> 2) We are including and depending on two libraries when we know almost for
> sure the user is only using one of them: If there's no std::optional in the
> supported platform, we just need boost::optional. If you are requiring a
> C++ standard where std::optional is available, then we just need
> std::optional.
> 3) This is also not very useful if the user has both boost::optional and
> std::optional, because the code would still need to be refactored from
> get_optional to get_std_optional or vice-versa if the user decides to
> change it. This could be solved with some BOOST_MYSQL_USE_STD_OPTIONAL or
> BOOST_MYSQL_USE_STD_STRING_VIEW, etc., but this would also invalidate the
> distinction.
> 4) This would make the API inconsistent in case Boost moves to a C++17
> minimum at some point in the future, replacing boost classes, as we have
> been discussing C++14 lately. Both get_optional and get_std_optional would
> return the same type. Also, if std::optional is the *standard* type, it
> should not be the one whose function needs clarification with a longer name
> "get_std_optional".
> 5) This doesn't need to be a member function, which indicates it shouldn't
> be a member function in this case. The class is internally representing
> only one of these and conversions will happen anyway. A user on C++17
> or later can easily implement their own to_std_optional free function
> outside the class.
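>
> For illustration, such a user-side helper could be as small as this
> (assuming get_optional<T>() returns a boost::optional<T>):
>
>     #include <optional>
>
>     // Hypothetical user-side helper, not library API.
>     template <class T>
>     std::optional<T> to_std_optional(const boost::mysql::value& v)
>     {
>         if (auto opt = v.get_optional<T>())
>             return *opt;
>         return std::nullopt;
>     }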
>
> If I understood the documentation correctly, it's also problematic to use
> an empty optional for both a null value and an invalid result. They are
> simply not the same thing. Libraries always differentiate an invalid state
> from null in a field. This is especially relevant because the documentation
> warns us against using the "is_convertible_to" pattern over "get_optional"
> because "is_convertible_to" is unnecessarily slow. By the way, it would be
> nice to explain why "// WARNING!! Inefficient, do NOT do" is true. I
> wouldn't expect that to cost more than creating the optional value.
>
> The documentation states that the behavior of relational operators may
> change in the future. What do other libraries do about these comparisons?
> We probably don't want something like PHP, where "20" == 20 so that we need
> some kind of "20" === 20 in the future, but the documentation implies we
> also don't want "value(std::int64_t(200)) > value(std::uint64_t(200))" to
> fail. The library makes the whole thing particularly tricky with number
> types because not checking the proper number type in the variant would
> return invalid/null even when the number could be stored in a variable of
> another number type perfectly fine, which is even worse when both invalid
> and null are represented by the same empty optional.
>
> About the "resultset":
>
> - If "resultset" is a table, the "set" part of the name might be a little
> misleading in C++ as the results are in a deterministic sequence, even
> though this is what MySQL may call a similar concept. However, I imagine
> there's no equivalent concept in MySql for this specific
> Asio-async-lazy-fetching functionality itself.
> - This class is more like a "table", "frame", "vector", "list", or just
> "result". However, after all the steps of "connect", "prepare", and
> "execute", the name "result" gives the impression that the results are
> finally there, but this is just another stream object like a "result
> fetcher" or something.
> - Since the protocol is strictly sequential, it might be useful to not even
> have a resultset class at all. The connection could send the query and have
> some read function, like the usual read/write functions in Asio can operate
> on the same stream.
>
> Minor issues:
>
> - Maybe it's not clear from the examples, but why would the user need
> connection::handshake when connection::connect already does it?
> - Shouldn't `tcp_ssl_connection::connect` accept an EndpointSequence to be
> more like other Asio functions?
> - The names "tcp_ssl_prepared_statement", "tcp_ssl_<component>" are quite
> repetitive and verbose in the documentation. Maybe "tcp_ssl" could become a
> namespace? Isn't that what Asio does for the ssl related objects? Some
> names might be reduced if there's no alternative. For instance,
> "prepared_statement" might become "statement" since there's no
> "unprepared_statement".
> - Wouldn't some form of `statement::execute` with variadic args (different
> name perhaps) be a simpler alternative to
> `statement::execute(boost::mysql::make_values(...))`? Or maybe
> statement::operator(), as Phil suggested. Or some
> `statement::execute_with(...)`. In the variadic args, the optional err/info
> can go into some specified positions, since they're not ambiguous with the
> field variant types. This is a little tricky but definitely useful,
> especially if suggestions of customization points are implemented in the
> future.
>
> ## Implementation
>
>> What is your evaluation of the implementation?
> I haven't analyzed the implementation very profoundly. I skimmed through
> the source code and couldn't find anything problematic. It would be useful
> if the experts could inspect the Asio composed ops more deeply.
>
> CMakeLists.txt:
>
> - I believe the CMakeLists.txt script is not in the format of other boost
> libraries in boost/libs so it won't work with the super-project as it is.
> - The example/CMakeLists.txt script refers to
> BOOST_MYSQL_INTEGRATION_TESTS. I don't think examples can be considered
> integration tests.
>
> Examples:
>
> - The examples are very nice. Especially the one with coroutines. They are
> also very limited. They are all about the same text queries, which
> shouldn't even be used in favor of prepared statements.
> - Many examples about continuation styles are not very useful because this
> is more of an Asio feature than a library feature. The library feature, so
> to speak, is supporting Asio tokens properly. The continuation styles could
> be exemplified in the exposition with some small snippets for users not
> used to Asio without the documentation losing any value.
> - Some examples are simple enough and don't require the reader to know the
> rest of the exposition. They are like a quick look into the library. These
> could come at the beginning, as in the Asio tutorials and Beast quick look
> section.
> - The first sync example could be simpler to involve just a hello world
> before moving on to other operations.
> - The page about the docker container should specify that the username and
> password are "root" and "".
>
> Tests:
>
> Some unit tests take a ***very*** long time. Enough to make coffee and a
> sandwich. And they seem to not be adding a lot of value in terms of
> coverage. For instance, "mysql/test/unit/detail/protocol/date.cpp(72):
> info: check '1974- 1-30' has passed" going through all possible dates
> multiple times took a long time.
>
>> Did you try to use the library? With what compiler? Did you have any
> problems?
>
> No problems at all. GCC 11 and MSVC 19.
>
> ## Documentation
>
>> What is your evaluation of the documentation?
> The documentation is complete. The main points that differentiate the
> library are
>
> - it's a complete rewrite of the protocol,
> - it's low-level and
> - it's based on Boost.Asio
>
> The documentation should emphasize these points as much as possible,
> especially the first one. This should be in the introduction, the
> motivation, slogans, logos, and wherever people can see it easily.
>
> The documentation should also provide arguments and evidence that
> these design goals are a good idea, as often discussed when the topic is
> the value of this library. Why is it worth rewriting the protocol? For what
> use cases is such a low-level library useful? Why should a person who
> already uses other libraries or the C API care about Asio now? Something
> that should also be highlighted is the difference between the library and
> other higher-level libraries, in particular, naming names.
>
> Minor issues:
>
> - There's no link in the documentation to the protocol specification. It
> would be interesting to know what the reference specification is. Or
> whether the protocol was inferred somehow. Is there any chance this
> protocol might change? What about divergences between MySql and MariaDB?
> How stable is the protocol? For what range of versions does it work? What's
> the policy when it changes?
> - Some links are broken (for instance, linking to
> https://anarthal.github.io/boost-mysql/index.html).
> - "All async operations in this library support per-operation
> cancellation". It's important to highlight this is per operation in the
> Asio sense of an operation but not in the MySql sense of an operation
> because the MySql connection is invalid after that.
> - "Boost.MySql has been tested with the following versions of MySQL".
> MariaDB is not a version of MySql.
> - Prepared statements should come first in the examples, to highlight them
> as the default pattern.
> - The documentation refers to topics that haven't been explained yet. Maybe
> "value" could be explained after "row", and "row" could be explained after
> "resultset" and "resultset" after "queries".
> - The section "Text Queries" is quite small in comparison to other
> sections. It could include some examples and snippets like other sections
> do.
> - "The following completion tokens can be used in any asyncrhonous
> operation within Boost.Mysql" -> "Any completion token..."
> - "When they fail, they throw a boost::system::system_error exception".
> Don't these functions just set the proper error_code, as usual with Asio
> and Beast?
> - The "MySQL to C++ mapping reference" section should be using a table.
> - A small subsection on transactions would be helpful even if there's no
> library functionality to help with that.
> - The documentation should include some comparisons that are not obvious to
> potential users. C/C++ APIs. The advantages of the Asio async model.
> Benchmarks if possible.
>
> ## Conclusion
>
>> How much effort did you put into your evaluation? A glance? A quick
> reading? In-depth study?
>
> I spent one day on this review. I read all the documentation, ran the
> tests, experimented with the examples, and had a reasonable look at the
> implementation.
>
>> Are you knowledgeable about the problem domain?
> I'm reasonably educated about databases but not an expert. I've been
> working a lot with Asio.
>
>> Are there any immediate improvements that could be made after acceptance,
> if acceptance should happen?
>
> While it's important to have a general variant type for row values, a
> simpler interface for tuples of custom types would be very welcome and
> would simplify things a lot, while also avoiding allocations, since
> columns always have the same types. This feature is an obvious one, since users
> almost always know their column types at compile time, and the demand for it is
> too recurrent in applications to ignore.
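>
> Purely as a sketch of the kind of interface meant here (none of this
> exists in the library today; the read_all overload is hypothetical):
>
>     // The user states the column types up front, avoiding the
>     // per-field variant and its allocations.
>     using employee = std::tuple<std::int64_t, std::string, std::string>;
>
>     std::vector<employee> employees =
>         result.read_all<employee>();   // hypothetical typed overload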
>
>> Do you think the library should be accepted as a Boost library? Be sure
> to say this explicitly so that your other comments don't obscure your
> overall opinion.
>
> I believe it should be conditionally accepted with the same conditions
> stated in other reviews: allowing for memory reuse in the read_* functions
> and fixing the "value" type.
>
> Best,
>
> On Tue, 10 May 2022 at 04:14, Richard Hodges via Boost <
> boost_at_[hidden]> wrote:
>
>> Dear All,
>>
>> The Boost formal review of the MySQL library starts Today, taking place
>> from May 10th, 2022 to May 19th, 2022 (inclusive) - We are starting one day
>> after the announced date and extending the period by one day to compensate.
>>
>> The library is authored by Rubén Pérez Hidalgo (@anarthal in the CppLang
>> slack).
>>
>> Documentation: https://anarthal.github.io/mysql/index.html
>> Source: https://github.com/anarthal/mysql/
>>
>> The library is built on the bedrock of Boost.Asio and provides both
>> synchronous and asynchronous client connectors for the MySQL database
>> system.
>>
>> Boost.MySQL is written from the ground up, implementing the entire protocol
>> with no external dependencies beyond the Boost library.
>> It is compatible with MariaDB.
>>
>> Connectivity options include TCP, SSL and Unix Sockets.
>>
>> For async interfaces, examples in the documentation demonstrate full
>> compatibility with all Asio completion handler styles, including:
>>
>> Callbacks:-
>> https://anarthal.github.io/mysql/mysql/examples/query_async_callbacks.html
>>
>> Futures :-
>> https://anarthal.github.io/mysql/mysql/examples/query_async_futures.html
>>
>> Boost.Coroutine :-
>> https://anarthal.github.io/mysql/mysql/examples/query_async_coroutines.html
>>
>> C++20 Coroutines :-
>>
>> https://anarthal.github.io/mysql/mysql/examples/query_async_coroutinescpp20.html
>>
>> Rubén has also implemented the Asio protocols for deducing default
>> completion token types :-
>>
>> https://anarthal.github.io/mysql/mysql/examples/default_completion_tokens.html
>>
>> Reviewing a database connector in depth will require setting up an instance
>> of a MySQL database. Fortunately most (all?) Linux distributions carry a
>> MySQL and/or MariaDB package. MySQL community edition is available for
>> download on all platforms here:
>> https://dev.mysql.com/downloads/
>>
>> Rubén has spent quite some time in order to bring us this library
>> candidate. The development process has no doubt been a journey of discovery
>> into Asio, its concepts and inner workings. I am sure he has become a fount
>> of knowledge along the way.
>>
>> From a personal perspective, I was very happy to be asked to manage this
>> review. I hope it will be the first of many more reviews of libraries that
>> tackle business connectivity problems without further dependencies beyond
>> Boost, arguably one of the most trusted foundation libraries available.
>>
>> Please provide in your review information you think is valuable to
>> understand your choice to ACCEPT or REJECT, including Boost.MySQL as a
>> Boost library. Please be explicit about your decision (ACCEPT or REJECT).
>>
>> Some other questions you might want to consider answering:
>>
>> - Will the library bring additional out-of-the-box utility to Boost?
>> - What is your evaluation of the implementation?
>> - What is your evaluation of the documentation?
>> - Will the choice of API abstraction model ease the development of
>> software that must talk to a MySQL database?
>> - Are there any immediate improvements that could be made after
>> acceptance, if acceptance should happen?
>> - Did you try to use the library? With which compiler(s)? Did you
>> have any problems?
>> - How much effort did you put into your evaluation? A glance? A quick
>> reading? In-depth study?
>> - Are you knowledgeable about the problem domain?
>>
>> More information about the Boost Formal Review Process can be found
>> at: http://www.boost.org/community/reviews.html
>>
>> The review is open to anyone who is prepared to put in the work of
>> evaluating and reviewing the library. Prior experience in contributing to
>> Boost reviews is not a requirement.
>>
>> Thank you for your efforts in the Boost community. They are very much
>> appreciated.
>>
>> Richard Hodges
>> - review manager of the proposed Boost.MySQL library
>>
>> Rubén is often available on CppLang Slack and of course by email should you
>> require any clarification not covered by the documentation, as am I.
>>
>> --
>> Richard Hodges
>> hodges.r_at_[hidden]
>> tg: @rhodges
>> office: +44 2032 898 513
>> mobile: +376 380 212
>>
>> _______________________________________________
>> Unsubscribe & other changes:
>> http://lists.boost.org/mailman/listinfo.cgi/boost
>>
>
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk