Boost logo

Boost :

From: Ruben Perez (rubenperez038_at_[hidden])
Date: 2020-04-14 18:41:56


On Tue, 14 Apr 2020 at 14:16, Dominique Devienne via Boost <
boost_at_[hidden]> wrote:

> On Mon, Apr 13, 2020 at 1:52 PM Ruben Perez via Boost <
> boost_at_[hidden]>
> wrote:
>
> > I have been working in a proof-of-concept implementation of PostgreSQL.
> You
> > can find it here: https://github.com/anarthal/postgres-asio
>
>
> Hi. Great, I've been following this thread from afar, because first I'm
> using/learning ASIO
> myself, and second because I call databases as part of the processing I do
> in my ASIO-scheduled
> "tasks". But I'm a PostgreSQL user, not a MySQL one, thus my relative lack
> of interest until now :).
>

>
> > It supports connections, queries and prepared statements with an API
> almost
> > identical to the one proposed for Boost.MySQL. I have only implemented
> > synchronous operations yet. I hope this helps prove that implementing
> other
> > databases with a very similar API is feasible and helps push the MySQL
> > library forward.
> >
>
> I'm confused. What's the point of implementing synchronous at all rather
> than use
> libpq directly? libpq is "official" and synchronous, so the *only* reason
> I'd prefer your
> re-implementation of the protocol would be to gain async via ASIO.
>

Thanks for checking out! The point of that library was just a small proof
of concept. There were a considerable amount of people concerned about the
MySQL library being too MySQL specific. I just wanted to show that you
*could* write a similar kind of library for other database. So at this
moment you are right, there is absolutely no reason to choose my
implementation. Just to be clear, the MySQL implementation is indeed
complete and tested.

>
> I haven't looked at the code, but I would also have assumed that any
> synchronuous
> API for a library that stated to the async, would be as a convenience layer
> on top of
> the async layer, doing the waiting/synchronization for you, to simplify
> things when async
> is not strictly necessary.
>
> But what you wrote above seems to imply the contrary. Is the async API
> implemented
> on top of the sync API? Or you implement two separate APIs completely?
>

There is no async API at the moment. In the MySQL library both APIs are
implemented separately, with most of the code factored out into
functions/classes. But once you've got the synchronous implementation and
API, you can "just write" the async one, as I keep the sync and async API
consistent. So, for the proof of concept, I just did the sync one (which is
easier).

>
>
> > Note that the Postgres library is just a proof of concept; it is very far
> > from being a full implementation. At this point I lack the time to
> > transform this library into a full implementation. If somebody wants to
> > volunteer and progress this library into a full implementation (or just
> > write something from scratch), I would appreciate it a lot - just message
> > me.
> >
>
> It's hard to get traction on alternatives to official APIs (like libpq for
> PostgreSQL),
> w/o some pretty strong guarantees on testing. And testing with databases is
> complicated
> because of the need to access that remote DB; you can't easily mock a
> database reliably.
> I'm not sure how that would work within Boost. Are there precedent for
> Boost libraries that
> need "external" resources like full blown databases? Perhaps via VMs and/or
> Docker?
>

For MySQL, I have set up CI testing (Travis for Linux and OSX, Appveyor for
Windows). I deploy the database (at this moment using the system package
manager, I am considering switching to Docker containers to test a wider
range of DB versions). Most of the tests run with a real database
connection. At this moment, The MySQL library database tests cover most of
the native types, both in text queries and statements, as well as the most
used authentication methods.

If I were to do a fully-fledged implementation for Postgres, I would
implement a similar strategy. But doing so for MySQL has proven to be
really time-consuming. I would prefer to push the MySQL library forward
before moving into another one.

>
>
> > As always, any feedback is welcome.
> >
>
> A few things that came to my mind reading this thread, from my DB
> experience, is that
> the re-implement-the-low-level-protocol approach applies to very few DB
> systems,
> given that most commercial DBs use proprietary protocols (I know Oracle
> does for sure).
> Or the most used DB in the world (SQLite) doesn't have a network protocol
> at all. So that
> pretty much only leaves MySQL and PostgreSQL, no? So at the very least,
> both should be
> supported by a library that wants to gain widespread appeal IMHO. And from
> my POV, you
> chose the wrong one to start with :)
>

I have read that SQLite has an async API (using the filesystem asynchronous
API), so maybe we could do something with it. I haven't explored that at
all, TBH.

>
> Another remark is that PostgreSQL has two modes, text (the default) and
> binary to send row
> data back and forth. We've tested both and went the most difficult route of
> using the binary
> route for performance reasons, which complicates things. Do you handle both
> modes at the
> protocol level?
>

No, the proof of concept just does text.

>
> I'd conclude by saying that from my perspective, the one-true-SQL-library
> is a pipe dream
> when you want to maximum performance, or when wanting to avoid the
> lowest-common
> denominator approach.
>

> I've used Array-binding in Oracle, and XMLTYPE, and low-level C/C++ binding
> of Spatial objects
> in OCI, and pretty much no SQL abstraction libraries can do these things.
> After doing lots of
> Oracle and SQLite, and now doing PostgreSQL, the differences are actually
> very large between
> these DBs, to the point a lots of the optimization done for one are not
> applicable to the other,
> and many things just don't translate at all. If one stays within a *tiny*
> subset of SQL and features,
> maybe... But any non-trivial sophisticated DB use, you quickly find the
> limits libraries that try
> to be multi-DB IMHO.
>

I completely agree with this. My view on this is that there should be a
database for each library, so that you can expose the details specific to
each database. (E.g. Portals, the binary protocol and execution row count
for Postgres; Portals are not even a thing in MySQL).

Then, if you want a multi-database library, you use the database-specific
building blocks. However, there were people arguing that these building
blocks may not be valuable enough to be part of Boost.

>
> So in your case, what matters most is the async ASIO-based design, and
> having a framework
> to develop async DB client libraries, but not necessarily exposing exactly
> the same API for all
> of them. What matters is to be able to start processing rows and sending
> them downstream for
> further processing, *BEFORE* the whole result set made it client-side.
>

Interesting feature, we could do that in MySQL, too.

> Maybe what I'm saying
> doesn't make sense, because I haven't studied your code and examples yet,
> but FWIW,
> these are the thoughts I had following this thread.
>
> Thanks, --DD
>

All you said makes sense, thanks. Glad to know there would be interest for
a Postgres library, too.

>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk