Boost logo

Boost :

From: Maciej Sobczak (prog_at_[hidden])
Date: 2008-07-10 16:59:40


Ion Gaztan~aga wrote:

> 1) I think a solid ODBC backend is a must for a Boost DB library. After
> all, ODBC is the most widely used driver. Is SOCI ODBC compatibility
> mature or still needs serious work?

Some SOCI users enjoy the existing ODBC backend in production code. As
far as I understand, it does not require any major work and that's why
I'm pretty sure that the next library release will include the ODBC
backend as a supported part of the whole. Very likely the binary release
for Windows users (lib+dll) will be available as well.

> 2) Do we know how efficient SOCI is?

One of our users performed tests comparing SOCI to other libraries,
including the native client library for one database. The conclusion
from the test was that with optimized build (like -O2 for g++) the
performance of SOCI is comparable to other alternatives, with few
percent of overhead in relation to native client library, for *these* tests.

Note also that SOCI supports bulk operations in terms of vector<T>. The
Oracle backend is currently the only one that fully benefits from it by
performing data transmissions in blocks (it reduces network roundtrips),
binding to raw underlying memory buffer for direct data transport. The
scalability potential of this solution is huge, but the actual results
can be very sensitive to the details of the given test.

> I understand that database
> operations might be in most cases more expensive than SOCI operations

Yes, this is true. Of course, SOCI adds its own overhead in terms of
dynamic data structures that are created here and there, but the
important property of SOCI is that in its "canonical" idioms the library
binds directly to data provided by the user. For example:

int salary;
sql << "select salary from persons where id = 123", into(salary);

Above, the result of the query is put *directly* in the salary variable
if the given backend supports binary data transmission (Oracle, ODBC)
and there is only one conversion step if the backend operates on strings
internally (PostgreSQL, MySQL, ...). The same is true for explicit
statements.

In other words, the overhead of SOCI is a one-time investment for each
statement and is necessary to keep the statement in shape and to
associate data references with query results. After the statement is
created, all subsequent operations use existing structures for guidance
what to put where and if the statement is repeated many times - which is
where performance actually matters - the copy/conversion overhead is
minimal or just zero.

(Well, we had to add some additional data copying to ensure
const-correctness as perceived by the user code in *some* cases, but
this would need to be done with any other alternative approach as well.)

The only place where SOCI introduces some significant overhead is when
complicated user-defined datatypes are involved, since data needs to be
often copied and converted back and forth. This is the price for
flexibility. But hey, native client libraries and even ODBC have no idea
how to translate the query results into boost::fusion<std::string,
boost::optional<MyIntegerWithRange>, boost::gregorian::date>, right? ;-)

There are also other idioms supported by SOCI, like dynamic rowset
description or stream-like data extraction and they can (depending on
the backend) introduce additional overhead. Again, it is the price for
ultimate flexibility. For those who are performance-paranoid, the basic
statically bound idioms are recommended.

Having said all the above, it is important to note that with databases
the biggest opportunity for performance tuning is not in the client
interface, but in SQL queries and database structures themselves.
The rest is... Measure It Yourself. ;-)

-- 
Maciej Sobczak * www.msobczak.com * www.inspirel.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk