Subject: Re: [boost] ACID transactions/flushing/MySQL InnoDB
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2009-12-10 10:40:12
On Fri, Dec 11, 2009 at 1:07 AM, Stefan Strasser <strasser_at_[hidden]> wrote:
> this is not a boost question, but it is for the development of a boost
> library, so maybe a database expert on the list could help me out?
I'm not a database expert, but I have dealt with databases for the
better part of my development career, so let me share some of what
I've learned about MySQL and InnoDB.
> I've been benchmarking some test cases of my library against MySQL InnoDB,
> which isn't even apples to oranges anymore, but nevertheless gave some results.
> what I find surprising is the performance of MySQL for small transactions.
> while my library spends 90% of its time waiting for the disk to flush a
> transaction, InnoDB uses 100% CPU and barely uses any disk.
Am I reading this correctly: you're persisting tuples to disk,
allowing indexed access to the persisted data, and then letting
someone embed it in their application? This sounds a lot like an
embedded database system, so looking at SQLite and BDB might be a
better comparison than InnoDB. But then that's just my take on it. ;)
> I've reduced the calls to fsync() for small transactions to 1 call per
> transaction after some sequential writing, so I'm wondering how MySQL
> accomplishes this?
> by definition there must be a flush per transaction, unless you loosen the
> requirements of the D in ACID.
> Is that what InnoDB is doing?
> the only way I see to reduce flushes even more is combining some transactions.
> but after a power failure some of those transactions can be lost even though
> they have been reported as successfully committed.
This depends completely on how you've configured InnoDB. There's a
configuration setting that lets you flush to disk after every
transaction, one that lets you flush roughly once a second instead of
after every commit, and one that lets you write the log entry at every
commit but defer flushing it (and the data) to disk.
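For what it's worth, the knob that controls this is InnoDB's
innodb_flush_log_at_trx_commit setting. A minimal my.cnf fragment
(section name and values as documented by MySQL):

```ini
# my.cnf -- how aggressively InnoDB flushes its redo log at commit
[mysqld]
# 1 (default): write and fsync the log at every commit -- full Durability
# 2: write the log at every commit, but fsync it roughly once per second
# 0: write and fsync the log roughly once per second, not at commit
innodb_flush_log_at_trx_commit = 1
```

With 0 or 2, up to about a second's worth of committed transactions can
be lost on a crash, which is exactly the loosening of the D you described.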
The way InnoDB does it is that it keeps both the index and the data in
its in-memory buffer pool -- the CPU usage you're seeing is probably
due to cache misses when you're doing a full-table scan. When you
commit a transaction, the B-tree index in memory is updated and a
record is appended to the log. When the flush happens is completely
configurable -- you can have it flushed to disk right after the B-tree
index gets updated, or after a timeout, or have only the log made
durable at commit and the index+data flushed later on.
Another way they do away with the flush is by using memory-mapped
regions on Linux/UNIX and delegating the flushing to the operating
system. InnoDB also supports raw storage, in which case it bypasses
normal filesystem flushing entirely and manages the raw block devices
on its own, using lower-level access to the device.
> I have not yet examined their source code yet. the MySQL documentation doesn't
> say anything about loosening Durability. (although you have to look pretty
> hard to see that the default isolation level isn't Serializable either.)
There's a section in the MySQL documentation about InnoDB log file
flushing. I would have included a link, but I think the information
above should already be helpful to you.
> thanks in advance,
BTW, I am interested in what you're doing and maybe using a library
that tackles the embedded database space in a modern C++ manner.
Looking forward to hearing more about what you're working on.
--
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com