Subject: Re: [boost] ACID transactions/flushing/MySQL InnoDB
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2009-12-10 16:38:54


On Fri, Dec 11, 2009 at 7:02 AM, Stefan Strasser <strasser_at_[hidden]> wrote:
> On Thursday 10 December 2009 21:47:53, Dean Michael Berris wrote:
>
>> When you say logged on commit, are you writing to a file that's
>> already created with enough space, or are you writing to a file which
>> is "grown" every time you add data to it?
>
> I had tried that, and I've tried writing a sector instead of 1 byte, and I've
> tried removing O_CREAT.
> But you have to actually do ALL THREE, and one more: the sector writes need to
> be sector-aligned.
> So, writing 512 bytes, aligned to 512 bytes, without O_CREAT, when the file
> already exists, brings the desired results:
> 2 seconds with much less disk usage.
> That's some set of conditions.
>
> thanks for helping with this.
>

Nice. :) You're welcome.
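
For the archive, here is a minimal sketch of that set of conditions: a log file created and filled once up front, then opened without O_CREAT and written in whole 512-byte sectors at sector-aligned offsets before each fsync. The names preallocate_log, commit_record and kSectorSize are made up for illustration, and the 512-byte sector size is an assumption taken from the numbers above (some devices use 4096).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

// Assumed sector size, taken from the 512-byte figure in the thread.
constexpr std::size_t kSectorSize = 512;

// Hypothetical helper: create the log once and fill it with zeroed sectors,
// so later commits never have to grow the file.
bool preallocate_log(const char* path, std::size_t sectors) {
    int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    char zeros[kSectorSize] = {};
    bool ok = true;
    for (std::size_t i = 0; ok && i < sectors; ++i)
        ok = ::write(fd, zeros, kSectorSize) == static_cast<ssize_t>(kSectorSize);
    ok = ok && ::fsync(fd) == 0;
    ::close(fd);
    return ok;
}

// Hypothetical helper: write one record, padded to a full sector, at a
// sector-aligned offset of an already existing file, then sync it.
// A real log would keep the descriptor open instead of reopening per commit.
bool commit_record(const char* path, std::size_t sector_index,
                   const void* data, std::size_t len) {
    if (len > kSectorSize) return false;
    int fd = ::open(path, O_WRONLY);  // no O_CREAT: the file must already exist
    if (fd < 0) return false;
    char sector[kSectorSize] = {};    // zero padding up to the sector boundary
    std::memcpy(sector, data, len);
    const std::size_t offset = sector_index * kSectorSize;
    assert(offset % kSectorSize == 0);
    bool ok = ::pwrite(fd, sector, kSectorSize, static_cast<off_t>(offset))
              == static_cast<ssize_t>(kSectorSize);
    ok = ok && ::fsync(fd) == 0;
    ::close(fd);
    return ok;
}
```

Whether this reproduces the 2-second result will depend on the filesystem and the drive; the sketch only shows the shape of the writes.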

>> What InnoDB does is put everything that fits in memory up there, and
>> keep it there. Small transactions will write to the log file, but not
>> write all the data directly to disk right away.
>
> That's almost the same as my approach. I do write to the data files, but only
> sync them when a large transaction is committed or the log is rolled to a new
> one.
> I should probably think about sector-aligning those data writes, too, given
> the new insights.
>

Sounds like a good approach. If you're considering multi-threading,
having an active object manage the flushing is worth looking into: it
moves the persistence latency off the "worker" threads and onto a
single serializing writer thread.
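
A rough sketch of what I mean by an active object here, assuming C++11 threads: workers only enqueue records, and one thread owns all the slow I/O. The LogWriter class and its sink callback are hypothetical names; the sink stands in for whatever does the actual write and fsync.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical active object: submit() is cheap for worker threads, while a
// single internal thread drains the queue in order and calls the sink.
class LogWriter {
public:
    explicit LogWriter(std::function<void(const std::string&)> sink)
        : sink_(std::move(sink)), worker_(&LogWriter::run, this) {
        assert(sink_);
    }

    ~LogWriter() { stop(); }

    // Called from worker threads; never touches the disk.
    void submit(std::string record) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(record));
        }
        cv_.notify_one();
    }

    // Drains what is already queued, then joins the writer thread.
    // Records submitted after stop() are not processed in this sketch.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) return;       // stop requested and drained
            std::deque<std::string> batch;
            batch.swap(queue_);               // take the whole batch at once
            lock.unlock();                    // do the slow part unlocked
            for (const auto& r : batch) sink_(r);
            lock.lock();
        }
    }

    std::function<void(const std::string&)> sink_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

Taking the queue in batches also gives the writer a natural place to issue one sync per batch rather than one per record, if that suits the durability guarantees you want.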

>> Because you're using fsync, you're asking the kernel to do it for you
>> -- and if your file is already in the vfs cache, the chances of fsync
>> returning quicker are higher due to write caching at the OS level.
>
> I don't think the OS uses write caching in the case of fsync. It isn't
> supposed to, is it?
>

It actually has license to "cache" in the sense that it queues the
data to be written on a per-fd basis. Even if you're not doing
buffered writes, an fsync call that returns right away doesn't
necessarily mean the data has already been written to disk. IIRC, the
POSIX standard only requires fsync to request that the data be
transferred to the storage device and leaves the nature of that
transfer implementation-defined, unless the system supports
synchronized I/O (_POSIX_SYNCHRONIZED_IO). On top of that, the drive's
own write cache may still hold the data after the kernel has handed it
over, so a returned fsync is not by itself proof that the data is on
the platters.

I may be wrong, though; that's how I understand it.
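
If you want to probe this on a given system, something along these lines might help. It is a sketch, not a guarantee of durability: sysconf(_SC_SYNCHRONIZED_IO) only reports what the system advertises, and F_FULLFSYNC is a macOS-specific fcntl command, so that branch is compiled only where the macro exists. The function names are made up for illustration.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Reports whether the system advertises POSIX synchronized I/O at run time.
// A positive value says what fsync is specified to do there, not what the
// drive's own cache does afterwards.
bool synchronized_io_advertised() {
    return ::sysconf(_SC_SYNCHRONIZED_IO) > 0;
}

// Hypothetical helper: ask for the strongest flush the platform offers.
bool durable_sync(int fd) {
    assert(fd >= 0 && "durable_sync needs an open descriptor");
#ifdef F_FULLFSYNC
    // macOS: also asks the drive to flush its cache. Not every filesystem
    // supports it, so fall back to plain fsync if it fails.
    if (::fcntl(fd, F_FULLFSYNC) == 0) return true;
#endif
    return ::fsync(fd) == 0;
}
```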

HTH

-- 
Dean Michael Berris
blog.cplusplus-soup.com | twitter.com/mikhailberis
linkedin.com/in/mikhailberis | facebook.com/dean.berris | deanberris.com
