Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Gavin Lambert (gavinl_at_[hidden])
Date: 2017-03-22 22:49:20
On 23/03/2017 00:46, Niall Douglas via Boost wrote:
> On 22/03/2017 10:50, Olaf van der Spek wrote:
>> Isn't fsync() supposed to ensure data is on durable storage before it returns?
> So, without _POSIX_SYNCHRONIZED_IO, all fsync() guarantees is that it
> will not return until the *request* for the transfer of outstanding data
> to storage has completed. In other words, it pokes the OS to start
> flushing data now rather than later, and returns immediately. OS X
> implements this sort of fsync() for example.
> With _POSIX_SYNCHRONIZED_IO, you get stronger guarantees that upon
> return from the syscall, "synchronized I/O file integrity completion"
> has occurred. Linux infamously claims _POSIX_SYNCHRONIZED_IO, yet
> ext2/ext3/ext4 don't implement it fully and will happily reorder fsyncs
> of the metadata needed to later retrieve a fsynced write of data. So the
> data itself is written on fsync return sequentially consistent, but not
> the metadata to later retrieve it, that can be reordered with respect to
> other fsyncs.
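For what it's worth, the usual mitigation for the metadata side of this is to fsync() the containing directory as well as the file. The following is only a minimal sketch of that pattern, not anything taken from NuDB. The helper names (fsync_directory, durable_write, claims_synchronized_io) are made up for illustration, and treating EINVAL from a directory fsync() as "not supported here" is my own assumption. It narrows the window described above; it does not promise ordering between separate fsyncs.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <string>

// Hypothetical helper: fsync() a directory so that a newly created entry in
// it is also pushed to storage. Returns 0 on success, else an errno value.
int fsync_directory(const std::string& dir) {
    int fd = ::open(dir.c_str(), O_RDONLY);
    if (fd < 0) return errno;
    int rc = 0;
    if (::fsync(fd) != 0 && errno != EINVAL)  // assumption: EINVAL means the
        rc = errno;                           // filesystem cannot sync directories
    ::close(fd);
    return rc;
}

// Hypothetical helper: write `data` to dir/name, fsync() the file, then
// fsync() the directory. Returns 0 on success, else an errno value.
int durable_write(const std::string& dir, const std::string& name,
                  const std::string& data) {
    const std::string path = dir + "/" + name;
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return errno;

    const char* p = data.data();
    size_t left = data.size();
    while (left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted, retry the write
            int e = errno;
            ::close(fd);
            return e;
        }
        p += n;
        left -= static_cast<size_t>(n);
    }

    int rc = (::fsync(fd) == 0) ? 0 : errno;
    if (::close(fd) != 0 && rc == 0) rc = errno;
    if (rc != 0) return rc;
    return fsync_directory(dir);
}

// Reports whether the system *claims* synchronized I/O support at run time.
// As noted above, a claim is not the same thing as a full implementation.
bool claims_synchronized_io() {
    return ::sysconf(_SC_SYNCHRONIZED_IO) > 0;
}
```

Something like durable_write("/var/db", "segment.dat", payload) would be the typical call (path and names are placeholders), with any non-zero return treated as "not known to be durable".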
There are other factors, too: even if the OS has fully flushed the data to
the storage device, the device itself might be holding some of it in an
internal volatile buffer, and it may even perform the actual non-volatile
writes out of order.
(Disclaimer: I haven't looked at the internals of storage hardware for a
long time, so perhaps devices make better guarantees than I think.)
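As a rough illustration of asking the device itself to flush: OS X has an fcntl() command, F_FULLFSYNC, which asks the drive to write its buffered data to permanent storage. The sketch below uses it where it is defined and falls back to plain fsync() elsewhere. The name flush_to_device is made up, and whether a given drive actually honours the request is outside the program's control, so treat this as a best-effort request rather than a guarantee.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: ask for the strongest flush the platform exposes.
// Returns 0 on success, else an errno value.
int flush_to_device(int fd) {
#if defined(F_FULLFSYNC)
    // OS X: request that the drive flush its own buffers as well.
    if (::fcntl(fd, F_FULLFSYNC) == 0) return 0;
    // Not every filesystem accepts F_FULLFSYNC; fall back to fsync() below.
#endif
    return (::fsync(fd) == 0) ? 0 : errno;
}
```

A caller would use flush_to_device(fd) in place of a bare fsync(fd) after the final write, and would still have to assume that hardware with a volatile cache and no power-loss protection can lose or reorder what it has acknowledged.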