Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2017-03-29 21:14:19
>> "Current file systems do not provide atomic multi-block appends;
>> appends can be broken down into multiple operations. However, most
>> file systems seemingly guarantee that some prefix of the data written
>> (e.g., the first 10 blocks of a larger append) will be appended
>> atomically."
>> It sounds to me like I have this case covered with the "partial write"
>> failure mode of fail_file. Or is there another case I missed?
> This portion was worded poorly by the authors. If you look at table 1,
> a single block append doesn't work when the filesystem is **not** doing
> metadata journaling. It's inconceivable that multi-block appends would
> appear atomically for these configurations. Their intent was to point
> out that filesystem configurations achieving single block atomic append
> could actually do up to 10 blocks atomically.
I know from my empirical testing that atomic append only updates the
file's maximum extent atomically in the kernel. The i/o writing the
appended data is not atomic with respect to other appends or to reads.
If three threads each append a block to the same file, the appends will
not be interleaved, because the file high watermark is atomically
incremented by the total write size before each write is started. But if
you do a concurrent read of the end of the file, you will see torn writes.
I don't know if the torn writes as seen by the kernel reflect what
reaches storage, but I'd suggest it would be wise to assume they reach
storage even more reordered.
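
To make the append behaviour described above concrete, here is a minimal
sketch, assuming a POSIX system and a local file system. The function name
appended_blocks_are_whole and its parameters are hypothetical and not part
of NuDB. Several threads append blocks through one O_APPEND descriptor, and
the file is only inspected after all of them have finished. It illustrates
the "not interleaved" observation only; it says nothing about what a
concurrent reader sees or about what reaches storage after a crash.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper, not part of NuDB: `writers` threads each append
// `per_writer` blocks of `block_size` bytes to `path` through one O_APPEND
// descriptor. Every block is filled with a byte unique to its writer.
// Returns true if, once all writers have finished, the file consists solely
// of whole blocks (no block mixes bytes from two writers).
bool appended_blocks_are_whole(const std::string& path, int writers,
                               int per_writer, std::size_t block_size)
{
    // Fill bytes are 'A', 'B', ... so keep the writer count within A..Z.
    assert(writers > 0 && writers <= 26);

    int fd = ::open(path.c_str(), O_CREAT | O_TRUNC | O_WRONLY | O_APPEND, 0644);
    if (fd < 0)
        return false;

    std::atomic<bool> all_written{true};
    std::vector<std::thread> threads;
    for (int t = 0; t < writers; ++t) {
        threads.emplace_back([&all_written, fd, t, per_writer, block_size] {
            const std::vector<char> block(block_size, static_cast<char>('A' + t));
            for (int i = 0; i < per_writer; ++i) {
                // One write() call per block; a short write counts as failure.
                const ssize_t n = ::write(fd, block.data(), block.size());
                if (n != static_cast<ssize_t>(block.size()))
                    all_written = false;
            }
        });
    }
    for (auto& th : threads)
        th.join();
    ::close(fd);
    if (!all_written)
        return false;

    // Read the file back only after every writer has finished.
    std::ifstream in(path, std::ios::binary);
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    if (data.size() != static_cast<std::size_t>(writers) * per_writer * block_size)
        return false;
    for (std::size_t off = 0; off < data.size(); off += block_size) {
        for (std::size_t i = 1; i < block_size; ++i) {
            if (data[off + i] != data[off])
                return false; // block contains bytes from more than one writer
        }
    }
    return true;
}
```

Calling it with, say, three writers and 4096-byte blocks on a scratch file
is enough to try this out. A true result only shows that the finished file
holds whole blocks on that particular system; it is an observation, not a
guarantee, and a reader running alongside the writers may still see torn data.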
Now, journaled file systems may give the appearance of atomic appends
because I would assume that the journal has a mutex on it, and appending
unavoidably incurs extent allocation which is a journal operation. So it
could simply be that atomicity of appends is an artefact of how the
journal has been implemented for that file system, and is not a guarantee.
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/