Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Lee Clagett (forum_at_[hidden])
Date: 2017-03-29 00:26:42


On Tue, 28 Mar 2017 08:59:10 -0400
Vinnie Falco via Boost <boost_at_[hidden]> wrote:
> On Tue, Mar 28, 2017 at 8:45 AM, Lee Clagett via Boost
> <boost_at_[hidden]> wrote:
> > ...writing the log header after its contents
> > could reduce the probability of an undetected incomplete write
>
> The recovery test simulates partial writes:
> https://github.com/vinniefalco/NuDB/blob/master/extras/nudb/test/fail_file.hpp#L292

That test simulates a write I/O error, not a power failure. Even
assuming that data is fully stored on disk once `fsync` returns, the
recovery algorithm could open a log file for which `write` was called
but `fsync` never returned. That file could have enough "space" for a
bucket, but lack the proper contents of the bucket itself. There is an
inherent race between writing and the completion of an `fsync` that
will go unnoticed by the current recovery algorithm on some filesystem
configurations. The only "portable" fixes I've seen are: (1)
cryptographic hashes, (2) hoping that changing path-to-inode mappings
(e.g. a rename) is all-or-nothing, or (3) hoping that _overwriting_ a
single sector last is all-or-nothing. Both (2) and (3) still depend on
the filesystem + hardware AFAIK, but probably work on more filesystem
and hardware configurations than the current scheme.
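
To make fix (1) concrete, here is a minimal in-memory sketch of a log
entry that carries a digest of its own contents, so recovery can tell
"right size, wrong contents" apart from a complete write. Everything in
it is hypothetical and not part of NuDB: the names `digest64`,
`seal_bucket`, `simulate_torn_write` and `bucket_is_intact`, the
payload-then-digest layout, and the assumption that bytes which never
reached the disk read back as zeros. FNV-1a is used only to keep the
example short; it is NOT cryptographic, and a real implementation would
substitute a proper cryptographic hash.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit FNV-1a. A non-cryptographic stand-in for a real hash.
inline std::uint64_t digest64(const std::uint8_t* p, std::size_t n)
{
    std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;                   // FNV prime
    }
    return h;
}

// Hypothetical entry layout: payload bytes, then an 8-byte
// little-endian digest of the payload.
inline std::vector<std::uint8_t>
seal_bucket(const std::vector<std::uint8_t>& payload)
{
    std::vector<std::uint8_t> image(payload);
    const std::uint64_t d = digest64(payload.data(), payload.size());
    for (int i = 0; i < 8; ++i)
        image.push_back(static_cast<std::uint8_t>(d >> (8 * i)));
    return image;
}

// Model a power failure between `write` and the completion of `fsync`:
// the file has its full length, but only the first `kept` bytes hold
// the intended data. Assumption: the rest reads back as zeros.
inline std::vector<std::uint8_t>
simulate_torn_write(const std::vector<std::uint8_t>& image, std::size_t kept)
{
    assert(kept <= image.size());
    std::vector<std::uint8_t> on_disk(image.size(), 0);
    std::copy(image.begin(),
              image.begin() + static_cast<std::ptrdiff_t>(kept),
              on_disk.begin());
    return on_disk;
}

// Recovery-side check: recompute the digest and compare it with the
// stored one. A size check alone would accept the torn image.
inline bool bucket_is_intact(const std::vector<std::uint8_t>& image)
{
    if (image.size() < 8)
        return false;
    const std::size_t n = image.size() - 8;
    std::uint64_t stored = 0;
    for (int i = 0; i < 8; ++i)
        stored |= static_cast<std::uint64_t>(image[n + i]) << (8 * i);
    return stored == digest64(image.data(), n);
}
```

With a 32-byte payload, `bucket_is_intact(seal_bucket(payload))` holds,
while the same image passed through `simulate_torn_write(image, 20)`
keeps its 40-byte length but fails the check.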

> > NuDB already has a file concept that needs documenting and
> > formalizing before any potential boost review.
>
> http://vinniefalco.github.io/nudb/nudb/types/File.html
>

I did miss this. Defining the concept in terms of "records" might be
more useful for storing a cryptographic hash in the manner Niall
mentioned for SQLite. I think it could allow per-record corruption
detection, so that the entire DB isn't punted after an incomplete
write.
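
As a rough illustration of the per-record idea, the sketch below lays
records out in fixed-size slots, each followed by its own digest, and
reports only the slots that fail verification. The slot layout, the
sizes, and the names `record_digest`, `append_record` and
`find_damaged_records` are all made up for this example; none of it is
NuDB's File concept or SQLite's format. FNV-1a again stands in for a
cryptographic hash and is not a substitute for one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPayloadSize = 16;             // hypothetical record size
constexpr std::size_t kSlotSize = kPayloadSize + 8;  // payload + 64-bit digest

// 64-bit FNV-1a. A non-cryptographic stand-in for a real hash.
inline std::uint64_t record_digest(const std::uint8_t* p, std::size_t n)
{
    std::uint64_t h = 14695981039346656037ULL;
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Append one slot (payload, then little-endian digest) to an
// in-memory stand-in for the file.
inline void append_record(std::vector<std::uint8_t>& file,
                          const std::vector<std::uint8_t>& payload)
{
    assert(payload.size() == kPayloadSize);
    file.insert(file.end(), payload.begin(), payload.end());
    const std::uint64_t d = record_digest(payload.data(), payload.size());
    for (int i = 0; i < 8; ++i)
        file.push_back(static_cast<std::uint8_t>(d >> (8 * i)));
}

// Return the indices of slots whose stored digest does not match
// their payload. A trailing partial slot counts as damaged.
inline std::vector<std::size_t>
find_damaged_records(const std::vector<std::uint8_t>& file)
{
    std::vector<std::size_t> damaged;
    const std::size_t slots = (file.size() + kSlotSize - 1) / kSlotSize;
    for (std::size_t i = 0; i < slots; ++i) {
        const std::size_t off = i * kSlotSize;
        if (file.size() - off < kSlotSize) {   // incomplete final slot
            damaged.push_back(i);
            break;
        }
        std::uint64_t stored = 0;
        for (int b = 0; b < 8; ++b)
            stored |= static_cast<std::uint64_t>(
                          file[off + kPayloadSize + b]) << (8 * b);
        if (stored != record_digest(&file[off], kPayloadSize))
            damaged.push_back(i);
    }
    return damaged;
}
```

After appending three records, flipping a byte inside the second slot
makes `find_damaged_records` report only index 1; the other two records
still verify and remain usable.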

Lee

