Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2017-03-29 17:06:02


On 29/03/2017 17:32, Lee Clagett via Boost wrote:
> Read this [paper on crash-consistent applications][0]. Table 1 on
> page 5

I particularly like the sentence:

"However, not issuing such an fsync() is perhaps more safe in modern
file systems than out-of-order persistence of directory
operations. We believe the developers’ interest in fixing
this problem arises from the Linux documentation explicitly
recommending an fsync() after creating a file."

I agree with them. fsync() gives false assurance. Better to not use it,
and certainly never rely on it.

> should be of particular interest. I _think_ the bucket portion of
> NuDB's log has no size constraint, so its algorithm is either going
> to be "single sector append", "single block append", or "multi-block
> append/writes" depending on the total size of the buckets. The
> algorithm is always problematic when metadata journaling is disabled.
> Your assumptions of fsync have not been violated to achieve those
> inconsistencies.

One of my biggest issues with NuDB is the log file. Specifically, it's
worse than useless: it actively interferes with database integrity.

If you implemented NuDB as a simple data file and a memory mapped key
file, and always atomically appended transactions to the data file when
inserting items, then after power loss you could check whether the key
file mentions extents that are not possible given the size of the data
file. You can then rebuild the key file simply by replaying through the
data file, being careful to ignore any truncated final append.
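
To make the replay idea concrete, here is a minimal sketch. It is not
NuDB's code or file format: the record layout (two 32-bit lengths followed
by key and value bytes), the in-memory "data file", and the names Extent,
KeyIndex, append_record, index_is_plausible and rebuild_index are all
hypothetical and chosen only for illustration. It also assumes a torn
append shows up as a short file; a real design would probably want a
per-record checksum as well.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical record layout (NOT NuDB's real format):
//   [u32 key_len][u32 value_len][key bytes][value bytes]
struct Extent { std::uint64_t offset; std::uint32_t size; };
using KeyIndex = std::unordered_map<std::string, Extent>;
constexpr std::uint64_t kHeaderSize = 2 * sizeof(std::uint32_t);

// Append one record to an in-memory image of the data file.
void append_record(std::vector<unsigned char>& data,
                   const std::string& key, const std::string& value)
{
    const std::uint32_t lens[2] = {
        static_cast<std::uint32_t>(key.size()),
        static_cast<std::uint32_t>(value.size())};
    const unsigned char* p = reinterpret_cast<const unsigned char*>(lens);
    data.insert(data.end(), p, p + sizeof(lens));
    data.insert(data.end(), key.begin(), key.end());
    data.insert(data.end(), value.begin(), value.end());
}

// True if every extent the index mentions fits inside the data file.
bool index_is_plausible(const KeyIndex& index, std::uint64_t data_size)
{
    for (const auto& kv : index) {
        const Extent& e = kv.second;
        if (e.offset > data_size || e.size > data_size - e.offset)
            return false;
    }
    return true;
}

// Rebuild the index by replaying the data file from the start.
// valid_end receives the offset just past the last complete record;
// anything after it is a truncated final append and is ignored.
KeyIndex rebuild_index(const std::vector<unsigned char>& data,
                       std::uint64_t& valid_end)
{
    KeyIndex index;
    const std::uint64_t size = data.size();
    std::uint64_t pos = 0;
    while (size - pos >= kHeaderSize) {
        std::uint32_t lens[2];
        std::memcpy(lens, data.data() + pos, kHeaderSize);
        const std::uint64_t body = std::uint64_t(lens[0]) + lens[1];
        if (size - pos - kHeaderSize < body)
            break;  // record body is cut short: truncated final append
        const std::uint64_t key_off = pos + kHeaderSize;
        std::string key(
            reinterpret_cast<const char*>(data.data() + key_off), lens[0]);
        // Insert-only store: the first record for a key wins.
        index.emplace(std::move(key), Extent{key_off + lens[0], lens[1]});
        pos = key_off + body;
        assert(pos <= size);
    }
    valid_end = pos;
    return index;
}
```

On open, one would run index_is_plausible against the existing key file's
contents and fall back to rebuild_index only if it fails, so the slow path
runs only for a badly closed database.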

That would be a reasonable power loss recovery algorithm. A little slow
to do recovery for large databases, but safe, reliable, predictable, and
it would only run on a badly closed database. You can also turn off
fsync entirely, and let the atomic appends land on storage in an order
probably close to the append order. It ought to be quicker than NuDB by
a fair bit: far fewer i/o ops and a simpler design.
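
As a rough illustration of appending without fsync, the sketch below
assumes a POSIX system and uses open() with O_APPEND plus a single write()
per record. The helper names open_for_append and append_once are
hypothetical. Note the hedge: O_APPEND makes the seek-to-end and the write
one step with respect to other writers, but it does not by itself
guarantee that a record reaches storage whole after power loss, which is
why recovery still has to tolerate a truncated final append.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Open (creating if needed) a data file so that every write() goes to
// the current end of file.
int open_for_append(const char* path)
{
    return ::open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

// Append one whole record with a single write() call and no fsync().
// Returns false on error or on a short write; a short write leaves a
// partial record at the end of the file for recovery to skip.
bool append_once(int fd, const std::string& record)
{
    assert(fd >= 0);
    ssize_t n;
    do {
        n = ::write(fd, record.data(), record.size());
    } while (n < 0 && errno == EINTR);  // retry if interrupted before writing
    return n == static_cast<ssize_t>(record.size());
}
```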

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
