Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2017-03-22 00:00:39


On Tue, Mar 21, 2017 at 7:36 PM, Niall Douglas via Boost
<boost_at_[hidden]> wrote:
> On 21/03/2017 20:36, Vinnie Falco via Boost wrote:
> I am unsure what benefit it
> confers over the much safer choices of SQLite or UnQLite both of which
> are battle tested and written by database experts to ensure that other
> people's data is never, ever lost.

SQLite is painfully slow compared to NuDB. UnQLite supports features
not present in NuDB such as cursor iteration, moving, and deleting.
The tradeoff for not supporting these features is that NuDB can make
better claims about its performance.

> I felt when I reviewed NuDB's implementation some time ago that your
> implementation would be highly susceptible to data loss in a wide set of
> circumstances.

NuDB makes some assumptions about the underlying file system. In
particular, it assumes that when a call to fsync() returns, the data
has been reliably written. If these assumptions hold, there is no data
loss.

In fact, one of NuDB's unit tests exhaustively verifies that the
database recovery process performs correctly after an I/O failure. The
unit test works by instantiating a database with a custom File
template parameter. This custom file implementation simulates an I/O
failure on the Nth operation. The test first runs with N=1 until the
simulated failure occurs, then increments N and re-runs, repeating
until a run completes with no failures. After each run, the test
verifies that the database recovery process left the database in a
consistent state. You can see this code here:
https://github.com/vinniefalco/NuDB/blob/master/test/recover.cpp#L125
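
The shape of such a test can be sketched roughly as follows. This is a
toy model, not the code in test/recover.cpp: fail_counter, toy_log,
insert, recover, and run_fault_injection are all hypothetical names,
and the "database" is just an in-memory log with a commit counter.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fault injector: throws on the target-th operation (1-based).
struct fail_counter
{
    std::size_t target;
    std::size_t count;
    explicit fail_counter(std::size_t t) : target(t), count(0) {}
    void tick()
    {
        if (++count == target)
            throw std::runtime_error("simulated I/O failure");
    }
};

// Toy stand-in for a database: a log plus the number of committed records.
struct toy_log
{
    std::vector<std::string> records;
    std::size_t committed = 0;
};

// Each insert performs two fallible operations: write, then commit.
void insert(toy_log& db, fail_counter& io, std::string const& value)
{
    io.tick();                          // simulated write
    db.records.push_back(value);
    io.tick();                          // simulated sync and commit
    db.committed = db.records.size();
}

// Recovery discards any record that was written but never committed.
void recover(toy_log& db) { db.records.resize(db.committed); }

bool consistent(toy_log const& db) { return db.records.size() == db.committed; }

// Fails the Nth operation for N = 1, 2, ... until a run completes cleanly.
// Returns the number of runs in which a failure was injected.
std::size_t run_fault_injection(std::size_t n_records)
{
    std::size_t failures = 0;
    for (std::size_t n = 1;; ++n)
    {
        toy_log db;
        fail_counter io(n);
        try
        {
            for (std::size_t i = 0; i < n_records; ++i)
                insert(db, io, "value" + std::to_string(i));
            if (!consistent(db))
                throw std::logic_error("inconsistent after clean run");
            return failures;            // no failure injected: done
        }
        catch (std::runtime_error const&)
        {
            ++failures;
            recover(db);
            if (!consistent(db))
                throw std::logic_error("inconsistent after recovery");
        }
    }
}
```

In this toy model each insert performs two fallible operations, so
inserting three records yields six failing runs before the seventh run
completes cleanly.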

I don't doubt that you think this database is vulnerable to data loss
under some condition. However, I would appreciate specifics so that I
can verify under what conditions your claim is correct, or if it is
correct at all.

> I also felt it wouldn't scale well to the > 1M IOPS
> storage just around the corner.

While it hasn't yet been tested on these non-existent devices, I am
optimistic, since NuDB was specifically designed for very intense
fetch workloads with light concurrent insertions. Because fetches can
happen concurrently, it should be possible to saturate the read
throughput of a device. See:
https://github.com/vinniefalco/NuDB/blob/master/include/nudb/impl/basic_store.ipp#L213
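
For readers unfamiliar with the pattern, the general idea of many
concurrent readers with occasional writers can be sketched with a
reader/writer lock. This is only an illustration under assumptions: the
toy_store class and concurrent_fetches function are hypothetical,
std::shared_timed_mutex (C++14) is chosen for the sketch, and none of
this is taken from basic_store.ipp.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical in-memory store (not NuDB's basic_store).
class toy_store
{
    mutable std::shared_timed_mutex m_;
    std::map<std::string, std::string> data_;

public:
    // Insert-only: returns false if the key already exists.
    bool insert(std::string const& key, std::string const& value)
    {
        std::unique_lock<std::shared_timed_mutex> lock(m_);  // exclusive
        return data_.emplace(key, value).second;
    }

    // Any number of fetches may hold the shared lock at the same time.
    bool fetch(std::string const& key, std::string& out) const
    {
        std::shared_lock<std::shared_timed_mutex> lock(m_);  // shared
        auto it = data_.find(key);
        if (it == data_.end())
            return false;
        out = it->second;
        return true;
    }
};

// Starts n_readers threads that each fetch every key once.
// Returns the total number of successful fetches.
std::size_t concurrent_fetches(toy_store const& store,
                               std::vector<std::string> const& keys,
                               std::size_t n_readers)
{
    std::vector<std::size_t> hits(n_readers, 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n_readers; ++i)
        threads.emplace_back([&, i] {
            std::string value;
            for (auto const& k : keys)
                if (store.fetch(k, value))
                    ++hits[i];          // each thread writes only its own slot
        });
    for (auto& t : threads)
        t.join();

    std::size_t total = 0;
    for (auto h : hits)
        total += h;
    return total;
}
```

Readers only block while an insert holds the exclusive lock, which
suits a workload dominated by fetches.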

> I would suspect it is being
> used in a highly restricted use case where corner cases don't present.

Maybe. I'll counter by saying that there are only three operations
possible on an open database: insert, fetch, and close.

> if you are up for getting this to a state ready to submit to Boost,
> I'd welcome it.

Cool! The docs just need a bit of work; the library code is more than ready.

Thanks for taking the time to go through it!

