Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2017-03-29 21:04:06
On 29/03/2017 19:54, Asbjørn via Boost wrote:
> On 29.03.2017 13:07, Niall Douglas via Boost wrote:
>> On 29/03/2017 10:13, Asbjørn via Boost wrote:
>>> On 29.03.2017 08:18, Niall Douglas via Boost wrote:
>>>> Whatever is lost is lost, the *key* feature is that
>>>> damaged data doesn't cause further data loss.
>>> I'm struggling to see how you can guarantee that without _any_
>>> guarantees from the OS or hardware.
>> The lack of guarantees only refers to post-power-loss data integrity.
> But that's not what you wrote. You said:
> "The point I am trying to make is that NuDB's guarantees need to NOT
> depend on the OS, filesystem and hardware. Otherwise they are not
> valuable guarantees."
I thought I was clear, but let me rephrase so:
"The point I am trying to make is that NuDB's [post power loss data
integrity] guarantees need to NOT depend on the OS, filesystem and
hardware. Otherwise they are not valuable guarantees."
> Surely any post-power-loss integrity guarantees are intimately related
> to between-power-loss guarantees as the data is being written in the
> "between" state right until the power goes.
> As an extreme example, if the OS does not guarantee your data will be
> written unmodified in the "between-power-loss" state, that is, it may
> write random data instead, then that directly affects the
> post-power-loss integrity. How could NuDB code around this?
> Surely at some point a program/library like NuDB must rely on
> _something_ from the OS, filesystem and hardware in order to claim
> anything about post-power-loss integrity of its data?
You are not wrong that memory corruption will inevitably affect what
lands on storage. There is even a chance that the storage is fine, but a
read got corrupted going into memory.
There are nine sigma reliability CPUs that keep two parity bits per byte
to solve the trustworthiness problem, but let's assume we're talking
consumer hardware. When I say that you can assume that in between sudden
power loss events everything works, but not across power loss events, I
am talking about probabilities of data loss.
So, between sudden power loss events the chances of bits getting flipped
somewhere important are very low. That's why your laptop, which may be
turned on for weeks, doesn't blue screen usually even though quite a few
bits will have been flipped by cosmic rays. The chances are also pretty
good nowadays that your OS has been well tested, as have your
filesystem and your SSD, at least in the code paths regularly
executed, which is 99.99% of what you will ever do. So given
that almost all of the software on your computer isn't constantly
crashing and file systems are not always corrupting, I'd say the chance
of data loss between sudden power loss events is very low, perhaps no
more often than once per month.
Your hardware and OS and filesystem and SSD probably have been well
tested *individually* for sudden power loss under certain canned test
scenarios. The chance of them being tested together as a system is
extremely low - perhaps only Apple do so with their MacBooks. For data
loss to occur, just one thing needs to go wrong, whereas for data loss
to not occur every single thing needs to go right.
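To put rough numbers on the "every single thing needs to go right" point, here is a minimal sketch in Python. The layer names and the 0.99 survival figures are invented purely for illustration, not measurements, and the calculation assumes the layers fail independently, which real systems may not do.

```python
from math import prod


def chance_all_layers_survive(per_layer_survival):
    """Probability that every layer behaves correctly across one power loss,
    assuming the layers fail independently of each other."""
    return prod(per_layer_survival)


# Hypothetical per-layer survival probabilities (illustrative only).
layers = {
    "hardware": 0.99,
    "OS": 0.99,
    "filesystem": 0.99,
    "SSD firmware": 0.99,
    "database": 0.99,
}

p_ok = chance_all_layers_survive(layers.values())
print(f"all layers OK: {p_ok:.3f}, at least one went wrong: {1 - p_ok:.3f}")
```

Even when each layer on its own looks very reliable, the product shrinks with every layer added to the chain, which is why the whole system is less trustworthy after power loss than any single component.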
It's hard to come up with concrete probabilities, but as that paper Lee
linked to points out, everybody writing storage algorithms - even the
professionals - consistently gets sudden power loss wrong without fail.
That paper found power loss bugs in the OS, filing systems, all the
major databases and source control implementations and so on. This is
despite all of those being written and tested very carefully for power
loss correctness. They ALL made mistakes.
So I'm going to throw out there that after power loss there is a strong
probability that storage has got screwed up somehow by someone in the
chain of bits of software between your code and the bits on the medium.
That's why you should be paranoid about correct state after power loss,
but between power loss events you can probably safely take off the
Does this make more sense?
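As a rough illustration of what being paranoid after power loss can look like in practice, here is a minimal sketch in Python of an append-only log whose records each carry a checksum. The record layout (length plus CRC-32 header) and the function names are assumptions made up for this example; this is not NuDB's file format or recovery algorithm. On recovery nothing is trusted until it verifies, and scanning stops at the first record that fails, so a torn tail cannot poison the records before it.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover_records(log: bytes):
    """Return (verified payloads, offset where the trusted prefix ends).

    Scanning stops at the first record that is truncated or fails its
    checksum; everything from that offset onwards is treated as suspect.
    """
    good = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupted record: trust nothing from here on
        good.append(payload)
        offset = start + length
    return good, offset
```

For example, if the last record was only partially written when the power went, `recover_records` returns the earlier records plus the offset at which the log could be truncated before appending again. Stopping at the first bad record is one policy among several; skipping ahead to look for later valid records is another, with different trade-offs.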
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/