Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2017-03-26 23:14:50
>> fsync() performs pathologically awful on copy-on-write filing systems
> The library is not designed for exotic file systems like the one you
> describe. It's meant for simple commodity hardware and operating
> systems such as what you might find on a bare metal amazon web
> instance. There is no need for a copy on write file system, as long as
> the invariants are met (that fsyncs aren't reordered).
Except, as has already been established, fsyncs *are* effectively
reordered as far as what is retrievable after power loss is concerned.
So your invariant is not met.
For the record, ZFS is hardly an exotic file system. All my servers are
running ZFS on Linux because that Linux distro (Proxmox) defaults to
ZFS. My FreeBSD install on my laptop runs ZFS because that's also the
default filing system for FreeBSD.
Additionally, ext4 can be mounted in COW mode via "data=journal". So
none of this is exotic; it's merely commonplace outside the environments
where you've been using NuDB to date. As I said to you at the very beginning of all
this, your database is aimed at a very limited use case. If it entered
Boost, you'd find people doing all sorts of crazy stuff with it, and
running it on ZFS would be very mild compared to what some would do.
>> In which case you did not make a great choice.
>> Much, much better would be Blake2b. 2 cycles/byte, cryptographically
>> secure, collision probability exceeds life of the universe.
> Hmm, no, it seems that you are the one who "did not make a great
> choice." The requirements of Hasher do not include cryptographic
> security. Blake2b is a cryptographically secure hash function which
> computes digests up to 32 bytes, while xxhasher is a non
> cryptographically secure hash function which computes a 64-bit digest.
> NuDB requires a Hasher more like std::hash and less like SHA-1.
I already explained in my reply to Lee why one very good approach to
portably achieving durability is to use an acyclic graph of chained
cryptographic hashes to maintain a secure history of time. Exactly as
git or mercurial does in fact. If you're not using cryptographically
strong hashing, it's highly unlikely your database can be durable.
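To make the chained-hash idea concrete, here is a minimal sketch of its
simplest case, a linear chain, where each record's digest covers the
previous digest plus the payload. Everything named here (Record, append,
valid_prefix, placeholder_digest) is hypothetical and is not NuDB's or
AFIO's API. The digest function is a 64-bit FNV-1a stand-in used only to
keep the example self-contained; it is NOT cryptographically secure, and
a real implementation would substitute something like Blake2b.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// PLACEHOLDER digest (64-bit FNV-1a). Not cryptographically secure: it only
// stands in for a real cryptographic hash so the chain logic can be shown.
inline std::uint64_t placeholder_digest(std::uint64_t prev, const std::string& payload) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    auto mix = [&h](unsigned char b) { h ^= b; h *= 1099511628211ULL; };
    for (int i = 0; i < 8; ++i) mix(static_cast<unsigned char>(prev >> (8 * i)));
    for (unsigned char c : payload) mix(c);
    return h;
}

struct Record {            // hypothetical record layout
    std::string payload;
    std::uint64_t digest;  // H(previous digest || payload)
};

// Append a record whose digest chains onto the previous record's digest.
inline void append(std::vector<Record>& log, const std::string& payload) {
    const std::uint64_t prev = log.empty() ? 0 : log.back().digest;
    log.push_back({payload, placeholder_digest(prev, payload)});
}

// Count the leading records whose chain verifies. After power loss, only
// this prefix would be trusted; everything after the first break is discarded.
inline std::size_t valid_prefix(const std::vector<Record>& log) {
    std::uint64_t prev = 0;
    std::size_t n = 0;
    for (const Record& r : log) {
        if (placeholder_digest(prev, r.payload) != r.digest) break;
        prev = r.digest;
        ++n;
    }
    return n;
}

// Usage sketch: corrupting record 1 leaves only record 0 trusted.
inline void chain_example() {
    std::vector<Record> log;
    append(log, "a");
    append(log, "b");
    append(log, "c");
    assert(valid_prefix(log) == 3);
    log[1].payload = "X";  // simulate a torn or reordered write
    assert(valid_prefix(log) == 1);
}
```

The point of the design is that recovery does not need to trust the order
in which writes reached the disk: it walks the chain and keeps whatever
prefix still verifies.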
> Blake2b can achieve almost 1Gb/s while xxhash can achieve 110Gb/s.
Your maths are seriously out.
Your hash function (which runs per thread) only needs to be as fast as
your storage device is at a queue depth of 1. So, taking a top of the
range NVMe SSD, the Samsung 960 Pro, it can write at QD1 about 50k IOPS.
Assuming 4Kb per write, that's around 200Mb/sec/thread. Blake2b runs at
1Gb/sec, so it should comfortably fit with a bit of room to spare on a
single thread.
Obviously, more threads get you more queue depth and performance should
rise linearly until you run out of CPUs.
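The arithmetic can be checked with a few lines. This is only a sketch
using the figures quoted above; the 4096 byte write size is my assumption
for how the 200Mb/sec figure is reached, and the function names are made
up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bytes per second one thread must hash to keep pace with the device at QD1.
constexpr std::uint64_t required_hash_rate(std::uint64_t iops,
                                           std::uint64_t bytes_per_write) {
    return iops * bytes_per_write;
}

// True if a hash with the given throughput is not the bottleneck.
constexpr bool hash_keeps_up(std::uint64_t hash_bytes_per_sec,
                             std::uint64_t iops,
                             std::uint64_t bytes_per_write) {
    return hash_bytes_per_sec >= required_hash_rate(iops, bytes_per_write);
}

inline void throughput_example() {
    // 50k IOPS at an assumed 4096 bytes per write is 204,800,000 bytes/sec.
    assert(required_hash_rate(50000, 4096) == 204800000ULL);
    // A hash running at roughly 1,000,000,000 bytes/sec leaves headroom.
    assert(hash_keeps_up(1000000000ULL, 50000, 4096));
}
```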
Note that I am suggesting you cryptographically hash only on write. I wouldn't
suggest it for lookups and reads except on first database open. For
lookups and reads I'd strongly recommend SpookyHash v2 over xxhash
(you'll find a header only edition of SpookyHash v2 in AFIO v2).
Spooky was designed by a renowned crypto hash expert, unlike the
MurmurHash derived xxhash which definitely was not. Spooky is fast on
all architectures, not just Intel x64. Spooky uses the same internal
mechanism as a cryptographically strong hash, just fewer rounds for
performance. Spooky is as DDoS resistant as SipHash, and empirically
proved collision resistant to 2^72 inputs, whereas xxhash will collide
at best at 2^32 inputs. And the 128 bit hash Spooky creates fits
perfectly into a single SSE or NEON register making working with them
single cycle, and that is exactly what AFIO uses to work with them via
its uint128 type.
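For a rough feel of where the 2^32 figure for a 64 bit hash comes from,
the standard birthday approximation can be evaluated directly. This is a
sketch only: it assumes an ideal, uniformly distributed hash (no real hash
is exactly that), and collision_probability is a name I made up.

```cpp
#include <cassert>
#include <cmath>

// Approximate probability of at least one collision among n uniformly random
// hash values of the given bit width: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
// expm1 keeps the result accurate when the probability is tiny.
inline double collision_probability(double n, int bits) {
    const double exponent = n * (n - 1.0) / std::ldexp(1.0, bits + 1);
    return -std::expm1(-exponent);
}

inline void birthday_example() {
    const double n = std::ldexp(1.0, 32);  // 2^32 inserted items
    const double p64 = collision_probability(n, 64);
    const double p128 = collision_probability(n, 128);
    assert(p64 > 0.39 && p64 < 0.40);      // already substantial at 64 bits
    assert(p128 > 0.0 && p128 < 1e-19);    // negligible at 128 bits
}
```

Under this approximation the collision probability for a 64 bit hash
passes one half at around 5 billion items, while a 128 bit hash at the
same item count stays many orders of magnitude below that.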
So for a content addressable database like yours, please use SpookyHash
v2, even if you XOR into 64 bits. And if you decide to stick with 64 bit
hashes, you need to document that collision is mathematically certain
after 4 billion items have been inserted, and mathematically likely long
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/