Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2017-03-28 14:25:18
On 28/03/2017 14:42, Vinnie Falco via Boost wrote:
> On Tue, Mar 28, 2017 at 9:27 AM, Niall Douglas via Boost
> <boost_at_[hidden]> wrote:
>> I reckon in a few hundred lines of code I can meet or beat NuDB
>
> Is there a repository link where I can try out your code?
AFIO v2 is a collection of primitive filesystem algorithms. The idea is
that it's a toolkit for quickly mashing together custom filesystem
programs like your NuDB with minimal time and effort. So there is nothing
in there which looks like NuDB.
Docs, such as they are, are at https://ned14.github.io/boost.afio/.
Github repo is obvious.
And as I mentioned, the CI has been reporting that it doesn't build for
some months now, so I doubt you'll be able to try anything until after
the ACCU conference.
If I were going to write something like NuDB in a few hundred lines,
then for the value store file I'd make it sparse and do atomic appends
as you do, and use hole punching to delete records and free their
storage. I'd create a shared memory-mapped view of the key file, and for
the indexing algorithm in that memory-mapped key file I'd use something like
https://github.com/ned14/boost-lite/blob/master/include/algorithm/open_hash_index.hpp
which has a policy class called atomic_linear_memory_policy that puts a
spinlock on each hash table item, making it safe for concurrent use.
That would be my first attempt, and it would be easy. But I think there
is a better algorithm in there, one based on a single very large sparse
file, 16 exabytes long, which is itself one giant open-addressed hash
table. With a very high quality hash, you simply open-hash address to
the nearest 4Kb block, so your maximum key + value size is 4Kb, with
larger values spilled into a separate large-value file. Small-value
cluster packing might be worth implementing as well if values are much
smaller than 4Kb: the idea is that you pack nearby items into the same
4Kb extent where possible and use a linear scan to find them inside the
4Kb block.
I think the second algorithm would be much faster than your algorithm,
but that is pure guesswork on my part, on the basis that you do one i/o
per read or write since it's a single file. It might easily be much slower.
Either way, the first algorithm is basically the same as yours, but
simplified down to the bare essentials. It should perform as well as
yours, and likely better due to having no write log file, etc.
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk