Boost logo

Boost :

Subject: Re: [boost] [AFIO] Review (or lack of it)
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-08-30 13:26:55

On 30 Aug 2015 at 10:19, Hartmut Kaiser wrote:

> > I did a bit of mocking up of what the refactored data_store interface
> > for that AFIO tutorial will look like:
> >
> >
> >
> > Ok, so it's no longer the world's simplest key-value store API and I
> > deliberately dropped the STL iostreams interop, but the big
> > difference is that this one is real-world ready and probably quite
> > close to a final API design for such a thing.
> *Looks* much better, you seem to listen after all. I wouldn't call that to
> be real-world-ready without a thorough specification of the semantics,
> though.

This is the first post you have written which contained anything
useful. You also wrote civilly in a productive tone. Thank you.

> - What happens if commit() was called on your transaction, the returned
> future is not ready yet, but the ~transaction() is called?

Nothing happens. commit() transfers the state held by the transaction
object into an asynchronous operation bound by the returned future.
The transaction object's destructor becomes do-nothing.

You are probably now about to ask if the returned future's destructor
is blocking. It is not. The transferred state is kept around by the
AFIO engine until success or failure and disposed of internally.

> - Why is your transaction object not RAII?

You may have a different definition to me. The idea is when you begin
a transaction, you pin a copy of the internally held key-value index
to that transaction. You then say what you are going to modify to the
transaction object via add(). If an exception throw occurs before the
commit(), the transaction destructor will throw away any
modifications you have added and it unpins the internally held
key-value index.

> - Why do you need begin_transaction?

It pins a copy of the internally held key-value index to the
transaction object. Later on, during commit, it should be able to
match off key lookups with key updates to decide if two concurrent
commits conflict or not.

This interface is purely for the AFIO tutorial only. Its concurrent
transaction handling policy is to always abort if any concurrent
update happened since the index used for the transaction was pinned.
This is deliberately simple for the tutorial.

> - Why do you need two load_blob() functions?

One loads a blob into a set of user supplied scatter buffers.

The other memory maps the blob and returns it as a set of potentially
discontinuous buffers. This saves you having to allocate your own

The reason there are two forms is that there is limited free address
space on 32 bit architectures.

> - Why is your buffertype so over-specified, shouldn't any range do?

std::vector is ABI stable and friendly to C and Microsoft COM

> - Why can't the user specify/provide the type of the hash? std::hash<>
> anybody?

For a content addressible data store, std::hash<> collides too

The choice between Spooky (non-crypto) and Blake2b (crypto) is all
anyone needs. There is a good argument for renaming them to "fast"
and "quality" respectively which I would do if this were not a
tutorial example.

> - Why does one of the load_blob returns a
> future<std::shared_ptr<buffers_type>>? Handle to handle to handle to data
> this time? Shared ownership again, really?

shared_ptr can opaquely call an internally supplied destruction
callback to de-memory map the blob. You could return some custom
smart pointer type which does the same. I felt it was much of a
muchness. I know you'll disagree with that decision, but it's a
tutorial, and an additional custom smart pointer type is extra screen

> - Can this blob-store be distributed over many machines?

It can be concurrently used from many machines, including via CIFS
network shares. It would be trivial to extend to be distributed
across multiple machines, but that is an implementation detail.

> Can it be in memory only (with transparent disk backup if needed)?

The kernel file page cache and extents based design makes it
effectively memory only with very lazy but write ordered flushing to
disk. If used with scatter gather buffers allocated via
afio::page_allocator, it also is zero memory copy on Windows and BSD.

> - Why do you specify undefined behavior into your API (see your union). Why
> do you need this in the first place?

The compiler will correctly align a __m128i on all the three major
compilers which allows for the hash implementations to be more
efficient. I left out the #ifdef SSE2 for brevity, but the final
edition would have it.

> - What's the concurrency/thread-safety guarantees of each of the functions?

Undecided. I currently expect thread safety for everything will
naturally emerge without even needing a mutex, but my local
implementation here isn't finished yet.

> etc. many more open questions remain. Please, don't get me wrong, I don't
> need/want answers to those questions right away, I'm just listing those for
> you to think about.
> I however suggest that you redesign AFIO first (including the proposed AFIO
> API), before you even start thinking about submitting this to Boost review.

This is purely for the front AFIO tutorial only and what I would call
an unguaranteed solution. It has an over engineered interface for a
tutorial, but I also want to demonstrate how AFIO's features deliver
the solution in just 1000 lines of code.

A production ready solution with well tested power loss behaviour I
doubt I could bring for Boost review until early 2017 unless someone
sponsors me to work on this full time, in which case Spring 2016
would a maximum. I have had enough of working 60-80 hour weeks which
has been the case most weeks since February and has contributed in
large part to my recent grumpiness.


ned Productions Limited Consulting

Boost list run by bdawes at, gregod at, cpdaniel at, john at