Boost logo

Boost :

Subject: Re: [boost] [AFIO] Review (or lack of it)
From: Hartmut Kaiser (hartmut.kaiser_at_[hidden])
Date: 2015-08-30 16:32:24


> > > I did a bit of mocking up of what the refactored data_store interface
> > > for that AFIO tutorial will look like:
> > >
> > > https://gist.github.com/ned14/163ac57c937bda61e5c9
> > >
> > > Ok, so it's no longer the world's simplest key-value store API and I
> > > deliberately dropped the STL iostreams interop, but the big
> > > difference is that this one is real-world ready and probably quite
> > > close to a final API design for such a thing.
> >
> > *Looks* much better, you seem to listen after all. I wouldn't call that
> to
> > be real-world-ready without a thorough specification of the semantics,
> > though.
>
> This is the first post you have written which contained anything
> useful. You also wrote civilly in a productive tone. Thank you.

Most welcome, however note: this is just my way of (to quote you): "... draw
a series of conclusions from a set of incorrect assumptions, as per (my)
usual style of trying to instil uncertainty and doubt"

(I left the spelling unchanged).

> > - What happens if commit() was called on your transaction, the returned
> > future is not ready yet, but the ~transaction() is called?
>
> Nothing happens. commit() transfers the state held by the transaction
> object into an asynchronous operation bound by the returned future.
> The transaction object's destructor becomes do-nothing.

And if the commit fails? Will the future roll back the transaction, even if
the transaction was already destroyed?

To avoid this you'd have for the future to keep the transaction alive, which
contradicts to it being move-only (unique vs. shared ownership problems
again, I believe).

> You are probably now about to ask if the returned future's destructor
> is blocking. It is not. The transferred state is kept around by the
> AFIO engine until success or failure and disposed of internally.

If I wanted to ask that I would have.

> > - Why is your transaction object not RAII?
>
> You may have a different definition to me. The idea is when you begin
> a transaction, you pin a copy of the internally held key-value index
> to that transaction. You then say what you are going to modify to the
> transaction object via add(). If an exception throw occurs before the
> commit(), the transaction destructor will throw away any
> modifications you have added and it unpins the internally held
> key-value index.

In case you didn't know: RAII is a silly acronym which stands for 'Resource
Acquisition Is Initialization'. IOW, a resource is acquired by the
constructor and released by the destructor. Your transaction does not
acquire anything while being constructed, it does not even have an
initializing non-default constructor. However your begin_transaction() does
acquire that resource and IIUC somehow magically transfers it into the
transaction. That's the reason why I asked this question.

> > - Why do you need begin_transaction?
>
> It pins a copy of the internally held key-value index to the
> transaction object. Later on, during commit, it should be able to
> match off key lookups with key updates to decide if two concurrent
> commits conflict or not.

I should have asked, why doesn't `transaction` provide a constructor which
takes a reference to the outer blob-store and that boolean value (even if I
still believe booleans in APIs is something to avoid, but you would probably
disregard it as being too compsci-like, and not sufficiently 'haphazardly'
as you prefer doing your design - note: your own words).

> This interface is purely for the AFIO tutorial only. Its concurrent
> transaction handling policy is to always abort if any concurrent
> update happened since the index used for the transaction was pinned.
> This is deliberately simple for the tutorial.

Ahh, now it's for a tutorial only, previously you claimed this to be 'ready
for real-world' and you already were planning to propose it for Boost review
in 2017 if not for standardization.

> > - Why do you need two load_blob() functions?
>
> One loads a blob into a set of user supplied scatter buffers.
>
> The other memory maps the blob and returns it as a set of potentially
> discontinuous buffers. This saves you having to allocate your own
> buffers.

How can a blob-store which hands out memory mapped ranges pointing into a
file be thread-safe? What about if two threads ask for the same memory
mapped region for the same file? Can you handle that properly, even if both
threads write to that same region?

> The reason there are two forms is that there is limited free address
> space on 32 bit architectures.
>
> > - Why is your buffertype so over-specified, shouldn't any range do?
>
> std::vector is ABI stable and friendly to C and Microsoft COM
> wrappers.

You'd surprised to hear that there are no guarantees for std::vector to be
ABI stable. Especially in the MS world where every compiler version is
allowed to break C++ ABI compatibility. Granted, every sane implementation
will make it ABI stable, but that's beside the point of my question.

Wouldn't a std::vector<std::pair<char const*, char const*>> be as fine?
Shouldn't be anything usable for which std::begin() and std::end() are well
defined? Wouldn't a std::list<Range> do the same thing? Etc.

> > - Why can't the user specify/provide the type of the hash? std::hash<>
> > anybody?
>
> For a content addressible data store, std::hash<> collides too
> frequently.

std::hash<> is just an example and for some people it might be good enough.
What I should have suggested was using the Hash concept
(http://en.cppreference.com/w/cpp/concept/Hash), though.

> The choice between Spooky (non-crypto) and Blake2b (crypto) is all
> anyone needs. There is a good argument for renaming them to "fast"
> and "quality" respectively which I would do if this were not a
> tutorial example.

Again that assumption that you know what your user needs, this time not even
giving a chance to overrule your ahh-so-sensible-defaults. BTW, I disagree
violently with your statement that 'good design is about choosing good
defaults'. Good design is about being simple to use (which might encompass
sensible defaults) while allowing the user to still use your library for
things you haven't even dreamt of.

> > - Why does one of the load_blob returns a
> > future<std::shared_ptr<buffers_type>>? Handle to handle to handle to
> data
> > this time? Shared ownership again, really?
>
> shared_ptr can opaquely call an internally supplied destruction
> callback to de-memory map the blob.

You're exposing your library internals to the user. Just because you believe
shared_ptr is the only way to properly destruct things doesn't mean that you
should use it.

> You could return some custom
> smart pointer type which does the same.
 
Or you return a custom type with value semantics which exposes unique
ownership through its API and still does the correct thing. I strongly
believe that future<some_custom_type> could do the right thing - if
`some_custom_type` was designed properly.

Also note, instead of 3 levels of indirection in your design (all bets are
off if you try to tell who owns which of the indirections), this would give
you just one with well define ownership semantics.

> I felt it was much of a
> muchness. I know you'll disagree with that decision, but it's a
> tutorial, and an additional custom smart pointer type is extra screen
> verbiage.

Again, I thought it was a real-world-ready-for-boost-review-design. Please
make up your mind.

> > - Can this blob-store be distributed over many machines?
>
> It can be concurrently used from many machines, including via CIFS
> network shares. It would be trivial to extend to be distributed
> across multiple machines, but that is an implementation detail.

Ok.

> > Can it be in memory only (with transparent disk backup if needed)?
>
> The kernel file page cache and extents based design makes it
> effectively memory only with very lazy but write ordered flushing to
> disk. If used with scatter gather buffers allocated via
> afio::page_allocator, it also is zero memory copy on Windows and BSD.

How is such a design making the access to the data thread-safe?

> > - Why do you specify undefined behavior into your API (see your union).
> Why
> > do you need this in the first place?
>
> The compiler will correctly align a __m128i on all the three major
> compilers which allows for the hash implementations to be more
> efficient. I left out the #ifdef SSE2 for brevity, but the final
> edition would have it.

I'm able to read code, so I understood your intention. That does not make it
more conforming, though. It's still undefined behavior. I thought we were
over this kind of stuff.

> > - What's the concurrency/thread-safety guarantees of each of the
> functions?
>
> Undecided. I currently expect thread safety for everything will
> naturally emerge without even needing a mutex, but my local
> implementation here isn't finished yet.

If you expose shared ownership (currently you do) you will have to have some
kind of synchronization.

> > etc. many more open questions remain. Please, don't get me wrong, I
> don't
> > need/want answers to those questions right away, I'm just listing those
> for
> > you to think about.
> >
> > I however suggest that you redesign AFIO first (including the proposed
> AFIO
> > API), before you even start thinking about submitting this to Boost
> review.
>
> This is purely for the front AFIO tutorial only and what I would call
> an unguaranteed solution. It has an over engineered interface for a
> tutorial, but I also want to demonstrate how AFIO's features deliver
> the solution in just 1000 lines of code.

That does not mean anything. With the appropriate abstractions you can
implement it with as many lines of code as you expose API functions
(multiplied by 3 if you count the '{' and '}').

> A production ready solution with well tested power loss behaviour I
> doubt I could bring for Boost review until early 2017 unless someone
> sponsors me to work on this full time, in which case Spring 2016
> would a maximum.

I have neither the means to sponsor you, nor would I contemplate doing so
even if I had.

> I have had enough of working 60-80 hour weeks which
> has been the case most weeks since February and has contributed in
> large part to my recent grumpiness.

None of my questions was asking for this information. Thanks for providing
it anyways.

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk