Boost logo

Boost :

Subject: Re: [boost] [afio] Formal review of Boost.AFIO
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-08-24 11:16:42


On 24 Aug 2015 at 13:33, Thomas Heller wrote:

> This is my review of AFIO.
>
> DISCLAIMER: I am not on a personal vendetta or hold any grudge against
> the author. I spent valuable family time in order to review the
> presented work and opinions expressed so far.

Firstly, I wish to publicly say that I consider any past interactions
between us to be water under the bridge. You have been very useful
off-list, and you were one of the first to take a good look at
lightweight future-promise and I found your feedback valuable. I want
to say I greatly appreciate your support and help and feedback.

Secondly, your review is exactly why I came here for a review, and I
can tell how much effort you invested because you spotted some of the
more interesting design tradeoffs. Thank you Thomas, genunely. Your
review is very useful to me.

> - The author clearly states that performance is not one of his goals
> (http://thread.gmane.org/gmane.comp.lib.boost.devel/261914/focus=262478). Yet,
> the documentation, especially the introduction, puts a great emphasize
> on that, while the presented benchmarks do not show any of those
> promises or compare apples with oranges.

I place correctness and lock free file system programming before
absolute maximum performance yes.

The comparing of apples to oranges is a fair point.

> - Portability, the author claims to present a portable solution to the
> ACID problem however goes into great length in explaining the
> differences between various OSes and FSes from which I concluded that
> users of the library still have to implement special code paths for some
> systems. This comes especially obvious when looking at example code
> provided in the documentation.

There is a lot of variability in conformance and behaviour in filing
systems and operating systems - even versions of the Linux kernel.

I could simply remove all mention of platform specific workarounds in
the code examples and the docs. But that would not be useful to end
users - they *want* to know the platform specific quirks and
workarounds.

> - "Readiness": The docs list tons and tons of features which aren't
> really there yet. One can get the impression that it is only half done.
> Especially concerning is that running the unit tests might actually
> break your hard drive
> (https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/workshop/extents_theory/extents.html)
> which is not very assuring in the context of the promised feature set.

It is unfortunate that you interpreted my well defined future roadmap
for AFIO as meaning "it isn't finished". I could simply remove all
mention of the roadmap completely, and then AFIO would appear to be
"finished".

But it would be a lie. I personally highly value a well planned
documented future evolution of a library. It helps me to decide if
locking myself into a library is a good idea. That's why the roadmap
is so publicly well defined - the end user knows exactly where the
library is at, and where it is expected to go next.

> It looks like it would have been better to let the dust settle after the
> final additions to the documentation and maybe even properly implement
> the praised backend before aiming for review. I think it would also be
> best to only include the latest version of the library for inclusion to
> boost, is the community really supposed to also review the apparently
> inferior V1 of the library?

Those are ABI versions. If you build and link against v1, I give you
the promise your end user code will build and link against v1 for the
lifetime of v1. For any break in ABI, I'll iterate to v2 (as has
already happened), or v3 and so on.

> > 2. What is your evaluation of the design?
>
> The design of the documented, user facing API isn't what I expected from
> a modern C++ API. Following are some points, in no particular ordering,
> that struck me most.
>
> - afio::future<T = void>:
>
> As far as I can see, there is not a single function documented that
> returns anything than some equivalent to "shared_future<handle_ptr>".

The synchronous single shot functions return an afio::handle_ptr or
some type appropriate for that function (e.g. enumerate() returns a
vector of directory_entry).

The asynchronous single shot functions return an afio::future<T>
where T is usually void. afio::future<T>.get_handle() returns an
afio::handle_ptr, while afio::future<T>.get() returns whatever T is
(which might be void).

The batch functions return vectors of afio::future<T>.

> The usage of afio::future<void> (explicitly added the default parameter)
> is, IMHO, very confusing as soon as you pass them as dependencies to
> another operation. The void template parameter always looks like "just a
> signal" instead most of the time, the actual *handle* associated with
> that future is the dependency, and from time to time even needed as an
> argument to the async operation.

I am unaware of any occasion where an async operation consumes a
handle_ptr. If there are any, I would expect that to be a bug needing
to be fixed.

You may not be aware that afio::future<void> implicitly converts from
an afio::handle_ptr. It's equivalent to make_ready_future(handle).
This is effectively free of cost under lightweight futures, hence it
being used.

> In addition,
> "shared_future<handle_ptr>" has two indirections, one to the shared
> state of the future and one to the actual handle.

You have spotted one of those unusual design choices I mentioned
earlier.

It turns out that shared_ptr<shared_ptr<handle>> (which is
effectively what this is) is much faster than shared_ptr<handle>.
This is because the outer shared_ptr is local to the operation in
question and therefore its use count is not contended across threads,
whereas the inner shared_ptr manages lifetime for the handle and is
global across operations and its use count gets very much contended
across threads.

I agree the choice is not obvious, and indeed this was not the design
in very early AFIO. I came to it through being puzzled about poor
benchmark performance.

Is there a better alternative to this? To the next question ...

> Moreover, I disagree that afio::future<T> should have shared future
> semantics by default. I am a proponent of the
> std::future/std::shared_future design and believe this should be
> followed by all libraries dealing with some sort of futures. I think
> most of the design decisions are based on shared_future only semantics,
> which is probably the reason why I have problems with them.

Did you notice that afio::future<T> has non-shared semantics for T?
That's why it's called afio::future<>, not afio::shared_future<>.

You are correct that the implicit handle_ptr available from
afio::future<>.get_handle() always has shared future semantics.
That's because handle_ptr is a shared_ptr, so you are already
referring to shared state, and therefore shared future semantics
would seem to fit best.

Should afio::future<>.get_handle() always have shared future
semantics? This is a very profound question, and the entire of the
AFIO design hangs around that design choice. It was not an easy
decision back in 2012, and it is still not an easy decision in 2015
even after hundreds of hours working on AFIO.

I share completely your unease with the design choice, and I have the
same problems you do. I can only tell you that on balance, I think
that shared future semantics for the handle_ptr only make for a much
easier implementation, and I chose ease of maintenance over
strictness.

I don't think that this design choice impacts end users. No end user
code is going to get broken or misdesigned (in my opinion) by
handle_ptr always having shared future semantics. It is, in my
opinion, a quality of internal implementation decision.

I think the true controversial design choice is the concept of a
future<> type transporting more than one thing each with differing
semantics. Especially when future<T> is allowed and expected to type
slice into future<void> when entering the AFIO DLL boundary.

These controversial design choices are exactly why I wanted a Boost
peer review now. If someone can think of a better viable solution
which allows a stable DLL ABI, I am more than happy for AFIO to be
rejected now and I reimplement a fundamentally better design in a new
reborn AFIO.

> - Extensive use of shared_ptr:
>
> Some functions return some shared_ptr (specifically handle and
> dispatcher), therefore probably always relying on some sort of memory
> allocation. This seems to be due to the fact that only opaque abstract
> base classes are returned from those API functions. Why not return
> unique_ptr or even by value of the distinct type?

The answer is lifetime and stability of DLL ABI. AFIO v1.0 didn't
make as much use of shared_ptr, and it had race issues and a bad
balance of locking as a result. I rewrote the core engine in v1.2 to
use shared_ptr to manage lifetime and therefore remove all but two
locks per operation because the shared_ptr can be relied upon to keep
lifetime without locking. Performance was immensely improved.

Can a lighter weight AFIO be implemented? Yes. Should it be? No [1].
The overhead of a shared_ptr is immaterial to a 1m CPU cycle i/o
operation. And shared_ptr makes the code much easier to maintain, and
much harder to accidentally introduce races during a modification.

I haven't introduced a race (according to the thread sanitiser) into
AFIO in well over a year thanks to pervasive use of shared_ptr.

[1]: This isn't an opinion - I tested this empirically. I spent two
days hacking out the shared_ptr to see what the effect would be on
performance of the benchmark code. It improved by 0.5%. Not worth it.

> > 3. What is your evaluation of the implementation?
>
> I refrained from actually looking at the implementation. This is
> actually a pretty severe point. The meat of the implementation is split
> into 4 headers only, with two headers (afio.hpp and
> detail/impl/afio.ipp) having the majority code with over 6000 lines of
> codes each! Despite the actual content, I believe this is a maintenance
> nightmare. I don't want to review such large blobs of text.

It is true I only orthonongally split files when I see a point
because I like as few files as possible, as you can tell. I also like
really long lines.

Most Boost libraries keep a header per type. I dislike that - I
prefer a header per orthogonal reusable collection of stuff. On that
basis there is some splitting of afio.hpp which is probably a good
idea. I've recorded that as
https://github.com/BoostGSoC13/boost.afio/issues/90.

> > 4. What is your evaluation of the documentation?
>
> - I have no idea what the table in
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/overview.html is
> trying to tell me

Can you suggest what would be better?

> - The design introduction contains lots of text that seems like
> unrelated anecdotes
> - The reference section doesn't cross link between the different
> related API functions/classes that are needed for one specific
> class/function.

Can you explain what you mean exactly? Do you mean links on each
reference page?

I had thought the categorisation of functions by functionality
implemented made it easy to find what you needed quickly? How would
you improve the reference organisation?

> On the incomplete part:
> - The workshop tutorial is not finished.
> - Some references are not completely documented, for example:
>
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/directory_entry.html
>
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/directory_entry/directory_entry_hash.html
>
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dispatcher.html

I'm confused. What is not completely documented?

Do you mean that trivial APIs do not have their own full reference
page?

> On the examples do not match reference documentation part:
> -
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/make_io_req_3_length_deducing.html
>
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/make_io_req_3_length_deducing0.html
>
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/make_io_req_4_length_specifying.html
> The example mentioned in the above entries do not mention the
> make_io_req function at all.

You are absolutely right. I apologise for the mistake (it was a
typo). I have logged these documentation errors at
https://github.com/BoostGSoC13/boost.afio/issues/91.

> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/dir/dir_1_batch.html
> Why does this example not focus on the presented function?

I felt that a single example showing the batch use and the free
function use shown comparatively alongside other functions you would
likely be using was more useful. As I mention below, it looks like it
is instead confusing and I have logged an issue for fixing it.

> On the examples containing errors part:
> -
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/workshop/naive_afio_async.html
> "
> // Boost.Monad futures are also monads, so this implies a
> make_ready_future()
> return std::current_exception();
> "
> The return type is shared_future<data_store::ostream>. What does this
> have to do with Boost.Monad and any of its behavior? Which shared_future
> is meant here that has a non-explicit ctor
> that takes a std::exception_ptr?

You may not have realised that all shared_future<T> mentioned by the
documentation is boost::monad::lightweight_futures::shared_future<T>.
Not std::shared_future<T>.

boost::monad::lightweight_futures::shared_future<T> will implicitly
convert from a std::exception_ptr into an already ready future.
That's what the comment was explaining, as that is not normal
shared_future<T> behaviour.

> - For example (there are many more):
>
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/file/file_1_batch.html
> documents dispatcher::file taking a vector of path_req, in the
> example, the vector is constructed, but only the front is passed to the
> function. This seems very odd.

The batch API is exampled further below under the section commented
"Batch". I think you were looking at the top part.

I'm getting from this that a unified code example for both batch and
single shot functions on both batch and single shot pages is
confusing. Logged at
https://github.com/BoostGSoC13/boost.afio/issues/92.

> Other documentation flaws:
> -
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/process_threadpool.html
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flags/boost_afio_validate_inputs.html
> looks odd, there are no parameters or anything there, so why does this
> page contain a section for it?

Sadly out of my control. It's a bug in the doxygen_xml2qbk tool.

> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flags/file_flags.html
> the synopsis is very hard to read.

Can you explain how it is hard to read? It tells you what each flag
does. How could it be improved?

> (same goes for:
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flags/fs_metadata_flags.html
> and
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flags/metadata_flags.html)
> - Why document undocumented features?

Unfortunately doxygen_xml2qbk seems to insist on outputting
documentation for undocumented members, or maybe I don't understand
the tool or maybe I've misconfigured it. So I can have a blank
documentation entry like for afio::file_flags::os_lockable which
seems inferior to saying "Don't use this".

> https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/mini_programs/atomic_logging.html
> - Can you please strip example code from commented sections of code
> that obviously don't add anything to the example (or "#if 0"ed sections
> or "legacy" behavior)?

Logged to https://github.com/BoostGSoC13/boost.afio/issues/93.

> - It would have been nice to see all examples (at least in a non
> advanced section) to not rely on platform dependent macro definitions.

If you are referring to
https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/mini
_programs/filesystem_races.html, one is trapped by substantial
differences in Windows filesystem semantics. I still think it more
important to real life end users of AFIO to demonstrate and show
those platform specific differences than pretend they don't exist.

> In addition, I personally dislike the "Try AFIO now in online web
> compiler" button. Why does it have to appear on *every* single page
> (there is also a bug that it doesn't appear in some of the reference doc
> sections)? Besides, it doesn't really match the style of the rest of the
> documentation. Which brings me to the "Comments" part, doesn't match the
> style either, which makes it look totally out of place especially since
> it comes from some third party provider, and I am not sure I can trust
> that. It looks especially odd when javascript is disabled. Isn't
> discussing a library and its documentation on a mailing list not enough?

Regarding the online web compiler button, I can remove it from every
page, and place it just on one or two pages?

Regarding the comments, I had thought there was consensus here that
per-page commenting was a good idea, so that's why they are there.

Thanks hugely for the review Thomas. I found it very valuable.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk