Boost logo

Boost :

Subject: Re: [boost] [afio] Formal review of Boost.AFIO
From: Thomas Heller (thom.heller_at_[hidden])
Date: 2015-08-25 02:22:00


>
>> - The author clearly states that performance is not one of his goals
>> (http://thread.gmane.org/gmane.comp.lib.boost.devel/261914/focus=262478). Yet,
>> the documentation, especially the introduction, puts a great emphasize
>> on that, while the presented benchmarks do not show any of those
>> promises or compare apples with oranges.
>
> I place correctness and lock free file system programming before
> absolute maximum performance yes.

Sure, that's something one should stride for, but that's not what the
documentation
conveys.

>
> The comparing of apples to oranges is a fair point.
>
>> - Portability, the author claims to present a portable solution to the
>> ACID problem however goes into great length in explaining the
>> differences between various OSes and FSes from which I concluded that
>> users of the library still have to implement special code paths for some
>> systems. This comes especially obvious when looking at example code
>> provided in the documentation.
>
> There is a lot of variability in conformance and behaviour in filing
> systems and operating systems - even versions of the Linux kernel.
>
> I could simply remove all mention of platform specific workarounds in
> the code examples and the docs. But that would not be useful to end
> users - they *want* to know the platform specific quirks and
> workarounds.

There is always a difference between entry level, basic documentation of
the API you are presenting and going all in by tuning every possible
knob there is. As I mentioned in the "usefulness" section, lots of
people wanting an async file I/O API probably don't care about race
freedom or platform specific quirks you ironed out. They just want
something to get the job done.

>
>> - "Readiness": The docs list tons and tons of features which aren't
>> really there yet. One can get the impression that it is only half done.
>> Especially concerning is that running the unit tests might actually
>> break your hard drive
>> (https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/workshop/extents_theory/extents.html)
>> which is not very assuring in the context of the promised feature set.
>
> It is unfortunate that you interpreted my well defined future roadmap
> for AFIO as meaning "it isn't finished". I could simply remove all
> mention of the roadmap completely, and then AFIO would appear to be
> "finished".

That's not the point. The documentation is full of features that aren't
there yet, for example:
 - Introduction:
       "Exclusive lock files (manually operated support already there,
async support coming in v1.5)."
       "File change monitoring (coming in v1.5)."
       "File byte range advisory locking (coming in v1.5)."
 - In the cheat sheet, the complete last paragraph.
 - 2. World's simplest named blob store in AFIO (synchronous)
       "This is the first use of asynchronous i/o in this tutorial. AFIO
provides a custom future<T> type extending the lightweight monadic
futures framework in Boost.Monad, so you get all the C++ 1z Concurrency
TS extensions, C++ 1z coroutines support and Boost.Thread future
extensions in the AFIO custom future<>. There are also many additional
extensions beyond what Boost.Thread or the Concurrency TS provides,
including foreign future composable waits."

Having a roadmap is all nice and dandy, but repeating it all over the
place just creates the impression that your hasn't been completed yet.

>
> But it would be a lie. I personally highly value a well planned
> documented future evolution of a library. It helps me to decide if
> locking myself into a library is a good idea. That's why the roadmap
> is so publicly well defined - the end user knows exactly where the
> library is at, and where it is expected to go next.

Right, as said, having a roadmap is nothing bad, but after reading
through your documentation, the impression of just having a "almost
done" library is created. I as a user wouldn't feel comfortable in using
it as of yet.

>
>> It looks like it would have been better to let the dust settle after the
>> final additions to the documentation and maybe even properly implement
>> the praised backend before aiming for review. I think it would also be
>> best to only include the latest version of the library for inclusion to
>> boost, is the community really supposed to also review the apparently
>> inferior V1 of the library?
>
> Those are ABI versions. If you build and link against v1, I give you
> the promise your end user code will build and link against v1 for the
> lifetime of v1. For any break in ABI, I'll iterate to v2 (as has
> already happened), or v3 and so on.

Sure, but as far as I understand, v2 is something like the first
official release, so why do I want to bother with v1?

>
>>> 2. What is your evaluation of the design?
>>
>> The design of the documented, user facing API isn't what I expected from
>> a modern C++ API. Following are some points, in no particular ordering,
>> that struck me most.
>>
>> - afio::future<T = void>:
>>
>> As far as I can see, there is not a single function documented that
>> returns anything than some equivalent to "shared_future<handle_ptr>".
>
> The synchronous single shot functions return an afio::handle_ptr or
> some type appropriate for that function (e.g. enumerate() returns a
> vector of directory_entry).
>
> The asynchronous single shot functions return an afio::future<T>
> where T is usually void. afio::future<T>.get_handle() returns an
> afio::handle_ptr, while afio::future<T>.get() returns whatever T is
> (which might be void).
>
> The batch functions return vectors of afio::future<T>.

Ahh i missed the enumerate (btw, the sync enumerate example only shows
the usage of the async version...) and stat functions (and maybe some
others). The question remains:
Why do they also need to return a handle?

>
>> The usage of afio::future<void> (explicitly added the default parameter)
>> is, IMHO, very confusing as soon as you pass them as dependencies to
>> another operation. The void template parameter always looks like "just a
>> signal" instead most of the time, the actual *handle* associated with
>> that future is the dependency, and from time to time even needed as an
>> argument to the async operation.
>
> I am unaware of any occasion where an async operation consumes a
> handle_ptr. If there are any, I would expect that to be a bug needing
> to be fixed.

Consuming as in taking it as a parameter, not consuming as in being
moved into a function.
All your "preconditions" are essentially handles to futures, right? So
why not express it in the type?

Or file open functions return a handle (or a future to handle), right?
The return type is future<void> though. This is not nice.
All in all, *everything* that returns an afio::future<T> (where T might
be void) has two return values, that's very irritating.
In addition your afio::future violates the "single responsibility
principle" it sometimes serves as return value transport mechanism,
sometimes to just be able to (out of convenience I guess)
be able to access the underlying handle and sometimes both. What does
"valid()" or "valid(true)" mean then?
>From the documentation it's also not clear at all what semantics are
attached to afio::future. It looks like it is something between a unique
future and a shared future but couldn't decide between the two.

>
> You may not be aware that afio::future<void> implicitly converts from
> an afio::handle_ptr. It's equivalent to make_ready_future(handle).
> This is effectively free of cost under lightweight futures, hence it
> being used.

Pleas don't do that! Please add a make_ready_future functionality, those
non explicit constructors make my head hurt and are a potential source
of errors, as convenient as they may seem at first sight. Just by having
those explicit factory doesn't make anything less efficient.

>
>> In addition,
>> "shared_future<handle_ptr>" has two indirections, one to the shared
>> state of the future and one to the actual handle.
>
> You have spotted one of those unusual design choices I mentioned
> earlier.
>
> It turns out that shared_ptr<shared_ptr<handle>> (which is
> effectively what this is) is much faster than shared_ptr<handle>.
> This is because the outer shared_ptr is local to the operation in
> question and therefore its use count is not contended across threads,
> whereas the inner shared_ptr manages lifetime for the handle and is
> global across operations and its use count gets very much contended
> across threads.

Every shared_ptr comes with a cost be it contended or not ...

>
> I agree the choice is not obvious, and indeed this was not the design
> in very early AFIO. I came to it through being puzzled about poor
> benchmark performance.
>
> Is there a better alternative to this? To the next question ...

Why does it need to be shared_ptr<handle> in the first place? Why not
just handle? A shared_ptr always implies shared ownership, when for
example a boost::afio::file function returns, with whom is the ownership
shared?

>
>> Moreover, I disagree that afio::future<T> should have shared future
>> semantics by default. I am a proponent of the
>> std::future/std::shared_future design and believe this should be
>> followed by all libraries dealing with some sort of futures. I think
>> most of the design decisions are based on shared_future only semantics,
>> which is probably the reason why I have problems with them.
>
> Did you notice that afio::future<T> has non-shared semantics for T?
> That's why it's called afio::future<>, not afio::shared_future<>.

As stated earlier, I have no idea which semantics exactly a
afio::future<T> has ...

>
> You are correct that the implicit handle_ptr available from
> afio::future<>.get_handle() always has shared future semantics.
> That's because handle_ptr is a shared_ptr, so you are already
> referring to shared state, and therefore shared future semantics
> would seem to fit best.
>
> Should afio::future<>.get_handle() always have shared future
> semantics? This is a very profound question, and the entire of the
> AFIO design hangs around that design choice. It was not an easy
> decision back in 2012, and it is still not an easy decision in 2015
> even after hundreds of hours working on AFIO.
>
> I share completely your unease with the design choice, and I have the
> same problems you do. I can only tell you that on balance, I think
> that shared future semantics for the handle_ptr only make for a much
> easier implementation, and I chose ease of maintenance over
> strictness.
>
> I don't think that this design choice impacts end users. No end user
> code is going to get broken or misdesigned (in my opinion) by
> handle_ptr always having shared future semantics. It is, in my
> opinion, a quality of internal implementation decision.

It is misdesigned in the way that it implies shared ownership from the
start on, that is something that should be left to the user.

>
> I think the true controversial design choice is the concept of a
> future<> type transporting more than one thing each with differing
> semantics. Especially when future<T> is allowed and expected to type
> slice into future<void> when entering the AFIO DLL boundary.
>
> These controversial design choices are exactly why I wanted a Boost
> peer review now. If someone can think of a better viable solution
> which allows a stable DLL ABI, I am more than happy for AFIO to be
> rejected now and I reimplement a fundamentally better design in a new
> reborn AFIO.

If handle and future<T> is ABI safe, then future<handle> is ABI safe.
Besides, a implicit conversion from future<T> to future<void> is
perfectly acceptable, but should not and doesn't need to include type
slicing.

>
>> - Extensive use of shared_ptr:
>>
>> Some functions return some shared_ptr (specifically handle and
>> dispatcher), therefore probably always relying on some sort of memory
>> allocation. This seems to be due to the fact that only opaque abstract
>> base classes are returned from those API functions. Why not return
>> unique_ptr or even by value of the distinct type?
>
> The answer is lifetime and stability of DLL ABI. AFIO v1.0 didn't
> make as much use of shared_ptr, and it had race issues and a bad
> balance of locking as a result. I rewrote the core engine in v1.2 to
> use shared_ptr to manage lifetime and therefore remove all but two
> locks per operation because the shared_ptr can be relied upon to keep
> lifetime without locking. Performance was immensely improved.

>
> Can a lighter weight AFIO be implemented? Yes. Should it be? No [1].
> The overhead of a shared_ptr is immaterial to a 1m CPU cycle i/o
> operation. And shared_ptr makes the code much easier to maintain, and
> much harder to accidentally introduce races during a modification.

That's actually a very strange statement. shared_ptr implies shared
ownership, aka some shared state. Sharing state requires synchronization
to protect it from races. And yes, if you modify this shared from two
threads without synchronization, you accidentally introduced a race.

>
> I haven't introduced a race (according to the thread sanitiser) into
> AFIO in well over a year thanks to pervasive use of shared_ptr.

That doesn't make a lot of sense ... Of course, if your state is not
shared, you don't introduce races, but if you don't have a shared state,
why use shared_ptr?

>
> [1]: This isn't an opinion - I tested this empirically. I spent two
> days hacking out the shared_ptr to see what the effect would be on
> performance of the benchmark code. It improved by 0.5%. Not worth it.

You always emphasize that you care more about correctness than
performance, exposing the proper semantics in a library API is also some
form of correctness and leads to a certain quality of the design. Just
using shared_ptr everywhere is convenient, sure, is it the right thing
to do? I don't think so.

>
>>> 3. What is your evaluation of the implementation?
>>
>> I refrained from actually looking at the implementation. This is
>> actually a pretty severe point. The meat of the implementation is split
>> into 4 headers only, with two headers (afio.hpp and
>> detail/impl/afio.ipp) having the majority code with over 6000 lines of
>> codes each! Despite the actual content, I believe this is a maintenance
>> nightmare. I don't want to review such large blobs of text.
>
> It is true I only orthonongally split files when I see a point
> because I like as few files as possible, as you can tell. I also like
> really long lines.
>
> Most Boost libraries keep a header per type. I dislike that - I
> prefer a header per orthogonal reusable collection of stuff. On that
> basis there is some splitting of afio.hpp which is probably a good
> idea. I've recorded that as
> https://github.com/BoostGSoC13/boost.afio/issues/90.
>
>>> 4. What is your evaluation of the documentation?
>>
>> - I have no idea what the table in
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/overview.html is
>> trying to tell me
>
> Can you suggest what would be better?

If I would know what information you wanted to convey: maybe. After some
email exchanges it appears that you mean the number of overloads, but
again, what does that tell me? You want to give me the functions your
library exposes and probably the capabilities of those functions?
Express it that way!

>
>> - The design introduction contains lots of text that seems like
>> unrelated anecdotes
>> - The reference section doesn't cross link between the different
>> related API functions/classes that are needed for one specific
>> class/function.
>
> Can you explain what you mean exactly? Do you mean links on each
> reference page?

I mean cross reference between things I might be interested in while
looking at a specific reference section. For example "file (batch)"
https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/file/file_1_batch.html

it takes a vector of path_req, what the heck was path_req again?

>
> I had thought the categorisation of functions by functionality
> implemented made it easy to find what you needed quickly? How would
> you improve the reference organisation?

The organisation is fine. It, IMHO, just misses some cross referencing.

>
>> On the incomplete part:
>> - The workshop tutorial is not finished.
>> - Some references are not completely documented, for example:
>>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/directory_entry.html

For example st_dev, st_ino, st_type etc.

>>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/directory_entry/directory_entry_hash.html

what does operator()(const directory_entry & p) so? what does it return?

>>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/classes/dispatcher.html

read and write.

>
> I'm confused. What is not completely documented?
>
> Do you mean that trivial APIs do not have their own full reference
> page?

See above.

>
>> On the examples do not match reference documentation part:
>> -
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/make_io_req_3_length_deducing.html
>>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/make_io_req_3_length_deducing0.html
>>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/make_io_req_4_length_specifying.html
>> The example mentioned in the above entries do not mention the
>> make_io_req function at all.
>
> You are absolutely right. I apologise for the mistake (it was a
> typo). I have logged these documentation errors at
> https://github.com/BoostGSoC13/boost.afio/issues/91.
>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/dir/dir_1_batch.html
>> Why does this example not focus on the presented function?
>
> I felt that a single example showing the batch use and the free
> function use shown comparatively alongside other functions you would
> likely be using was more useful. As I mention below, it looks like it
> is instead confusing and I have logged an issue for fixing it.
>
>> On the examples containing errors part:
>> -
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/workshop/naive_afio_async.html
>> "
>> // Boost.Monad futures are also monads, so this implies a
>> make_ready_future()
>> return std::current_exception();
>> "
>> The return type is shared_future<data_store::ostream>. What does this
>> have to do with Boost.Monad and any of its behavior? Which shared_future
>> is meant here that has a non-explicit ctor
>> that takes a std::exception_ptr?
>
> You may not have realised that all shared_future<T> mentioned by the
> documentation is boost::monad::lightweight_futures::shared_future<T>.
> Not std::shared_future<T>.
>
> boost::monad::lightweight_futures::shared_future<T> will implicitly
> convert from a std::exception_ptr into an already ready future.
> That's what the comment was explaining, as that is not normal
> shared_future<T> behaviour.

Ok, this is overly confusing ... really, if you want to implement a
"Concurrency TS compatible" future, please do so in its full extent,
everything else is just confusing. Especially the non-explicit single
argument constructors, please get rid of them.

>
>> - For example (there are many more):
>>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/file/file_1_batch.html
>> documents dispatcher::file taking a vector of path_req, in the
>> example, the vector is constructed, but only the front is passed to the
>> function. This seems very odd.
>
> The batch API is exampled further below under the section commented
> "Batch". I think you were looking at the top part.
>
> I'm getting from this that a unified code example for both batch and
> single shot functions on both batch and single shot pages is
> confusing. Logged at
> https://github.com/BoostGSoC13/boost.afio/issues/92.

The biggest problem is that the unified example 1) contains too much
other, unrelated information and 2) doesn't show the usage of the
function on that page. The batch function is not in the example.
Furthermore, why the heck do you create a vector to just get its first
element? This is kind of inefficient and confusing, this appears all
over the place (not only in the documentation but also in the
implementation).

>
>> Other documentation flaws:
>> -
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/functions/process_threadpool.html
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flags/boost_afio_validate_inputs.html
>> looks odd, there are no parameters or anything there, so why does this
>> page contain a section for it?
>
> Sadly out of my control. It's a bug in the doxygen_xml2qbk tool.
>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flags/file_flags.html
>> the synopsis is very hard to read.
>
> Can you explain how it is hard to read? It tells you what each flag
> does. How could it be improved?

It can be improved by having one enumeration value per line.

>
>> (same goes for:
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flags/fs_metadata_flags.html
>> and
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/reference/macros_flags/metadata_flags.html)
>> - Why document undocumented features?
>
> Unfortunately doxygen_xml2qbk seems to insist on outputting
> documentation for undocumented members, or maybe I don't understand
> the tool or maybe I've misconfigured it. So I can have a blank
> documentation entry like for afio::file_flags::os_lockable which
> seems inferior to saying "Don't use this".
>
>> https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/mini_programs/atomic_logging.html
>> - Can you please strip example code from commented sections of code
>> that obviously don't add anything to the example (or "#if 0"ed sections
>> or "legacy" behavior)?
>
> Logged to https://github.com/BoostGSoC13/boost.afio/issues/93.
>
>> - It would have been nice to see all examples (at least in a non
>> advanced section) to not rely on platform dependent macro definitions.
>
> If you are referring to
> https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/mini
> _programs/filesystem_races.html, one is trapped by substantial
> differences in Windows filesystem semantics. I still think it more
> important to real life end users of AFIO to demonstrate and show
> those platform specific differences than pretend they don't exist.
>
>> In addition, I personally dislike the "Try AFIO now in online web
>> compiler" button. Why does it have to appear on *every* single page
>> (there is also a bug that it doesn't appear in some of the reference doc
>> sections)? Besides, it doesn't really match the style of the rest of the
>> documentation. Which brings me to the "Comments" part, doesn't match the
>> style either, which makes it look totally out of place especially since
>> it comes from some third party provider, and I am not sure I can trust
>> that. It looks especially odd when javascript is disabled. Isn't
>> discussing a library and its documentation on a mailing list not enough?
>
> Regarding the online web compiler button, I can remove it from every
> page, and place it just on one or two pages?

I'd personally put it onto one page. Just make a new section "Try AFIO
now" or so...

>
> Regarding the comments, I had thought there was consensus here that
> per-page commenting was a good idea, so that's why they are there.

I didn't take part of that discussion. I just dislike this disqus
thingy, it doesn't fit the style of the rest of the page and is totally
ill suited for discussing and commenting on documentation for a C++
library, in order to place code, you have to surround it with a
<pre><code class="C++">your code here</pre></code> thingy.

>
> Thanks hugely for the review Thomas. I found it very valuable.
>
> Niall
>
>
>
>
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>

-- 
Thomas Heller
Friedrich-Alexander-Universität Erlangen-Nürnberg
Department Informatik - Lehrstuhl Rechnerarchitektur
Martensstr. 3
91058 Erlangen
Tel.: 09131/85-27018
Fax:  09131/85-27912
Email: thomas.heller_at_[hidden]

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk