Subject: Re: [boost] This AFIO review - a modest proposal
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-08-24 12:14:38


On 24 Aug 2015 at 5:49, Glen Fernandes wrote:

> Niall Douglas wrote:
> > Glen Fernandes wrote:
> > > Isn't "I don't understand the point of this library" a valid reason
> > > for rejection?
> >
> > In itself, no I don't think so. There are at least ten libraries in Boost
> > I have no understanding of the point of
>
> If almost nobody understands the point of a proposed library to the extent
> that almost nobody recommends its acceptance: I would think that it isn't
> unjustly rejected. I believe niche use case libraries have a place in Boost.
> I suspect, though, that an asynchronous file I/O library falls into the
> category of something that most people want in Boost.

I think that an asynchronous file i/o library in Boost is something
that people *think* they want because they erroneously believe it
will improve performance in one fell swoop.

However the file system is one of the biggest performance pain points
in a computer. It has been optimised *relentlessly* such that it's
good enough almost all of the time for most people.

That's why I believe a naïve asynchronous file i/o library, such as
the design of ASIO's stream implementation, is quite useless in the
real world. You certainly wouldn't write a database with it, because
you gain nothing over using the host OS APIs directly. It provides no
practical gain to anyone with real file system performance problems
because, apart from the async, it offers nothing else useful such as
read-write ordering guarantees.

Totally separate from the async file i/o is the race free filesystem
stuff. I would like to believe that people understand why a race free
filesystem API is important. There is consensus that you need an
abstracted file handle object, and that the race free filesystem API
needs to hang off of that. Indeed, I remember reading somewhere Beman
mentioning that the difficulty of standardising on an abstracted file
handle object with respect to STL iostreams was a big reason that the
Filesystem TS does not attempt to address race free filesystem.
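
For anyone unfamiliar with the term, here is a minimal sketch of what
"race free" means in practice. It uses plain POSIX calls (open, openat,
rename), not AFIO's API, and the helper names write_in_dir and
demo_race_free are made up for illustration. The idea shown is that
operations are performed relative to an open directory handle instead
of a path string, so a third party renaming the directory in between
cannot redirect the write.

```cpp
// Illustrative sketch only: plain POSIX, NOT AFIO's API.
// write_in_dir and demo_race_free are hypothetical names.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Create `leaf` inside the directory referred to by `dirfd` and write
// `text` to it. The lookup is relative to the handle, not to a path.
bool write_in_dir(int dirfd, const char* leaf, const std::string& text) {
    int fd = ::openat(dirfd, leaf, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    ssize_t n = ::write(fd, text.data(), text.size());
    ::close(fd);
    return n == static_cast<ssize_t>(text.size());
}

// Returns true if a file written via the directory handle ends up in
// the directory even after that directory has been renamed.
bool demo_race_free() {
    char tmpl[] = "/tmp/racefree_XXXXXX";
    if (!::mkdtemp(tmpl)) return false;
    const std::string old_path = tmpl;
    const std::string new_path = old_path + "_moved";

    // Take a handle to the directory while its path is still valid.
    int dirfd = ::open(old_path.c_str(), O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) return false;

    // Simulate another process renaming the directory under us.
    if (::rename(old_path.c_str(), new_path.c_str()) != 0) {
        ::close(dirfd);
        return false;
    }

    // A path-based open of old_path + "/data.txt" would now fail;
    // the handle-based open still targets the same directory.
    const bool wrote = write_in_dir(dirfd, "data.txt", "hello");

    struct stat st;
    const bool old_gone =
        ::stat((old_path + "/data.txt").c_str(), &st) != 0;
    const bool new_there =
        ::stat((new_path + "/data.txt").c_str(), &st) == 0;

    // Clean up, again relative to the handle where possible.
    ::unlinkat(dirfd, "data.txt", 0);
    ::close(dirfd);
    ::rmdir(new_path.c_str());
    return wrote && old_gone && new_there;
}
```

Calling demo_race_free() from main() on a POSIX system should return
true. The same pattern (everything hangs off a handle) is why an
abstracted file handle object is a prerequisite for this kind of API.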

In AFIO I have proposed a race free filesystem API and an abstracted
file handle object. I think that approach is uncontroversial. The
decision to add asynchronicity on top as a value add *is*
controversial, but one could call that an internal implementation
detail from the perspective of synchronous use: if you want to
program race free filesystem synchronously with AFIO, there is a full
suite of easy to use 100% synchronous APIs provided.

Should race free filesystem then be split off into a separate purely
synchronous library, away from async file and filesystem? If so, how
do you design the i/o model, given that you can't use STL iostreams?

This is a very good question, and it is why I am here for review
before I start the engine rewrite, as I need to get feedback on this
now. (BTW it isn't useful to say yes, of course you should split off
synchronous race free filesystem. It is useful to say how you solve
the abstracted handle object problem - how should you read and write
from the handle? Should POSIX read/write atomicity semantics be
exposed? How should it integrate with STL iostreams and the
Filesystem TS? If these were easy questions, Beman would have
designed a solution into the Filesystem TS already.)

> I haven't figured out entirely where the disconnect is. It seems like you're
> saying "This is the async file I/O library that you need; not the async file
> I/O library that you want."

You understand me perfectly, fundamental design mistakes by me
notwithstanding (see my response to Thomas' review).

> You also say that only a tiny fraction of developers have those needs.
> Are any of them going to be reviewing this library?

boost-dev isn't exactly full of people programming in this niche. I
have colleagues in file system communities. Indeed, I am supposed to
be writing a white paper on async byte range locking with none other
than Jeff Layton, except this review turned up after C++ Now, so I
had to shelve the white paper until next year.

Filesystem specialists appear to get quite excited about AFIO, and as
I mentioned, my CppCon talk looks like it will be surprisingly well
attended considering. The single biggest bone they pick is the
requirement for C++ 11, as that is years away for most of them. File
system code is exceptionally conservative; they won't trust C++ 11/14
until at least 2018.

> > Can you explain what is not straightforward more precisely please?
>
> Sure. With regards to the examples:
> - Be more concise,
> - Have less standard out statements,
> - Have less comments,
> - Have no conditional compilation
> * BOOST_AFIO_USE_LEGACY_FILESYSTEM_SEMANTICS? How can parts of AFIO be
> legacy?

That is not caused by AFIO. Boost.Filesystem still doesn't match the
Filesystem TS. The macro BOOST_AFIO_USE_LEGACY_FILESYSTEM_SEMANTICS
has AFIO use workarounds specific to Boost.Filesystem. As soon as
Boost.Filesystem gets fixed, I will be more than pleased to remove
the workarounds.

> * #if 0? In an example?
> * No platform specifics

I think I've either logged issues or explained myself about all of
the above in other threads. Thanks though for the list.

> The other advice I have is that you may want to omit a comment like "This
> section was not finished in time for the beginning of the Boost peer review
> due a hard drive failure induced by the testing of AFIO-based key-value
> stores in this workshop tutorial (sigh!)" in the documentation. You don't
> want prospective reviewers to wonder if they should back up their hard drive
> before trying AFIO. :-)

That comment was purely to explain to you guys why the final example
is missing. That example is currently over 1000 lines long and
growing - I only got the direct-from-mmap dense hash map working late
last night. I personally think almost nobody here will be interested
in studying the code - maybe Tony van Eerd. That's it.

Lock free filesystem programming is like lock free atomic programming
- excellent progression guarantees and sometimes great performance.
But the implementation code hurts the head, especially as you
sometimes deliberately use races as part of your algorithm, which to
my knowledge is not common in atomic memory lock free programming.

So why end the key-value store tutorial with such a complex final
design? Because the whole point is to show why people's assumption
that "async i/o makes your code quicker" is flawed.

*IF* you are willing to turn your entire design, approach and
methodology to file system programming completely on its head, you
can get *spectacular* results, as you will see when you compare the
benchmarks for the one-file-per-key design to a more sophisticated
design based on how file systems actually work, not how the average
programmer thinks they work. And a library like AFIO makes doing that
much, much easier than without.

If on the other hand you think sprinkling some async on top of your
conventional file system approach and algorithms will make it go
quicker, you are probably incorrect. That's the conceptual hill the
tutorial tries to get people to climb. I suspect that is the cause of
much of the disconnect you mentioned, and could well mean that AFIO
will never be accepted into Boost as it's the wrong audience.

And I have no problem if AFIO never enters Boost. I would think it a
shame and a wasted opportunity as it solves a ton of pain points for
those with such needs, but I am not working on AFIO for the good of
my health. I specifically need AFIO for a new kind of database
product I have in mind from which I hope to retire and never have to
work again. If it is accepted into Boost, then I'll press for it to
be standardised into ISO at WG21. If it is not accepted, I'll try one
more time and then I'll move on as I have better things to do. It's
no problem either way.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/
