Subject: Re: [boost] [afio] AFIO review postponed till Monday
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-07-21 21:27:07


On 21 Jul 2015 at 20:07, Glen Fernandes wrote:

> I hope you're not making this harder on yourself than it needs to be.
> Perhaps I need to understand better: You must have felt the library
> was functionally stable and maybe even fit for production code at some
> point (because it was in the review queue for a long time), right?

Firstly thank you for your comments.

From the end of summer 2013, after GSoC, until the start of 2015 the
API barely changed. I did rewrite the internal engine twice and kept
adding more testing, but externally all you would have seen was much
better performance and far fewer faults (eventually zero) in the
thread sanitiser.

> At
> some point in the last year you decided on a major update (i.e.
> involving the "lightweight future promises" you mentioned) that alters
> the interface dramatically, that would require updating the tutorial
> and documentation.

Strictly speaking, if you did a find and replace in all your files
replacing "async_io_op" with "future<>", and replacing all use of
async_io_op::get() with get_handle(), your previous source code would
now work with the "majorly updated" API.

That is how the present tutorial code examples were updated.

> Without going into too much detail about that
> change: Does it significantly make using AFIO easier?

Quite a few people have disliked (a) the choice of future
continuations over ASIO's async_result pattern and (b) the batch API.
Those two observations have come up repeatedly - Bjorn and Robert on
this list have both publicly found issue there, and neither was alone
in their opinion.

I'm not dropping futures in favour of async_result. I don't think
that would help ease of use, because in file i/o you really do want
strong i/o ordering, whereas for network i/o you usually don't care
much about ordering. Forthcoming C++ 1z coroutines are also futures
based, and that decision is not going to be reversed now. Futures are
our future, as it were.

However, I could do something about the performance penalty that
futures have over async_result. I believe I have eliminated it
(untested claim), and you can emulate async_result easily in
lightweight futures with a continuation which consumes a const lvalue
ref (you can add any number of these). That should make those
preferring async_result happy.
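
To illustrate the idea of const lvalue ref consuming continuations,
here is a minimal sketch. It is a toy, single-threaded stand-in and
not AFIO's lightweight future: toy_future, then(), set_value() and
demo_observers() are all hypothetical names invented for this
example. It also uses std::function for brevity, which is itself
type erased, so it says nothing about performance.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Toy single-threaded future, NOT AFIO's lightweight future. It only shows
// continuations that observe the result by const lvalue reference, so the
// value is never moved out and any number of observers can be attached.
template <class T>
class toy_future {
    std::optional<T> value_;
    std::vector<std::function<void(const T&)>> observers_;

public:
    // Attach an observer. It sees the value as const T&, so the value stays
    // inside the future for later observers and for get().
    void then(std::function<void(const T&)> f) {
        if (value_) f(*value_);                    // already ready: run now
        else observers_.push_back(std::move(f));   // otherwise queue it
    }

    // Make the future ready and notify every queued observer in order.
    void set_value(T v) {
        value_ = std::move(v);
        for (auto& f : observers_) f(*value_);
        observers_.clear();
    }

    const T& get() const { return value_.value(); }
};

// Hypothetical usage: several completion-handler style observers on one result.
inline int demo_observers() {
    toy_future<int> bytes_read;
    int calls = 0, sum = 0;
    bytes_read.then([&](const int& n) { ++calls; sum += n; });
    bytes_read.then([&](const int& n) { ++calls; sum += n; });
    bytes_read.set_value(4096);
    bytes_read.then([&](const int& n) { ++calls; sum += n; });  // late attach
    assert(bytes_read.get() == 4096);  // the value was never consumed
    return calls * 10000 + sum;
}
```

Because each observer only reads the value, attaching one more never
invalidates the others, which is what lets this style behave like a
completion handler.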

That leaves the batch API. I personally quite like it, but I can see
it confuses people. So the new API presents a more traditional
unsurprising API which I think Robert amongst others should prefer.

> Does it significantly improve AFIO performance?

In no meaningful way, no. The cost of the i/o is many orders of
magnitude higher than anything I could do in AFIO. I could stick a
for loop counting to a million in there and I doubt anyone would
notice.

Where there was a performance problem was in the continuations. If
you were scheduling non-i/o continuations, the overhead of the
continuations machinery was onerous because we were vectoring through
the stable ABI layer, so everything had to be type erased and
reconstituted at least three times. Lightweight future-promise is, I
believe, now very close to optimally lightweight, as it all happens
in the TU at the point of compilation with no ABI traversals.
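
As a rough sketch of the difference between a type erased
continuation and one whose type is visible in the translation unit,
consider the following. It uses std::function as a stand-in for
vectoring through an ABI boundary. That is an assumption made for
illustration, and run_erased, run_inline and demo_dispatch are
hypothetical names, not AFIO code.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Type erased path: the callable is wrapped in std::function, so each call
// goes through an indirect dispatch and the wrapper may heap-allocate.
// This merely stands in for crossing a stable ABI layer.
inline int run_erased(const std::function<int(int)>& continuation, int v) {
    return continuation(v);
}

// Templated path: the callable's concrete type is visible in this
// translation unit, so the compiler is free to inline the call.
template <class F>
int run_inline(F&& continuation, int v) {
    return std::forward<F>(continuation)(v);
}

// Both paths compute the same result; only the dispatch mechanism differs.
inline int demo_dispatch() {
    auto add_one = [](int x) { return x + 1; };
    int a = run_erased(add_one, 41);
    int b = run_inline(add_one, 41);
    assert(a == b);
    return a;
}
```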

> > All that said, apart from the tutorial any early observations about anything within https://boostgsoc13.github.io/boost.afio/ are welcome.
>
> Some notes from a brief first glance at the code yesterday:
>
> 1. I might be mistaken, but are you using undocumented NT APIs for the
> Windows specific implementation?

Exclusively yes. As of v1.3 I avoid Win32 entirely. There were bugs
when I tried to use both together, and since I dropped Win32
completely things are much better.

> I was under the impression that you
> wanted AFIO to be used in production code; i.e. this is intended to be
> a more practical library than an experimental one. I'm surprised the
> use of undocumented APIs has not backfired yet in your testing.

The NT kernel API is exceptionally stable as any changes to it cost
Microsoft and anyone who writes device drivers dearly. Before I use
any NT kernel API I examine when it entered NT and if it has ever
changed in any release since, with the Windows XP kernel being my
minimum supported kernel. To my knowledge, apart from a few small
easily removed places AFIO should even work perfectly on NT 4.0 - I
have been very conservative in my choices of what kernel APIs to use.

The only backfire found to date is the asynchronous directory
enumeration via the WOW64 layer where Microsoft has a bug in their
WOW64 syscall parameter repacking code, so this only affects x86
binaries on an x64 kernel. I had a buddy in Microsoft ask around about
it, and it turns out that precisely nowhere in any Microsoft code is
anyone ever doing an asynchronous directory enumeration, and hence
that bug (which is confirmed) was never noticed till now. They were
actually quite surprised that asynchronous directory enumeration
works.

It's a wontfix unfortunately, but it's very easy for me to work
around. The segfault is caused by ASIO expecting a pointer to come
back from IOCP, randomly getting a truncated invalid pointer instead,
and then trying to dereference it. If I don't hand off the async
operation to ASIO, I avoid the problem.

> 2. Examples are a little alarming: If an example in the documentation
> contains #ifdef WIN32 or #ifdef __FreeBSD__, it makes someone wonder
> how portable AFIO really is. Your examples should not make using the
> library look complicated by being longer than they need to be: If they
> contain #if 0 blocks, they are just that much harder to read.

This is a very thought-provoking observation which has changed what
I'm going to do about the tutorial. Thank you.

Unlike some other C++ libraries, AFIO is a platform abstraction
library rather than a C++ abstraction library, in the same way that
ASIO is. Where it is possible without too much performance loss, I
hide platform specific quirks as ASIO does. However, where those
quirks are unavoidable, it's really best to document them and push
the problem onto the end user (again the same as ASIO). For example,
NTFS has lazy metadata flushing across handles, so some of the code
examples you'll see in the docs quite literally sleep for five
seconds to let the NTFS lazy flusher synchronise metadata across
multiple file handles. Another example is that FreeBSD cannot track
file renames, only directory renames, so you may see me ifdef in an
otherwise unnecessary shim directory to make the code example work
properly. Why not refactor the code example to remove file tracking?
Because the FreeBSD kernel folk are still actively considering fixing
this, so for all I know it could get fixed in the next BSD release.
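
The kind of platform quirk ifdef described above might look roughly
like the sketch below. The helper names (metadata_settle_delay,
needs_shim_directory, demo_quirks) are hypothetical and not part of
AFIO, and the zero delay on non-Windows platforms is an assumption
made only for this example.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: how long an example pauses before expecting metadata
// written through one file handle to be visible through another.
constexpr std::chrono::seconds metadata_settle_delay() {
#ifdef _WIN32
    return std::chrono::seconds(5);  // the five second sleep for the NTFS lazy flusher
#else
    return std::chrono::seconds(0);  // assumption: no pause needed elsewhere
#endif
}

// Hypothetical flag: whether an example needs an extra shim directory
// because the platform tracks directory renames but not file renames.
constexpr bool needs_shim_directory() {
#ifdef __FreeBSD__
    return true;
#else
    return false;
#endif
}

// Sanity check that exactly one branch of each ifdef was compiled in.
inline bool demo_quirks() {
    const auto delay = metadata_settle_delay().count();
    assert(delay == 0 || delay == 5);
    return needs_shim_directory();
}
```

Keeping each quirk behind a small named helper like this would let
the body of an example stay portable while still documenting the
quirk in one visible place.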

I personally think that if you are a person who needs something as
niche as asynchronous file i/o, then you are *very* interested in
what platform specific quirks you'll need to know about if you're
writing high performance filing system code. Hence I chose to leave
the ifdef quirk workarounds in the tutorial and code examples: for
people with that use case, they are very valuable to know about. The
same rationale applies to the choice of filesystem traversal
algorithm, which I think is why I left in #if 0 alternatives for
people to experiment with themselves. I know they do: I've had email
from people who switch on the other ifdef branches and then ask why I
think they are so much slower. I tell them my best speculations, but
in the end who really knows?

All that said, it makes the tutorial much less of a tutorial and much
more of a cheatsheet on filing system quirks. I am kinda assuming the
reader is only here because they absolutely need async file i/o, and
are therefore highly skilled in that topic much as someone
approaching uBLAS probably has a maths degree.

What your comment has made me realise is that it would make much more
sense if there were a "normal person's tutorial" and an "advanced
user's tutorial", where the former is a nice hand holding
all-portable baby steps thing, and the latter is stuff like writing
distributed mutual exclusion algorithms solely via atomic append onto
the filesystem, as in the current tutorial.

How does that plan sound?

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/


