
Subject: Re: [boost] [afio] AFIO review postponed till Monday
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-07-22 13:36:54


On 22 Jul 2015 at 7:58, glenfernandes wrote:

> > Really with async file i/o it's always more a question of control rather
> > than performance. In the naïve case performance will usually be higher by
> > going synchronous.
>
> I might be misunderstanding you, so I will try harder to get on the same
> page. Even though English is the only language I know, I know it poorly. :-)

No problem.

> Why would anyone interested in reviewing AFIO care about getting more
> performance by going synchronous? The reason they're interested in an
> asynchronous file I/O library is because they need asynchronous file I/O,
> right?

Most people who think they need async file i/o don't need it and
shouldn't use it.

Conversely, most people who write code which uses a filesystem path
don't realise they've just written buggy racy code which could
destroy other people's data and open security holes.
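
To make concrete the kind of bug meant here, a minimal sketch (my
own, not from AFIO) of the classic time-of-check-to-time-of-use
race in path-based code:

    #include <fstream>
    #include <sys/stat.h>

    // Between the stat() and the open, another process can replace
    // the path with a symlink to a file we never intended to touch,
    // so the "is it a regular file?" check guarantees nothing.
    bool append_log(const char *path, const char *msg) {
        struct stat st;
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            return false;                        // check...
        std::ofstream out(path, std::ios::app); // ...then use: racy
        out << msg << '\n';
        return out.good();
    }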

> Control over performance sounds great, but it's not control over performance
> if it comes at a cost of performance, right? [Example: I see the support of
> the C++ allocator model as something which can sometimes offer control over
> performance, but in no way does it make things any slower at runtime when
> the allocator supplied is std::allocator than if the code just used 'new'
> and 'delete'. --end example]

You've got it exactly right: you sacrifice some average performance
in exchange for control over worst case performance.

The same goes for the race-free filesystem code: it comes with a
performance cost because POSIX provides no race-free API for opening
a sibling file, so AFIO must iterate parent directory opens with
inode lookups, backing off and retrying until it succeeds. Windows
doesn't have this problem. The omission has been raised with the
AWG, and there was some sympathy for it.
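
AFIO's actual implementation isn't shown here, but a minimal sketch
of the parent-directory-open-plus-inode-check retry pattern just
described, using only standard POSIX calls, might look like this
(the function name and retry policy are my own invention):

    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Sketch only, not AFIO's code: POSIX has no "open sibling of
    // this fd" call, so open the parent directory by path, verify by
    // device+inode that it is the directory we expected, then
    // openat() the leaf relative to it. On a mismatch (the directory
    // was renamed under us), back off and retry.
    int open_leaf_racefree(const char *parent_path, dev_t expected_dev,
                           ino_t expected_ino, const char *leafname) {
        for (int attempt = 0; attempt < 10; ++attempt) {
            int dirfd = open(parent_path, O_RDONLY | O_DIRECTORY);
            if (dirfd >= 0) {
                struct stat st;
                if (fstat(dirfd, &st) == 0 && st.st_dev == expected_dev
                    && st.st_ino == expected_ino) {
                    int fd = openat(dirfd, leafname, O_RDWR);
                    close(dirfd);
                    return fd;  // parent verified by inode
                }
                close(dirfd);
            }
            usleep(1000u << attempt);  // exponential back-off
        }
        return -1;
    }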

> I thought motivation to use your library would be one or more of:
> - Simplicity (makes it easier to write maintainable file I/O code)
> - Portability (saves me time from writing platform specific code)
> - Performance (it is faster than code I would write by hand)

I would say all three of these, yes.

> On simplicity: If someone does not care about portability, can they write
> smaller/cleaner/more-maintainable code if they choose to use AFIO versus
> using overlapped I/O with IOCPs or KAIO? Does it sacrifice any simplicity
> for portability?

If you didn't care about portability, then writing your code for
WinRT, where all i/o is async, is probably the nicest way of writing
100% async i/o code in a mainstream language that I am aware of.

Once C++ 1z coroutines are in, I believe AFIO will let you come
close to that WinRT clarity and simplicity of coding async, except
portably.
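
Purely as illustration of the shape such code takes, here is a toy
awaitable standing in for a real async read (none of these names are
AFIO's API, and the coroutine plumbing is the bare minimum needed to
be self-contained):

    #include <coroutine>
    #include <cstdio>

    // Minimal coroutine return type so the example compiles.
    struct task {
        struct promise_type {
            task get_return_object() { return {}; }
            std::suspend_never initial_suspend() { return {}; }
            std::suspend_never final_suspend() noexcept { return {}; }
            void return_void() {}
            void unhandled_exception() {}
        };
    };

    // Toy awaitable which "completes" immediately; a real library
    // would suspend here and resume when the i/o finishes.
    struct fake_async_read {
        bool await_ready() { return true; }
        void await_suspend(std::coroutine_handle<>) {}
        int await_resume() { return 42; }  // pretend bytes read
    };

    task process_file() {
        // Reads like synchronous code, but each co_await is a
        // potential suspension point where other work can run.
        int bytes = co_await fake_async_read{};
        std::printf("read %d bytes\n", bytes);
    }

    int main() { process_file(); }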

My long term goal here is that C++ becomes like Erlang: your
apparently synchronous C++ code magically coroutinises at any point
it could block, because under the bonnet it's using ASIO for
networking and AFIO for the file and filesystem. Whenever your
legacy C++ codebase "blocks", it is actually off executing other
stuff, and it correctly resumes when the i/o completes. That's a
long way away though.

> On performance: Is it faster or at least no slower than any other libraries?
> (e.g. libuv) Does it sacrifice any performance for portability?

I haven't compared it to libuv, but libuv does nothing about
filesystem races and ought therefore to be quicker. That said, Rust
started out with libuv as its i/o layer and recently ended up
dropping it due to poor i/o performance, which is why Rust's i/o
library is so immature relative to its other standard libraries.

Note that you can ask AFIO to disable the race-freedom code for a
specific case, and it will.

> On portability: Does it entirely abstract away any platform specific issues?

As much as it can.

> (e.g. Do you believe a user of AFIO will be required to write
> platform-specific code as in your examples?)

I think for any serious use of the filesystem some platform specific
code is inevitable. For example, concurrent atomic appends to a file
are extremely quick on NTFS and ext4 but exceptionally slow on ZFS.
Conversely, fsyncing or especially O_DIRECT on ZFS is lightning
quick compared to NTFS or ext4. I can't abstract out those sorts of
differences because I can't know if they're important to the end user
or not.
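
For concreteness, the "concurrent atomic appends" above refer to the
POSIX O_APPEND guarantee that each write() is atomically positioned
at the current end of file, so multiple writers can append records
without stepping on one another; how fast that is varies wildly by
filesystem, as just described. A minimal sketch:

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstring>

    // Each write() on an O_APPEND descriptor is atomically
    // positioned at end-of-file, so concurrent appenders don't
    // overwrite one another's records.
    int append_record(const char *path, const char *record) {
        int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0644);
        if (fd < 0) return -1;
        ssize_t written = write(fd, record, strlen(record));
        close(fd);
        return written < 0 ? -1 : 0;
    }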

In a future AFIO I'll provide a high level abstracted API for
locking ranges in many files at once, and you won't need to care how
that is implemented under the bonnet, where it'll use very different
solutions depending on your situation. If you look at
https://boostgsoc13.github.io/boost.afio/doc/html/afio/quickstart/atomic_logging.html
you'll see lots of benchmarks for various multi-file locking
solutions on many platforms; this was me testing the waters for a
future high level solution.

I'm not against similar high level abstractions in the future, but I
suspect I won't be writing any I wouldn't be using myself. This
stuff is very hard to write, much harder than acquire-release
atomics for memory race freedom, which I used to think were hard and
tricky. They aren't, relative to filesystem based algorithms.

> > Now with the race free filesystem extensions [...] things have changed.
> > If you were wanting to write portable code capable of working under a
> > changing filesystem you get no choice but AFIO right now in any language
> > (that I am aware of).
>
> Does the documentation show (with examples) how AFIO helps here?

A good point. It's why I submitted that topic to CppCon.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/


