Subject: Re: [boost] This AFIO review (was: Re: [afio] AFIO review postponed till Monday)
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-08-23 10:23:03


On 23 Aug 2015 at 2:53, Rodrigo Madera wrote:

> > Is it lower performance than alternatives? Almost certainly, and the
> > documentation takes pains to present AFIO in a worst possible light
> > in any comparative benchmarks presented so nobody is under any
> > illusions. What matters is correctness, reliability, reusability and
> > portability - and most importantly of all - do NOT lose other
> > people's data.
> >
>
> Sorry, but I have to disagree tremendously.
>
> Performance is the only justification I can give clients (and bosses) of
> having such complex code as asynchronous programming requires. Justifying
> Boost.ASIO is an exercise in marketing, and the benchmarks are the killer
> chart that sell complexity. If not by that, Qt would be the chosen path in
> many non-academic projects.
>
> I even see a use case for AFIO in one of my products, where extreme write
> performance with minimum IO is a first priority, and a big effort was made
> to implement a subsystem for fast writes under the specific environment.
> Effort so big that the project froze until the writer code was done.

We have to be very careful of terminology here, because what I meant
by performance is not what you meant judging from what you just
wrote.

I assumed Glen was referring to "peak performance" and on that AFIO
will never be as fast as writing custom code directly using the OS
APIs. That is what I was referring to.

Your statement here though suggests you meant "sustainable
performance", so writing at the maximum sustained rate of your
hardware. That is a very different situation, and I think you will
find AFIO is very close to bare metal on that.

The reason why comes down to relative overheads. If you loop reading
one byte from the same location, then AFIO looks very slow compared
to the host OS because the overhead of ASIO and the continuations is
large compared to reading a single byte from a kernel page cache. If
however you are working with a cold cache scenario where there is any
wait on storage at all, the relative overhead of AFIO to storage is
minuscule.

The v1.3 engine has a latency of about 15 microseconds +/- 0.15
microseconds at a 95% confidence interval. You should be good to go
on any magnetic or SATA based SSD and then some. You may find battery
backed RAM drives won't be maxed out with AFIO, but as I mentioned
I'm working on it - low hanging fruit first: the number of people
using battery backed RAM drives is few, while the number of people on
SATA SSDs is many.

> The point of this is that performance should not be low priority. Specially
> for a boost library, and even more so for an asynchronous library. Citing
> correctness is not a feature to me. It's just basic principle. You are
> assumed to have it.

The importance of correctness is deeply underestimated, particularly
by Linux, which historically has had incorrect file system semantics;
only in very recent years has there been a change in mentality about
that. ext4 remains broken; XFS, however, has added extra internal
locking to implement correctness.

In other words, you can't assume you always have correctness. FreeBSD
is "slower" than Linux for peak performance, but is far faster than
Linux in worst case performance. FreeBSD also has perfect
correctness. Microsoft Windows also does very well, and is also
correct.

> As soon as AFIO is really competitive for me to use, I wish to do a full
> review. As for the general API usage I'm not sure that a full blown review
> is needed for that. The library is not finished, and you said, and a review
> now will be just like our past review, where a very interesting library is
> just not ready yet. And in your case, it doesn't yet perform.

I would be very surprised if anyone finds a performance problem
outside synthetic benchmarks.

> Reviewing of incomplete library work is not ideal, IMHO.
>
> That being said, I have some questions:
>
> Benchmarks,
> Do you have usable coroutines examples now? The web sample uses futures.

The answer is no.

FYI C++ 1z coroutines are implemented using futures by default. The
only toolchain currently implementing C++ 1z coroutines is VS2015
with an extra compiler flag. And you, the library end user, need to
annotate your code with the "await" keyword to switch on
coroutinisation. AFIO as a library doesn't have to do a thing except
mark up its synchronisation types with coroutinisation metadata.

> Do you have better performance numbers when using them?

I would expect performance to be lower. Stackful coroutines are not
free.

> What good measures did you employ to prevent caches from contaminating
> benchmarks?

Almost all the benchmarks refer to warm cache scenarios to paint AFIO
in the worst possible light relative to alternatives. I do have a
cold cache benchmark in the find regex in files tutorial, and as is
demonstrated you are aiming for the best tradeoff between cold cache
and warm cache performance. You never get best performance in either
extreme scenario; it's always a balanced tradeoff.

> Do you believe that performance will improve?

I know performance will improve in synthetic warm cache benchmarks. I
doubt any real world benchmarks would see a statistically measurable
difference.

> Do you know what your bottleneck is?

For the v1.3 engine it's overwhelmingly the ASIO reactor and the
hoops AFIO has to jump through to work with it.

> Why do other libraries do better? Can you imitate that while still leaving
> your superior API?

AFIO's API is a set of design tradeoffs between bare metal
performance and portability and correctness. libuv, probably its
nearest alternative, is a different set of design tradeoffs.

To decide which to choose you need to decide what "better" means and
what it is for your particular use case. I cannot give a generalised
answer. It depends on what your priorities are.

> About coroutines,
> Do you really need C++1z?

The v1.4 API was built around Gor's coroutines (the C++ 1z design)
and Oliver's forthcoming Boost.Fiber. Boost.Fiber currently only
needs C++ 14.

> Why not Boost.Coroutine emulation for backwards support?

I felt any additional real world performance gain wasn't worth it for
the significantly more brittle usage. File i/o is many orders of
magnitude slower than socket i/o. It genuinely is not important to
spend an extra 10,000 CPU cycles per 1m CPU cycle i/o operation if it
makes it easier to maintain.
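
To make that arithmetic concrete, here is a minimal sketch of the
relative overhead calculation. The 10,000 and 1m cycle figures are the
ones from the paragraph above; the warm cache figure and all names in
the code are hypothetical and only there for contrast, not measurements
of AFIO.

```cpp
#include <cassert>

// Extra library cost expressed as a fraction of the cost of the
// underlying operation. Both arguments are CPU cycle counts.
inline double relative_overhead(double extra_cycles, double operation_cycles) {
    assert(operation_cycles > 0.0);
    return extra_cycles / operation_cycles;
}

// Figures taken from the paragraph above.
const double kExtraCycles  = 10000.0;    // extra cost per i/o operation
const double kFileIoCycles = 1000000.0;  // the "1m CPU cycle" i/o operation

// Hypothetical figure, for contrast only: a read that is served
// straight from a warm kernel page cache.
const double kWarmCacheCycles = 1000.0;
```

Under these assumptions relative_overhead(kExtraCycles, kFileIoCycles)
comes to about 0.01, i.e. roughly 1% on top of the i/o itself, while
relative_overhead(kExtraCycles, kWarmCacheCycles) is 10, i.e. the same
fixed cost is ten times the operation. That is the gap between the
synthetic warm cache benchmarks and anything which waits on storage.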

> Why do you use C++11 at all? C++03 is still widespread and a requirement
> for most projects still.

AFIO was designed to take advantage of C++ 11 from the very beginning
of its life. The single biggest assumption in its design is rvalue ref
semantics, without which the library is unusably slow due to enormous
amounts of memory copying. Lightweight futures make heavy use of C++
11 constexpr and noexcept. APIBind does not exist without template
aliasing and inline namespaces.
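
As a rough illustration of the language features named above (this is
not AFIO, lightweight futures or APIBind code; every name below,
including the 4096 byte alignment, is made up for the example), a
minimal C++ 11 sketch might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace demo {
inline namespace v1 {  // inline namespace: demo::buffer<T> names demo::v1::buffer<T>

// Template alias: one name that could be rebound to another container.
template <class T> using buffer = std::vector<T>;

// Counts how often it is copied versus moved.
struct tracked_buffer {
    buffer<char> data;
    explicit tracked_buffer(std::size_t n) : data(n) {}
    tracked_buffer(const tracked_buffer &o) : data(o.data) { ++copies; }
    tracked_buffer(tracked_buffer &&o) noexcept : data(std::move(o.data)) { ++moves; }
    static int copies, moves;
};
int tracked_buffer::copies = 0;
int tracked_buffer::moves = 0;

// constexpr and noexcept: usable at compile time, promises not to throw.
// Rounds n up to a multiple of 4096 (a hypothetical page size).
constexpr std::size_t page_align(std::size_t n) noexcept {
    return (n + 4095) & ~static_cast<std::size_t>(4095);
}

// Takes ownership by value: an lvalue argument is copied,
// an rvalue argument is moved and the payload is not duplicated.
inline std::size_t submit(tracked_buffer b) { return b.data.size(); }

inline void demonstrate() {
    tracked_buffer b(page_align(1000));  // 1000 rounds up to 4096
    assert(b.data.size() == 4096);
    submit(b);             // lvalue: copy constructor runs
    submit(std::move(b));  // rvalue: move constructor runs
    assert(tracked_buffer::copies == 1);
    assert(tracked_buffer::moves == 1);
}

}  // namespace v1
}  // namespace demo
```

Calling demo::demonstrate() once performs exactly one copy and one
move. Without rvalue references both submit() calls would have to
copy the whole buffer, which is the kind of memory copying referred to
above.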

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/


