Subject: [boost] Futures vs async_result (was: Re: [afio] AFIO review postponed till Monday)
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-07-24 13:47:49
On 22 Jul 2015 at 22:13, Michael Caisse wrote:
> > I don't mind restoring such a section now that I have lightweight
> > futures which do I think address most of the problems that the
> > async_result camp have with futures. Do you think I should
> > incorporate this rationale into the design rationale, the tutorial,
> > or the FAQ?
> >
>
> I personally do not think you need to restore the section. Some people
> find the async_result clumsy. Some people think it is elegant. I
> personally like the async_result interface and find it far more
> flexible; however, I can work with a future interface just fine.
>
> You have mentioned a couple times that futures is the right choice for
> file I/O and async_result is the right choice for network I/O. I haven't
> really thought too much about it but I have some ideas on why you might
> state that. I know you have thought about this problem domain a lot and
> was interested in how/why you came to that conclusion.
Oh okay, I can try my best to explain why.
I assume everyone reading understands futures. For those not aware of
ASIO's async_result, it works by you specialising the async_result trait
for your custom completion handler types, and then when you do
something like:
auto ret = async_write(stream, buffers, handler);
... ret will have whatever type your specialisation of async_result says
it should. ASIO will asynchronously go off and write each of the
specified buffers to stream, calling handler once the write has
completed (with either success or failure).
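To make that concrete, here is a minimal sketch of mine (not lifted from
ASIO's docs) using the stock use_future completion token, which is itself
implemented through an async_result specialisation - the same initiating
function returns void with a callback but a std::future with the token:

  #include <boost/asio.hpp>
  #include <boost/asio/use_future.hpp>
  #include <cstddef>
  #include <future>
  #include <string>

  void demo(boost::asio::ip::tcp::socket &socket, const std::string &data)
  {
    // With a plain callback handler, async_write() returns void.
    boost::asio::async_write(socket, boost::asio::buffer(data),
      [](const boost::system::error_code &, std::size_t) { /* done */ });

    // With the use_future token, the async_result machinery makes the
    // very same call return a std::future<std::size_t> instead.
    std::future<std::size_t> written =
      boost::asio::async_write(socket, boost::asio::buffer(data),
                               boost::asio::use_future);
    (void) written;
  }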
This pattern works well for socket i/o because your network code
invariably follows one of two patterns:
1. I am sending messages, so long as each message send is treated
atomically I don't care about the send order (i.e. sendmsg()).
2. I am sending a continuous stream, please send as much as you can
now and tell me how much you were able to send in the completion
handler.
That yields a nice simple callback loop, something like:
void handler(const error_code &ec, size_t bytes)
{
  // socket, buffers and remaining are assumed to outlive the operation
  if(!ec && (remaining -= bytes))
    async_write(socket, advance_buffers(buffers, bytes), handler);
}
async_write(socket, buffers, handler);
In other words, when you finish sending the last thing, send this
next thing.
File i/o is rather different from socket i/o: if a socket is like a
serial port guarded by a mutex, files and the filesystem are like
random access memory, with the same sequential consistency problems:
1. Unlike socket i/o, with file and filesystem i/o you are usually
sharing access with other threads and processes, i.e. the ordering of
visibility of changes to others is important.
2. I/o is never partial with file i/o: either you achieve all of a
read/write or you don't. This is both a blessing and a curse - a
blessing in that all of a write becomes atomically visible to others or
not at all (unless on a network filing system!), a curse in that you
must be careful not to issue single 4Gb read/writes, for example (see
the chunking sketch after this list).
3. File i/o, and especially the filesystem, is therefore riddled with
race conditions and requires special handling, in just the same way as
race-free use of memory requires mutexes and atomics etc., i.e. once
again, *ordering* of visibility of i/o is vitally important.
4. Also, similar to RAM, recent filing systems allocate and deallocate
storage using an allocator whose semantics (and pathologies!) should be
considered, else you'll end up with highly undesirable outcomes, just
as with RAM.
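As a concrete illustration of the curse in point 2, here is a rough
POSIX sketch of mine (nothing AFIO-specific, and the 16Mb chunk size is
just an arbitrary example) of manually splitting a very large write
into bounded chunks:

  #include <sys/types.h>
  #include <unistd.h>
  #include <algorithm>
  #include <cstddef>

  // Splits one logically-large write into bounded pwrite() calls. Real
  // code would also retry on EINTR and think about what the partial
  // visibility of the chunks means for readers racing with us.
  bool write_in_chunks(int fd, const char *data, std::size_t bytes, off_t offset)
  {
    const std::size_t max_chunk = 16 * 1024 * 1024;  // 16Mb per syscall
    while(bytes > 0)
    {
      std::size_t thischunk = std::min(bytes, max_chunk);
      ssize_t written = ::pwrite(fd, data, thischunk, offset);
      if(written <= 0)
        return false;
      data += written;
      offset += written;
      bytes -= static_cast<std::size_t>(written);
    }
    return true;
  }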
What this means in practice is that you don't get nice simple single
callback loops as with socket i/o. Imagine the find-in-files regex
algorithm:
1. Schedule enumerating the directory.
2a. For each directory in the enumeration, recurse to step 1.
2b. For each file in the enumeration, schedule opening the file.
3. For each opened file, schedule reading the first two 1Mb chunks.
4. For each 1Mb chunk read, stitch it onto the previous 1Mb chunk and
do a regex match. Schedule reading the next 1Mb chunk.
That's at least four completion handlers, each of which may trigger
the others. The above is also highly simplified - a real world
solution chokes the parallelism on enumerations, as those are
particularly expensive when over-parallelised, and that involves even
more callbacks.
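Sketched out as completion handlers, the shape is roughly the following
(purely illustrative pseudo-C++: every name here - async_enumerate,
async_open, async_read, entry, file_handle, chunk and the regex helper -
is a made-up stand-in, not ASIO or AFIO API, and error handling, chunk
stitching and throttling are all omitted):

  // Step 4: each chunk read schedules the next chunk read of that file.
  void on_read(const error_code &ec, chunk c)
  {
    regex_match_against_previous_chunk(c);              // stitch + match
    async_read(c.file(), c.next_offset(), 1024 * 1024, on_read);
  }

  // Step 3: each opened file schedules its first two 1Mb chunk reads.
  void on_open(const error_code &ec, file_handle fh)
  {
    async_read(fh, 0,           1024 * 1024, on_read);
    async_read(fh, 1024 * 1024, 1024 * 1024, on_read);
  }

  // Steps 1, 2a, 2b: enumeration recurses into directories, opens files.
  void on_enumerate(const error_code &ec, std::vector<entry> entries)
  {
    for(auto &e : entries)
    {
      if(e.is_directory())
        async_enumerate(e.path(), on_enumerate);
      else
        async_open(e.path(), on_open);
    }
  }

  // somewhere in main():
  async_enumerate(root_path, on_enumerate);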
The above is also a 100% read-only algorithm. As soon as you bring
writing into the picture, you must be very careful to order your
writes and your reads correctly, with no chance of racing. Mutual
exclusion on the filesystem is particularly fraught, with the only
portable method being O_EXCL, and even that has been broken on some
major OSs until very recently.
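For reference, that exclusive-create idiom looks something like the
following plain POSIX sketch (the lock file path is just an illustrative
name, and as just noted the technique has not been reliable everywhere):

  #include <fcntl.h>
  #include <unistd.h>

  // Try to take a filesystem-level lock by atomically creating a lock
  // file. O_CREAT|O_EXCL fails with EEXIST if the file already exists,
  // which is about the only portable mutual exclusion primitive the
  // filesystem offers.
  bool try_acquire_fs_lock(const char *lockpath)  // e.g. "/tmp/mydata.lock"
  {
    int fd = ::open(lockpath, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if(fd < 0)
      return false;      // somebody else holds the lock (or open failed)
    ::close(fd);         // the lock is the file's existence, not the fd
    return true;
  }

  // Releasing it is simply: ::unlink(lockpath);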
So, tl;dr, the reason why futures are more appropriate for AFIO is that
it's easier to specify the ordering of operations with future
continuations, because they always flow in sequence from start to end.
You can do it with ASIO callback handlers too, but it requires more
typing, and I think it is more error prone and harder for other
programmers to understand.
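As a flavour of what I mean, here is an illustrative sketch using
Boost.Thread's future continuations (then()/unwrap(), available with
BOOST_THREAD_VERSION 4 in recent Boost releases); async_open() and
async_read() are stand-in names returning futures, not AFIO's actual
API, and file_handle/buffer are placeholder types. The point is that
the i/o ordering is written down by the shape of the chain itself:

  #define BOOST_THREAD_VERSION 4
  #include <boost/thread/future.hpp>

  struct file_handle {};                                             // placeholder
  struct buffer {};                                                  // placeholder
  boost::future<file_handle> async_open(const char *path);           // hypothetical
  boost::future<buffer> async_read(file_handle h, long long offset); // hypothetical

  boost::future<buffer> open_and_read_first_chunk(const char *path)
  {
    return async_open(path)
      .then([](boost::future<file_handle> f) {
        // This continuation cannot run until the open has completed;
        // f.get() rethrows any failure from the open.
        return async_read(f.get(), 0);
      })
      .unwrap();  // collapse future<future<buffer>> into future<buffer>
  }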
The key thing I think those who advocate async_result don't realise is
that you are very rarely adjusting a single file from a single thread
if you are bothering with async file i/o. In the real world, almost
certainly multiple threads and processes are all contending on the
same shared dataset, and you need fine-grained control over exactly
when you do reads and writes with respect to one another.
Hopefully this answers the question?
Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/