Subject: Re: [boost] Is there any interest in a library for actor programming? [preliminary submission]
From: Dominik Charousset (dominik.charousset_at_[hidden])
Date: 2014-05-17 11:56:59
On Sat, May 17, 2014 at 10:17:25AM +0200, Bjorn Reese wrote:
> I have a couple of major concerns with the current submission, and I am
> going to suggest some substantial changes. I hope that it does not
> discourage you too much.
> I am going to suggest that:
> 1. The library is broken into more fundamental building-blocks (which
> is what both Boost and the C++ standard are all about.)
> 2. A more flexible data flow architecture is adopted.
> 3. More use of existing Boost libraries.
> I recognize three more fundamental building-blocks in the current
> submission: active objects, messaging middleware, and data flow. I am
> not against a higher level actor API, but the fundamentals need to be
> in place first.
Thank you for taking the time to write this thorough comment. However, I have to
say I disagree on many levels. First of all: C++ is not about having a low
level of abstraction. C++ is about having the highest level of abstraction
possible without sacrificing performance. What you are suggesting is to not
have an actor library in Boost. You want to have a low-level active object
library with low-level networking primitives.
> Boost.Actor implements a distributed mailbox-based actor model. While
> this is a building-block to some users, it is not fundamental. It
> conflates the actor with distribution.
An actor *is* the fundamental primitive! We are talking about an implementation
of the actor model. The whole point of the actor model is to have software
entities that abstract over physical deployment. Actors are *not* the same
thing as active objects. There is obviously overlap of these two models, but
the actor model is more general, i.e., describes a higher level of abstraction.
> I suggest that you start with a non-distributed actor model. This is
> simply an active object with an incoming message queue. This can be
> used in its own right without distribution and mailboxes. Many applications
> have classes with a working thread inside them, and active objects
> should strive to replace these classes.
A non-distributed actor model? I think what you really want to say here is: you
want users to be able to use the lightweight actor implementation and the
work-stealing scheduler without having link-dependencies to networking
infrastructure they don't use. Can you agree on that? That's a fair point and
useful indeed. Well, here's the thing: there is nothing baked into the actor
primitive that would require such a dependency. The software design is fully
modular. Separate the middleman, move publish/remote_actor to a different
namespace, ship it separately, done. All you have to do to extend boost.actor
is to provide an implementation for actor_proxy. Behind the scenes, you can do
all kinds of stuff: networking, OpenCL-binding (that's how it's done in
libcppa), you name it.
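
To illustrate the proxy idea in isolation, here is a minimal sketch. Every
name in it (abstract_actor_sketch, local_actor_sketch, proxy_sketch, greet) is
made up for this example and is not the boost.actor API; messages are plain
strings and the "transport" is just a callback standing in for whatever a
real proxy would do behind the scenes.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical common interface: senders only ever see this type.
struct abstract_actor_sketch {
  virtual ~abstract_actor_sketch() {}
  virtual void enqueue(const std::string& msg) = 0;
};

// A local actor stores incoming messages in its own mailbox.
class local_actor_sketch : public abstract_actor_sketch {
 public:
  void enqueue(const std::string& msg) override { mailbox.push_back(msg); }
  std::vector<std::string> mailbox;
};

// A proxy forwards messages to an arbitrary backend (assumed here to be
// a callback; it could wrap a socket, an OpenCL queue, and so on).
class proxy_sketch : public abstract_actor_sketch {
 public:
  using transport = std::function<void(const std::string&)>;
  explicit proxy_sketch(transport t) : send_(std::move(t)) {}
  void enqueue(const std::string& msg) override { send_(msg); }

 private:
  transport send_;
};

// The sender neither knows nor cares where the receiver lives.
void greet(abstract_actor_sketch& receiver) { receiver.enqueue("hello"); }
```

Under these assumptions, greet() works unchanged for a local_actor_sketch and
for a proxy_sketch, which is the property that lets the networking part ship
separately.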
> Active objects have two important variation points: scheduling and the
> queue type. Regarding scheduling there is a C++ standards proposal that
> should be considered. There is a GSoC project about this. For
> the distributed case, boost::asio::io_service also has to be considered.
I know the Executor proposal and I think it's too basic to be useful for
anything other than implementing std::async. ThreadPool implementations in
general do work-sharing, whereas boost.actor implements work-stealing. The
latter yields superior performance in almost all use cases.
I wouldn't mind moving the scheduler to its own namespace to allow other
projects to build on top of that. The scheduler uses the interfaces resumable
and execution_unit, so there aren't any actor-specific types.
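
A rough sketch of how a scheduler can be written against two such interfaces
without any actor-specific types. The names resumable and execution_unit come
from the text above; the member signatures, the worker class, counter_job, and
the mutex-protected deque are assumptions made for this example, not the
boost.actor implementation. Each worker takes its newest local job first and
otherwise steals the oldest job from another worker.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

struct execution_unit;

// Anything that can be scheduled (signature is an assumption).
struct resumable {
  virtual ~resumable() {}
  virtual void resume(execution_unit* host) = 0;
};

// Anything that can accept jobs (signature is an assumption).
struct execution_unit {
  virtual ~execution_unit() {}
  virtual void exec_later(resumable* job) = 0;
};

// Hypothetical worker: owner pops from the back, thieves steal from the front.
class worker : public execution_unit {
 public:
  void exec_later(resumable* job) override {
    std::lock_guard<std::mutex> guard(mtx_);
    jobs_.push_back(job);
  }
  resumable* take_local() {
    std::lock_guard<std::mutex> guard(mtx_);
    if (jobs_.empty()) return nullptr;
    resumable* job = jobs_.back();
    jobs_.pop_back();
    return job;
  }
  resumable* steal() {
    std::lock_guard<std::mutex> guard(mtx_);
    if (jobs_.empty()) return nullptr;
    resumable* job = jobs_.front();
    jobs_.pop_front();
    return job;
  }
  // Runs at most one job; returns false if no work was found anywhere.
  bool run_one(const std::vector<worker*>& all) {
    resumable* job = take_local();
    for (std::size_t i = 0; job == nullptr && i < all.size(); ++i)
      if (all[i] != this) job = all[i]->steal();
    if (job == nullptr) return false;
    job->resume(this);
    return true;
  }

 private:
  std::mutex mtx_;
  std::deque<resumable*> jobs_;
};

// Example job that only counts how often it ran.
struct counter_job : resumable {
  int runs = 0;
  void resume(execution_unit*) override { ++runs; }
};
```

A production work-stealing scheduler would use a lock-free deque and a victim
selection policy; the locks here only keep the sketch short.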
> There is also work done on message queues. We already have some in
> Boost.Lockfree, or the wait-free multi-producer queue in the
> Boost.Atomic examples, as well as sync_queue in Boost.Thread.
Show me a queue that outperforms single_reader_queue and I'll take it. Keep in
mind though, that the enqueue operation uses only *a single* compare-and-swap
operation. I don't see how you can outperform that. Did you have a look at
the performance evaluation? In particular the N:1 communication? This queue
scales up to 63 concurrent writers without a measurable performance hit. I'm not
passionate about implementation details, though. Show me a queue that performs
even better in boost.actor and I'll take it.
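
For readers unfamiliar with this kind of queue, the following sketch shows one
common way to get an enqueue that needs a single compare-and-swap when
uncontended (it retries only if another writer wins the race). It is an
illustration under assumptions, not the single_reader_queue source: the class
name mpsc_queue_sketch and its members are invented here. Writers push onto a
shared LIFO stack; the single reader grabs the whole stack at once and
reverses it into a private FIFO list.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical multi-writer, single-reader queue.
template <class T>
class mpsc_queue_sketch {
  struct node {
    T value;
    node* next;
  };

 public:
  mpsc_queue_sketch() : stack_(nullptr), cache_(nullptr) {}
  mpsc_queue_sketch(const mpsc_queue_sketch&) = delete;
  mpsc_queue_sketch& operator=(const mpsc_queue_sketch&) = delete;
  ~mpsc_queue_sketch() {
    destroy(cache_);
    destroy(stack_.load());
  }

  // Safe to call from any number of writer threads.
  void push(T value) {
    node* n = new node{std::move(value), nullptr};
    node* head = stack_.load(std::memory_order_relaxed);
    do {
      n->next = head;  // head is refreshed by a failed CAS
    } while (!stack_.compare_exchange_weak(head, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed));
  }

  // Must only be called from the single reader thread.
  bool try_pop(T& out) {
    if (cache_ == nullptr) {
      // Take everything the writers pushed, then reverse LIFO into FIFO.
      node* taken = stack_.exchange(nullptr, std::memory_order_acquire);
      while (taken != nullptr) {
        node* next = taken->next;
        taken->next = cache_;
        cache_ = taken;
        taken = next;
      }
    }
    if (cache_ == nullptr) return false;
    node* n = cache_;
    cache_ = n->next;
    out = std::move(n->value);
    delete n;
    return true;
  }

 private:
  static void destroy(node* n) {
    while (n != nullptr) {
      node* next = n->next;
      delete n;
      n = next;
    }
  }

  std::atomic<node*> stack_;  // shared with writers
  node* cache_;               // private to the reader
};
```

Because writers only ever push and the reader takes the whole stack with one
exchange, this design avoids the ABA problem that a lock-free pop from a
shared stack would have.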
> Once we have got active objects, the question is how do we connect them?
> The variation points here are routing and transmission.
Again, this library is not about active objects. It's about actors.
> The mailbox approach is too simple for many applications. Partly
> because it is too limited in some regards (e.g. push-only) and too
> flexible in other regards (e.g. you cannot have fine-grained
> access control or restricted visibility.)
> There are several flow-based approaches that should be considered:
> Boost.Iostreams has all the required concepts in place. There was a
> Boost.Dataflow GSoC project some years ago. There is a C++ standards
> proposal about C++ pipelines. See also the ZeroMQ guide for
> various examples.
All of that is true and it's damn good that libcppa/boost.actor is the way
it is. I'm sorry, but again: this is an actor library. If you want to fiddle
with low-level networking, this is not the library you are looking for. There
was a talk at this year's C++Now about libcppa and VAST. VAST is a distributed,
interactive database allowing you to do full-text search over gigabytes (that's
only what's working right now, VAST aims for scanning petabytes!) of data in
realtime, i.e., sub-second round-trip times. Matthias Vallentin gave a great
talk about his design - purely based on libcppa actors! The flow-control needed
to do the indexing in realtime (which btw is a constant stream of events) is
built *on top* of actors. Not the other way around. Matthias tried a ZeroMQ
design first, you might want to ask him how well that went... Having a low
level of abstraction by no means implies good performance or scalability.
This is a fundamental design decision and I want people to write code on a
sane, reasonable level of abstraction. If you can't reason about your code, it
doesn't matter how "efficient" your building blocks are. The actor model is so
appealing because it *takes away* the complexity of distributed runtime
environments. And guess what? You can get insane performance out of actor
systems with less headache. If you don't believe me that actor systems scale,
go have a look at the selection of Production Users at http://akka.io/ and see
for yourself. Those companies pick Scala and Java over C++ for
*performance-critical applications* because of the actor model.
> Boost.Actor implements its own network protocol, but you often need to
> integrate with an existing protocol, such as MQTT or DDS.
You can integrate any network protocol by using brokers:
There's an example of how to integrate Google Protobuf in libcppa:
> We can add distribution by having proxies. The proxies can hide the
> details about routing (e.g. actors may change location due to load
> balancing or migration,) and network protocol.
That's exactly how it's done.
> Library reuse
> Although Boost.Actor reuses other Boost libraries, it has implemented
> quite a lot that either exists in other Boost libraries, or that could
> be moved to those.
> You have already mentioned that you do not use MPL, Serialization, and
> Asio, so I will not delve into these, other than saying that I believe
> that having your own socket implementation instead of using Boost.Asio
> is a show-stopper.
Aren't peer reviews about interface design, documentation, and testing? I
cannot believe an implementation detail can be a show-stopper. To be quite
frank, I just don't care about it. I do care about performance. As long as Asio
delivers equal or better performance, I'll migrate sooner rather than later.
But to me, this is an unimportant implementation detail.
> Apart from these three, there are other libraries that should be
> considered. Boost.Actor has:
> o Own continuations instead of Boost.Thread (future::then)
Futures don't deal with messages and know nothing about the scheduling in
boost.actor. The syntax is similar, but the continuations used in boost.actor
are syntactic sugar for the message passing underneath.
> o Own producer-consumer queue instead of Boost.Lockfree
The producer-consumer queue used in the scheduler is based on an excellent Dr.
Dobb's article by Herb Sutter and performs reasonably well. Maybe there's
interest in adding it to Boost.
> o Own logging framework instead of Boost.Log, although I would
> prefer not having logging in a library at all.
As a user, you won't have logging. It's purely for debugging purposes and not
compiled unless you define the macros to do so. It really should be in the
detail namespace though.
> o Own UUID instead of Boost.UUID
Can this library give me the UUID of the first hard drive? That's the only use
case I have for this. The generators in the documentation don't mention
anything like this.
> o Own time duration instead of Boost.Chrono
The reason is the same as why I don't use std::chrono::duration: both are
templated. I need a generic duration type that has the unit as a member rather
than as a template parameter and that can also be invalid. Maybe I could replace
this with optional<std::chrono::milliseconds> in the future, though that would
mean hardcoding the maximum resolution.
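
A minimal sketch of what such a type could look like. The names time_unit,
runtime_duration, and to_microseconds are assumptions for this example and do
not claim to match the type in boost.actor. The unit is stored at runtime, a
default-constructed value is invalid, and conversion from the templated
std::chrono types happens only at the boundary.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Unit is a runtime value instead of a template parameter.
enum class time_unit : std::uint8_t {
  invalid,
  seconds,
  milliseconds,
  microseconds
};

// Hypothetical non-templated duration that can also be "no duration".
struct runtime_duration {
  time_unit unit = time_unit::invalid;
  std::uint32_t count = 0;

  runtime_duration() = default;
  runtime_duration(std::chrono::seconds d)
      : unit(time_unit::seconds),
        count(static_cast<std::uint32_t>(d.count())) {}
  runtime_duration(std::chrono::milliseconds d)
      : unit(time_unit::milliseconds),
        count(static_cast<std::uint32_t>(d.count())) {}
  runtime_duration(std::chrono::microseconds d)
      : unit(time_unit::microseconds),
        count(static_cast<std::uint32_t>(d.count())) {}

  bool valid() const { return unit != time_unit::invalid; }
};

// Converts back to a chrono type; the caller must pass a valid duration.
inline std::chrono::microseconds to_microseconds(const runtime_duration& d) {
  assert(d.valid());
  switch (d.unit) {
    case time_unit::seconds:
      return std::chrono::seconds(d.count);
    case time_unit::milliseconds:
      return std::chrono::milliseconds(d.count);
    case time_unit::microseconds:
      return std::chrono::microseconds(d.count);
    default:
      return std::chrono::microseconds::zero();
  }
}
```

Since runtime_duration is a single concrete type, it can be stored in
non-templated interfaces, which a std::chrono::duration<Rep, Period> parameter
cannot do without fixing Rep and Period up front.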
> Then there is code that could be refactored to other Boost libraries
> so they can be used in other contexts. For example:
> o Stacktrace dumper
> o RIPEMD hash function
> o MAC address
Agree, except for the stacktrace dumper. Unless someone else refactors it to
work on Windows.
I hope I have convinced you that you are not requesting changes to boost.actor.
What you want is a different library entirely. An actor *is* the fundamental
building block, that's what actor programming is all about.