[Capy/Corosio review] Benchmark results


Marcelo Zimbres Silva

28 Jun 2026

10:01 p.m.

Hi,

Some months ago I started implementing new benchmarks for Boost.Redis to compare it to the current state of other popular clients. While I was working on them, Boost.Redis gained Corosio support, something that Ruben Perez announced on this mailing list recently.

In this email I would like to share the results I obtained with the Capy/Corosio review audience. The code is public and available at https://github.com/mzimbres/redis-cli-comp.

The new benchmarks simulate the scenario in which Redis is most commonly used, namely internet-facing servers (usually HTTP) that serve connections concurrently while receiving server pushes, e.g. pubsub events. They consist of

1. starting multiple independent sessions that issue commands in a loop.
2. subscribing to a channel to receive pubsub events.

## Runtime Performance

The metric used to assess runtime performance was the wall-clock time multiplied by the %CPU consumed by the client. This metric takes into account that clients might use a different number of threads. The following results were obtained (lower is better):

Client                 Time x %CPU (normalized)
----------------------------------------------------
boost_redis_corosio    1.00
boost_redis_asio_cb    1.35
boost_redis_asio_co    1.73
redis_rs (Rust)        4.62
go_redis (Go)          22.33

The thread and file-descriptor usage of each client was:

Client                 threads   fd-nr
----------------------------------------
boost-redis-corosio    3         7
boost-redis-asio-co    2         7
boost-redis-asio-cb    2         7
redis-rs               1         10
go-redis               24        1006

## Application build time

This is the time taken to build the benchmark program, excluding the time taken to build the client library.

Library                Time (s)
--------------------------------
boost-redis-corosio    1.68
boost-redis-asio-co    3.60
boost-redis-asio-cb    8.62
redis-rs               3.10
go-redis               0.28

## Client build time

Here we see the time taken to build only the client.

Library                    Time (s)
------------------------------------
boost-redis-lib-corosio    5.3
boost-redis-lib-asio       21.7
redis-rs                   9.6

Note: I haven't found a way to measure this for Go clients.

## Total build time

The times below include the build time of all dependencies (which had been downloaded beforehand).

Client                 Time (s)
--------------------------------
boost-redis-corosio    33.6
boost-redis-asio-co    35.5
boost-redis-asio-cb    40.5
redis-rs               42.6
go-redis               9.0

## Executable size

This is the size of the release build of each benchmark program.

Executable             Size
---------------------------
boost_redis_corosio    1.5M
boost_redis_asio_cb    1.9M
boost_redis_asio_co    1.3M
redis_rs               2.0M
go_redis               8.2M

## Summary

It is not the point of this report to analyse the results in detail or to issue any verdict; Corosio, however, has come out with really impressive results IMO.

Marcelo


Andrey Semashev

29 Jun 2026

7:08 a.m.

On 29 Jun 2026 01:01, Marcelo Zimbres Silva via Boost wrote:

...

## Runtime Performance

The thread and file-descriptor usage by each client was

Client                 threads   fd-nr
----------------------------------------
boost-redis-corosio    3         7
boost-redis-asio-co    2         7
boost-redis-asio-cb    2         7
redis-rs               1         10
go-redis               24        1006

I wonder why Corosio has one more thread than the ASIO variants. Does Corosio start an internal thread for some purpose? I'm assuming the client code was equivalent for each library, or at least for the different variants of Boost.Redis.

Also, it is worth noting that, according to the GitHub page you referenced, the number of context switches was the highest for Corosio, and the amount of system time was a bit higher than with ASIO. This may be related to the additional thread being used.

Ruben Perez

2:37 p.m.

...

...
The thread and file-descriptor usage by each client was

Client                 threads   fd-nr
----------------------------------------
boost-redis-corosio    3         7
boost-redis-asio-co    2         7
boost-redis-asio-cb    2         7
redis-rs               1         10
go-redis               24        1006

I wonder why Corosio has one more thread than the ASIO variants. Does Corosio start an internal thread for some purpose? I'm assuming the client code was equivalent for each library, or at least for the different variants of Boost.Redis.

Also, it is worth noting that, according to the GitHub page you referenced, the number of context switches was the highest for Corosio, and the amount of system time was a bit higher than with ASIO. This may be related to the additional thread being used.

The extra thread is allocated by the timer service in Capy, underlying capy::delay() and capy::timeout(). Capy doesn't know about Corosio, so I don't think it can use io_context threads to run timers.

Vinnie Falco

2:41 p.m.

On Mon, Jun 29, 2026 at 7:40 AM Ruben Perez via Boost <boost@lists.boost.org> wrote:

...

The extra thread is allocated by the timer service in Capy, underlying capy::delay() and capy::timeout().

Huh? How can Capy implement delay() and timeout() without a reactor? I don't think those functions belong in Capy. Thanks

Ruben Perez

2:49 p.m.

On Mon, 29 Jun 2026 at 16:41, Vinnie Falco <vinnie.falco@gmail.com> wrote:

...

On Mon, Jun 29, 2026 at 7:40 AM Ruben Perez via Boost <boost@lists.boost.org> wrote:

...
The extra thread is allocated by the timer service in Capy, underlying capy::delay() and capy::timeout().

Huh? How can Capy implement delay() and timeout() without a reactor? I don't think those functions belong in Capy.

Thanks

It looks like it can. These functions seem to have been there for a while:

https://master.capy.cpp.al/capy/reference/boost/capy/timeout.html
https://master.capy.cpp.al/capy/reference/boost/capy/delay.html

I asked myself the same question the first time that I saw them :)

Looking into the code, Capy spawns a thread and waits on a std::condition_variable to implement this:

https://github.com/cppalliance/capy/blob/develop/src/ex/detail/timer_service...

I do agree it's odd, especially considering that Corosio has timers, and cancel_at/cancel_after members.

I find the ergonomics of delay() and timeout() much better than those of timers, but that's a different concern to what's being discussed here.

This makes me wonder whether it is safe to use delay() and timeout() with an io_context with a concurrency_hint of one.
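The mechanism Ruben describes can be pictured with the following minimal sketch. It is hypothetical and is not Capy's timer_service (only the link above shows the real code); the names timer_thread and schedule are made up for illustration. One dedicated thread keeps pending deadlines in a priority queue and sleeps on a std::condition_variable until the earliest one expires, which is why a design like this costs one extra thread regardless of how many io_context threads the application runs.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch, not Capy's implementation: one background thread
// waits on a condition variable until the earliest deadline expires.
class timer_thread {
public:
    using clock = std::chrono::steady_clock;

    timer_thread() : worker_([this] { run(); }) {}

    ~timer_thread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();  // timers still pending at this point are dropped
    }

    // Runs fn on the timer thread at or after `deadline`. Storing the
    // callable in std::function may heap-allocate.
    void schedule(clock::time_point deadline, std::function<void()> fn) {
        assert(fn && "callback must not be empty");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(entry{deadline, next_seq_++, std::move(fn)});
        }
        cv_.notify_one();  // the new deadline may be earlier than the one being waited on
    }

private:
    struct entry {
        clock::time_point deadline;
        unsigned long long seq;  // FIFO order for equal deadlines
        std::function<void()> fn;
    };
    struct later {
        bool operator()(const entry& a, const entry& b) const {
            return a.deadline != b.deadline ? a.deadline > b.deadline : a.seq > b.seq;
        }
    };

    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stop_) {
            if (queue_.empty()) {
                cv_.wait(lock);  // nothing scheduled: sleep until schedule() or shutdown
                continue;
            }
            const auto deadline = queue_.top().deadline;
            if (clock::now() < deadline) {
                cv_.wait_until(lock, deadline);  // may wake early for a new timer or shutdown
                continue;                         // re-check the queue head either way
            }
            auto fn = queue_.top().fn;  // top() is const, so copy before popping
            queue_.pop();
            lock.unlock();
            fn();  // run the callback without holding the lock
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::priority_queue<entry, std::vector<entry>, later> queue_;
    unsigned long long next_seq_ = 0;
    bool stop_ = false;
    std::thread worker_;  // declared last so it starts after the members above exist
};
```

In this sketch the callbacks run on the timer thread, not on any io_context thread; that is the kind of situation that makes the question about a concurrency_hint of one relevant.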

Vinnie Falco

2:56 p.m.

On Mon, Jun 29, 2026 at 7:50 AM Ruben Perez <rubenperez038@gmail.com> wrote:

...

It looks like it can. These functions seem to have been there for a while:

https://master.capy.cpp.al/capy/reference/boost/capy/timeout.html
https://master.capy.cpp.al/capy/reference/boost/capy/delay.html

Hmm... no, I don't think this is a good idea at all.

...

I find the ergonomics of delay() and timeout() much better than those of timers, but that's a different concern to what's being discussed here.

Well, of course the ergonomics are better, because the Capy timer operations hide a memory allocation through their use of std::function:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

Corosio makes this explicit by requiring the user to manage the timer object's lifetime.

...

This makes me wonder whether it is safe to use delay() and timeout() with an io_context with a concurrency_hint of one.

I think these two functions and the timer service should be removed as a condition of acceptance. They can be moved to the examples. Thanks

Andrey Semashev

3:08 p.m.

On 29 Jun 2026 17:49, Ruben Perez via Boost wrote:

...

On Mon, 29 Jun 2026 at 16:41, Vinnie Falco <vinnie.falco@gmail.com> wrote:

...
On Mon, Jun 29, 2026 at 7:40 AM Ruben Perez via Boost <boost@lists.boost.org> wrote:

...
The extra thread is allocated by the timer service in Capy, underlying capy::delay() and capy::timeout().

Huh? How can Capy implement delay() and timeout() without a reactor? I don't think those functions belong in Capy.

Thanks

It looks like it can. These functions seem to have been there for a while:

https://master.capy.cpp.al/capy/reference/boost/capy/timeout.html
https://master.capy.cpp.al/capy/reference/boost/capy/delay.html

I asked myself the same question the first time that I saw them :)

Looking into the code, Capy spawns a thread and waits on a std::condition_variable to implement this: https://github.com/cppalliance/capy/blob/develop/src/ex/detail/timer_service...

I do agree it's odd, especially considering that Corosio has timers, and cancel_at/cancel_after members.

I find the ergonomics of delay() and timeout() much better than those of timers, but that's a different concern to what's being discussed here.

This makes me wonder whether it is safe to use delay() and timeout() with an io_context with a concurrency_hint of one.

My general preference is that it is best to avoid spawning internal threads and instead to design the API in such a way that the user provides a thread, if one is needed. This gives the user more control over resource management and allows for custom thread initialization, which may be necessary if the user's code is supposed to run in that thread. Think of things like a custom thread stack size or CoInitialize().

I'm not sure which library timeouts and delays belong to, but I do agree that these features should be based on an IO reactor loop. IMHO, if Capy has to provide those features while it doesn't provide IO reactors, it should accept an externally provided reactor to implement them.
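One way to follow this preference is sketched below, assuming a service whose only need for a thread is to run posted jobs. The hypothetical user_driven_worker never creates a thread; the user lends one by calling run() from a thread they created and configured (custom stack size, CoInitialize() and so on). The names are illustrative and are not taken from Capy or Corosio.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical service that never spawns threads. Work is executed by
// whichever thread the user chooses to lend by calling run().
class user_driven_worker {
public:
    void post(std::function<void()> job) {
        assert(job && "job must not be empty");
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        cv_.notify_one();
    }

    // Executes posted jobs on the calling thread. Returns once stop() has
    // been requested and every job posted before that has run.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
            if (jobs_.empty()) return;  // stop requested and nothing left to do
            auto job = std::move(jobs_.front());
            jobs_.pop_front();
            lock.unlock();
            job();  // user code runs on the thread the user provided
            lock.lock();
        }
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool stop_ = false;
};
```

The design choice is that run() blocks the lending thread until stop(), similar in spirit to Asio's io_context::run(), so thread creation and initialization stay entirely in user code.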

Vinnie Falco

3:12 p.m.

On Mon, Jun 29, 2026 at 8:10 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

...

My general preference is that it is best to avoid spawning internal threads and instead to design the API in such a way that the user provides a thread, if one is needed.

We try to do that, but sometimes the internal thread cannot be avoided. For example, domain name resolution is inherently synchronous. User code doesn't run in the implementation-defined thread that Corosio launches for this. Thanks
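The following POSIX-only sketch illustrates why a resolver ends up on a helper thread when no asynchronous resolver is available: getaddrinfo() blocks the calling thread until the lookup completes, so an event loop must not call it directly. std::async stands in for the helper thread and resolve_async is a hypothetical name; this is not how Corosio's resolver is implemented. Passing AI_NUMERICHOST | AI_NUMERICSERV lets the example run without network access.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <future>
#include <netdb.h>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <utility>
#include <vector>

// POSIX-only sketch: getaddrinfo() may block for a long time on a real DNS
// lookup, so it is moved off the caller's thread with std::async.
std::future<std::vector<std::string>> resolve_async(std::string host, std::string port,
                                                    int flags = 0) {
    assert(!host.empty() && "host must not be empty");
    return std::async(std::launch::async,
                      [host = std::move(host), port = std::move(port), flags] {
        addrinfo hints{};
        hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
        hints.ai_socktype = SOCK_STREAM;  // one entry per address for TCP
        hints.ai_flags = flags;

        std::vector<std::string> addresses;
        addrinfo* results = nullptr;
        if (getaddrinfo(host.c_str(), port.c_str(), &hints, &results) != 0)
            return addresses;  // empty on failure; gai_strerror() would explain why

        for (const addrinfo* p = results; p != nullptr; p = p->ai_next) {
            char text[INET6_ADDRSTRLEN] = {};
            const void* addr = nullptr;
            if (p->ai_family == AF_INET)
                addr = &reinterpret_cast<const sockaddr_in*>(p->ai_addr)->sin_addr;
            else if (p->ai_family == AF_INET6)
                addr = &reinterpret_cast<const sockaddr_in6*>(p->ai_addr)->sin6_addr;
            if (addr != nullptr && inet_ntop(p->ai_family, addr, text, sizeof text) != nullptr)
                addresses.emplace_back(text);
        }
        freeaddrinfo(results);
        return addresses;
    });
}
```

While the future is pending, the calling thread (an event loop, say) remains free to do other work.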

Andrey Semashev

3:33 p.m.

On 29 Jun 2026 18:12, Vinnie Falco wrote:

...

On Mon, Jun 29, 2026 at 8:10 AM Andrey Semashev via Boost <boost@lists.boost.org <mailto:boost@lists.boost.org>> wrote:

My general preference is that it is best to avoid spawning internal threads and instead to design the API in such a way that the user provides a thread, if one is needed.

We try to do that, but sometimes the internal thread cannot be avoided. For example, domain name resolution is inherently synchronous. User code doesn't run in the implementation-defined thread that Corosio launches for this.

Strictly speaking, asynchronous DNS resolvers do exist (e.g. c-ares). But I understand that they may not be available on a given system, and an implementation with an extra thread is needed as a fallback. In this case, I would still prefer an option for a user to provide his own thread for the resolver.

Rainer Deyke

30 Jun 2026

6:44 a.m.

On 6/29/26 16:41, Vinnie Falco via Boost wrote:

...

On Mon, Jun 29, 2026 at 7:40 AM Ruben Perez via Boost <boost@lists.boost.org> wrote:

...
The extra thread is allocated by the timer service in Capy, underlying capy::delay() and capy::timeout().

Huh? How can Capy implement delay() and timeout() without a reactor? I don't think those functions belong in Capy.

Removing them would drastically reduce the usefulness of Capy-without-Corosio, and timing is semantically orthogonal to sockets. Capy's documentation used to be full of calls to std::this_thread::sleep_for, which is not a thread-pool-safe operation. (It still is, technically, but I think all of the remaining calls are either simulated work or in non-coroutine contexts, so they're all right.) -- Rainer Deyke - rainerd@eldwood.com
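Rainer's point about std::this_thread::sleep_for can be illustrated without Capy at all. In the sketch below, run_on_single_worker is a hypothetical stand-in for a thread pool with one worker: a job that "waits" by calling sleep_for occupies that worker, so an unrelated job queued behind it cannot start until the sleep ends. A timer-based delay that suspends the coroutine instead of blocking the thread does not have this problem.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

using steady = std::chrono::steady_clock;

// Stand-in for a thread pool with a single worker: jobs run back to back on
// one thread. Returns when each job started, relative to when the worker began.
std::vector<steady::duration> run_on_single_worker(std::vector<std::function<void()>> jobs) {
    std::vector<steady::duration> starts;
    std::thread worker([&] {
        const auto t0 = steady::now();
        for (auto& job : jobs) {
            assert(job && "job must not be empty");
            starts.push_back(steady::now() - t0);
            job();  // a blocking job keeps every later job waiting
        }
    });
    worker.join();
    return starts;
}
```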

Ruben Perez

29 Jun 2026

10:54 a.m.

On Mon, 29 Jun 2026 at 00:02, Marcelo Zimbres Silva via Boost <boost@lists.boost.org> wrote:

...

Hi, some months ago I started implementing new benchmarks for Boost.Redis to compare it to the current state of other popular clients. While I was working on them, Boost.Redis gained Corosio support, something that Ruben Perez announced on this mailing list recently.

Have you considered replicating your experiments on the cost of async abstractions [1], but for Capy/Corosio? I'd be really interested in the cost of tasks that complete immediately. Note that I'm talking about tasks, rather than awaitables, so something like:

    capy::io_task<> queue_push(int value)
    {
        if (!full())
        {
            container.push_back(value);
            co_return {};
        }
        // wait for space
    }

The reason I'm asking is that this is the kind of code that the "Application developers" user tier [2] is able to write, and hence the most abundant.

Thanks,
Ruben.

[1] https://github.com/boostorg/redis/blob/develop/doc/on-the-costs-of-async-abs...
[2] https://isocpp.org/files/papers/P4172R1.pdf

Marcelo Zimbres Silva

2 Jul 2026

10:09 p.m.

On Mon, 29 Jun 2026 at 12:54, Ruben Perez <rubenperez038@gmail.com> wrote:

...

Have you considered replicating your experiments on the cost of async abstractions [1], but for Capy/Corosio?

I haven't had the time, unfortunately, but I am also interested in the outcome. As a protocol library, Capy seems to offer appropriate ground for e.g. sans-io (net) protocol implementations; however, the cost of doing so is unclear to me.

A single read from the socket can contain hundreds or perhaps thousands of messages in a client/server Redis setup. That means an (async) coroutine-based parser of that protocol would be plagued with immediate completions if it has to suspend to communicate new messages (yield). This is not particularly costly, because it can be done without a trip to the event loop (symmetric transfer). I guess, however, that there is still a cost. Worse yet, with the maximum number of inline calls set to 16 by default [1], the rescheduling cost might be prohibitively high (as I discussed in the paper you linked).

This is obviously not Capy/Corosio's fault, but it raises questions about how broadly useful it is as a protocol library. I am also interested in hearing from the boost.http authors whether any of this has been taken into account.

[1] https://develop.corosio.cpp.al/corosio/4.guide/4c2.configuration.html (inline_budget_max: 16)
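The rescheduling concern can be made concrete with a toy model. It assumes (this is not taken from Corosio's documentation or source) that a continuation may run inline, as with symmetric transfer, until a budget of inline runs is used up, after which the next continuation is posted to the event loop, and that running it from the loop refills the budget. budgeted_loop and deliver are hypothetical names. Under these assumptions, a parser that extracts 1000 messages from a single read with a budget of 16 makes 58 trips through the event loop; with a budget at least as large as the message count it makes none.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy model, not Corosio's scheduler. A continuation runs inline (as with
// symmetric transfer) while the inline budget lasts; once it is used up, the
// next continuation is posted to the event-loop queue instead.
class budgeted_loop {
public:
    explicit budgeted_loop(std::size_t budget_max)
        : budget_max_(budget_max), budget_(budget_max) {}

    void dispatch(std::function<void()> fn) {
        assert(fn && "continuation must not be empty");
        if (budget_ > 0) {
            --budget_;
            fn();  // inline: no trip through the event loop
        } else {
            ++posts_;
            queue_.push_back(std::move(fn));  // rescheduled: one extra loop trip
        }
    }

    // Drains the event-loop queue; each turn starts with a full budget again.
    void run() {
        while (!queue_.empty()) {
            std::function<void()> fn = std::move(queue_.front());
            queue_.pop_front();
            budget_ = budget_max_;
            fn();
        }
    }

    std::size_t posts() const { return posts_; }

private:
    std::size_t budget_max_;
    std::size_t budget_;
    std::size_t posts_ = 0;
    std::deque<std::function<void()>> queue_;
};

// Models a parser that found `count` messages in one socket read and hands each
// one to its consumer as an immediate completion, starting at message `i`.
void deliver(budgeted_loop& loop, std::size_t i, std::size_t count, std::size_t& delivered) {
    if (i == count) return;
    loop.dispatch([&loop, i, count, &delivered] {
        ++delivered;
        deliver(loop, i + 1, count, delivered);
    });
}
```

In this model each loop turn delivers one posted message plus up to a full budget of inline ones, so the number of trips falls roughly in inverse proportion to the budget.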

Christian Mazakas

29 Jun 2026

3:46 p.m.

On Sun, Jun 28, 2026 at 3:01 PM Marcelo Zimbres Silva via Boost <boost@lists.boost.org> wrote:

...

Hi, some months ago I started implementing new benchmarks for Boost.Redis to compare it to the current state of other popular clients. While I was working on them, Boost.Redis gained Corosio support, something that Ruben Perez announced on this mailing list recently.

In this email I would like to share the results I obtained with the Capy/Corosio review audience. The code is public and available at https://github.com/mzimbres/redis-cli-comp.

The new benchmarks simulate the scenario in which Redis is most commonly used, namely internet-facing servers (usually HTTP) that serve connections concurrently while receiving server pushes, e.g. pubsub events. They consist of

1. starting multiple independent sessions that issue commands in a loop.
2. subscribing to a channel to receive pubsub events.

## Runtime Performance

The metric used to assess runtime performance was the wall-clock time multiplied by the %CPU consumed by the client. This metric takes into account that clients might use a different number of threads. The following results were obtained (lower is better):

Client                 Time x %CPU (normalized)
----------------------------------------------------
boost_redis_corosio    1.00
boost_redis_asio_cb    1.35
boost_redis_asio_co    1.73
redis_rs (Rust)        4.62
go_redis (Go)          22.33

These results seem a little suspicious to me. Rust is 4.62x slower? C, C++ and Rust are relatively the same perf-wise, because they all just compile to LLVM IR and get turned into a final executable by clang.

I've pointed this out before but the version of Tokio being tested is ~20 versions behind.

I think you should publish the flamegraphs here, because there's something wrong if a language that compiles to llvm ir is over 4.5x slower, and this is worthy of a little skepticism.

Being 20x faster than Go also sounds kind of suspicious, if I'm being honest.

redis-rs sits at around 80 million total downloads (https://crates.io/crates/redis) and has numerous contributors.

Something is going on here, and only the flamegraphs can really tell us. It could very well be that the crate isn't well-implemented, of course.

For the record, you can see the latest stable version of a crate here: https://crates.io/crates/tokio

- Christian

Christian Mazakas

3:52 p.m.

On Mon, Jun 29, 2026 at 8:46 AM Christian Mazakas <christian.mazakas@gmail.com> wrote:

...

Something is going on here, and only the flamegraphs can really tell us.

I forgot to mention, there's a really nice flamegraph tool here: https://github.com/flamegraph-rs/flamegraph#examples It's a cargo subcommand but it can be used to generate a flamegraph for an arbitrary executable and it's what I use for making them. This is basically just a glorified wrapper around perf, but I could never really be bothered to properly learn perf, ha ha. - Christian

Marcelo Zimbres Silva

9:16 p.m.

On Mon, 29 Jun 2026 at 17:47, Christian Mazakas via Boost <boost@lists.boost.org> wrote:

...

These results seem a little suspicious to me. Rust is 4.62x slower? C, C++ and Rust are relatively the same perf-wise, because they all just compile to LLVM IR and get turned into a final executable by clang.

I've pointed this out before but the version of Tokio being tested is ~20 versions behind.

I have run the benchmarks again; the raw data follows below. Nothing seems to have changed.

...

I think you should publish the flamegraphs here, because there's something wrong if a language that compiles to llvm ir is over 4.5x slower, and this is worthy of a little skepticism.

Being 20x faster than Go also sounds kind of suspicious, if I'm being honest.

redis-rs sits at around 80 million total downloads (https://crates.io/crates/redis) and has numerous contributors.

Something is going on here, and only the flamegraphs can really tell us.

Profiling the clients to find out what their performance bottlenecks are is far beyond the scope of this work. I am mostly concerned with making sure my implementation is correct and uses each client idiomatically. I invite you to review it and let me know of any improvement I have missed or whether anything is wrong.

Now to the results:

App                    time (s)
------------------------------------
boost_redis_asio_co    16.61
boost_redis_asio_cb    18.36
boost_redis_corosio    18.30
redis_rs               30.11
go_redis               73.16
Info: Measured with /usr/bin/time

App                    %usr     %system   %wait   %CPU
----------------------------------------------------------
boost_redis_asio_co    59.91    2.59      0.00    62.49
boost_redis_asio_cb    45.14    2.61      0.00    47.75
boost_redis_corosio    31.33    5.67      0.00    37.00
redis_rs               98.21    1.65      0.00    99.86
go_redis               172.37   104.00    0.15    276.37
Info: Measured with pidstat -u

App                    threads   fd-nr
----------------------------------------
boost_redis_asio_co    2         7
boost_redis_asio_cb    2         7
boost_redis_corosio    3         7
redis_rs               1         10
go_redis               24        1004
Info: Measured with pidstat -v

App                    cswch/s   nvcswch/s
--------------------------------------------
boost_redis_asio_co    2728.00   1.28
boost_redis_asio_cb    3086.28   2.78
boost_redis_corosio    3292.11   0.39
redis_rs               22.55     17.87
go_redis               3358.07   9.33
Info: Measured with pidstat -w

App                    minflt/s   VSZ (kB)   RSS (kB)   %MEM
------------------------------------------------------------
boost_redis_asio_co    65.89      92320      12796      0.04
boost_redis_asio_cb    50.94      91520      11752      0.04
boost_redis_corosio    85.78      101535     9678       0.03
redis_rs               27.94      7945       6488       0.02
go_redis               58.56      2955086    285233     0.88
Info: Measured with pidstat -r

App                    cache-misses    branch-misses
----------------------------------------------------
boost_redis_asio_co    2,582,352       95,994,676
boost_redis_asio_cb    1,275,913       55,806,958
boost_redis_corosio    2,104,264       87,184,953
redis_rs               3,697,919       321,341,140
go_redis               1,225,923,065   1,677,269,847
Info: Measured with perf stat -B -e branch-misses,cache-misses

Marcelo
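Since wall-clock time multiplied by %CPU (divided by 100) approximates the CPU-seconds a client consumed, the normalized metric from the first message can be recomputed from the rerun figures above. The sketch assumes the normalized value is simply each client's product divided by Corosio's; sample, cpu_seconds and normalized are illustrative names, and this is not the script used in redis-cli-comp.

```cpp
#include <cassert>
#include <map>
#include <string>

// One measurement: wall-clock seconds (/usr/bin/time) and average %CPU (pidstat -u).
struct sample {
    double wall_s;
    double cpu_percent;  // may exceed 100 for multi-threaded clients
};

// wall-clock x %CPU / 100 approximates total CPU-seconds (user + system), so a
// client is not rewarded merely for finishing sooner by using more threads.
double cpu_seconds(const sample& s) { return s.wall_s * s.cpu_percent / 100.0; }

// Expresses every client's CPU-seconds as a multiple of the baseline client's.
std::map<std::string, double> normalized(const std::map<std::string, sample>& runs,
                                         const std::string& baseline) {
    const double base = cpu_seconds(runs.at(baseline));
    assert(base > 0.0);
    std::map<std::string, double> out;
    for (const auto& [name, s] : runs) out[name] = cpu_seconds(s) / base;
    return out;
}
```

Fed with the wall-clock and %CPU figures above, this gives roughly 1.29 for boost_redis_asio_cb, 1.53 for boost_redis_asio_co, 4.44 for redis_rs and 29.86 for go_redis, relative to boost_redis_corosio.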

Marcelo Zimbres Silva

9:27 p.m.

On Mon, 29 Jun 2026 at 23:16, Marcelo Zimbres Silva <mzimbres@gmail.com> wrote:

...

I have run the benchmarks again; the raw data follows below. Nothing seems to have changed.

I forgot to mention that Tokio was updated to 1.52.

Marcelo

Rainer Deyke

30 Jun 2026

6:49 a.m.

On 6/29/26 17:46, Christian Mazakas via Boost wrote:

...

These results seem a little suspicious to me. Rust is 4.62x slower?

Doesn't surprise me. Rust has always been willing to pay the cost of runtime safety rails where compile-time checks are inadequate. -- Rainer Deyke - rainerd@eldwood.com

Christian Mazakas

4:28 p.m.

On Mon, Jun 29, 2026 at 11:50 PM Rainer Deyke via Boost <boost@lists.boost.org> wrote:

...

On 6/29/26 17:46, Christian Mazakas via Boost wrote:

...
These results seem a little suspicious to me. Rust is 4.62x slower?

Doesn't surprise me. Rust has always been willing to pay the cost of runtime safety rails where compile-time checks are inadequate.

It actually should surprise you. Personally, I think it's a little unwise to attempt to publish numbers without actually understanding the benchmarks in question. It's poor form to say profiling is out of scope but still publish numbers anyway. - Christian

Marcelo Zimbres Silva

9:52 p.m.

On Tue, 30 Jun 2026 at 18:29, Christian Mazakas via Boost <boost@lists.boost.org> wrote:

...

Personally, I think it's a little unwise to attempt to publish numbers without actually understanding the benchmarks in question.

I implemented the benchmark for each client myself so I guess I understand them.

...

It's poor form to say profiling is out of scope but still publish numbers anyway.

Perhaps you skipped the raw data from my previous email? You have been provided with plenty of profiling information:

- pidstat -u: Report CPU utilization.
- pidstat -v: Threads, file descriptors.
- pidstat -w: Context switches.
- pidstat -r: Page faults and memory utilization.
- perf stat -B: Cache and branch misses.

The flamegraphs for clients other than Boost.Redis are obviously useless information to me, as I am not familiar with their source code. Perhaps you should do that yourself and PR them with a fix?

Christian Mazakas

1 Jul 2026

3:01 p.m.

On Tue, Jun 30, 2026 at 2:52 PM Marcelo Zimbres Silva <mzimbres@gmail.com> wrote:

...

On Tue, 30 Jun 2026 at 18:29, Christian Mazakas via Boost <boost@lists.boost.org> wrote:

...
Personally, I think it's a little unwise to attempt to publish numbers without actually understanding the benchmarks in question.

I implemented the benchmark for each client myself so I guess I understand them.

Alright, so why is Rust 4.5x slower and why is Go 20x slower?

...

...
It's poor form to say profiling is out of scope but still publish numbers anyway.

Perhaps you skipped the raw data from my previous email? You have been provided with plenty of profiling information.

- pidstat -u: Report CPU utilization.
- pidstat -v: Threads, file descriptors.
- pidstat -w: Context switches.
- pidstat -r: Page faults and memory utilization.
- perf stat -B: Cache and branch misses.

The flamegraphs for clients other than Boost.Redis are obviously useless information to me, as I am not familiar with their source code. Perhaps you should do that yourself and PR them with a fix?

Yeah, this is kind of my point. You have numbers but you have no idea why. You have no idea if you're even using the libraries effectively or at least if you do, you haven't explained it. I'm just gonna cut to the chase and say that posting the other langs has political motivations to show that "C++ always wins". Which is fine. Maybe the crate is of poor quality. Maybe the GC in Go really is that bad. I don't know, and I'm starting to suspect you don't know either. This is why I say it's poor form to include data you don't fully understand, because you have no idea if it's even correct. Posting data without understanding isn't an improvement.

...

Perhaps you should do that yourself and PR them with a fix?

Replying occasionally to the ML is about the maximum budget I can afford, ha ha. It takes infinite time and energy to refute claims, which is why it's usually on the person making the claims to actually do the work and be like, "Yes, Rust is almost 5x slower because its approach is X, Y and Z". - Christian

Marcelo Zimbres Silva

2 Jul 2026

9:18 p.m.

On Wed, 1 Jul 2026 at 17:03, Christian Mazakas via Boost <boost@lists.boost.org> wrote:

...

You have no idea if you're even using the libraries effectively or at least if you do, you haven't explained it.

You are in denial, but whatever. I asked the redis-rs maintainers to let me know whether my implementation looks correct:

https://github.com/redis-rs/redis-rs/issues/2190#issuecomment-4865735126

Marcelo

Christian Mazakas

4 Jul 2026

7:01 p.m.

On Thu, Jul 2, 2026 at 2:18 PM Marcelo Zimbres Silva <mzimbres@gmail.com> wrote:

...

On Wed, 1 Jul 2026 at 17:03, Christian Mazakas via Boost <boost@lists.boost.org> wrote:

...
You have no idea if you're even using the libraries effectively or at least if you do, you haven't explained it.

You are in denial, but whatever. I asked the redis-rs maintainers to let me know whether my implementation looks correct:

https://github.com/redis-rs/redis-rs/issues/2190#issuecomment-4865735126

Marcelo

It's not denial, it's just mild skepticism. For what it's worth, it's always wise to verify these things when the runtime discrepancies are so large.

I don't expect many on this mailing list to know this, but Rust, C++ and C are all basically the same language as far as runtime performance is concerned. So what you wind up actually benchmarking is either the quality of the implementation or some erroneous/non-ideal usage of the implementation itself.

In general, when you benchmark yourself against a competitor and find that you are around 4.5x faster, your instinct should be to verify immediately, either with the authors or by doing some light profiling yourself to see where all the time is spent.

I think what happened was, you weren't expecting anyone to be skeptical of the results and felt personally insulted that I'd question them, but this is just standard engineering rigor and should be the standard operating procedure not just for Boost but for all engineering you'll do in your career.

I'm sorry I had to be so firm, but if we want Boost to be a legitimate project worthy of its reputation, doing basic benchmark verification like this is a standard.

- Christian


21 comments

6 participants

participants (6)

Andrey Semashev
Christian Mazakas
Marcelo Zimbres Silva
Rainer Deyke
Ruben Perez
Vinnie Falco
