[boost] Re: Capy Review, take 2

29 Jun 2026

On Mon, Jun 29, 2026 at 3:25 AM Rainer Deyke via Boost <boost@lists.boost.org> wrote:
...
This is my second formal review of Capy, meant to replace the original,
which I am hereby formally withdrawing.
Thank you for refocusing on the core concepts, Rainer. You're right that
the core of Capy is the protocol for coroutine environment propagation, and
that's the right lens for evaluation. I'll respond to each section.
...
First off, the name is wrong. The concept has nothing to do with
i/o. I understand the history of the name, that the concept was
written first and foremost to support the corosio library as a
replacement for ASIO, both of which have io in the name, but it's
not a descriptive name. Renaming it is going to be disruptive,
which is all the more reason to rename it *now* instead of waiting
for the rename to be forced when the library is standardized.
Yes, and there's a little more to it. The name is political. "I/O" is an
unclaimed domain in the standards committee, and one where we have evidence
that it is the most natural fit for coroutines. I do recognize that the
name is imperfect. However, referring back to P4172, there are three
audiences:

1. Application authors
2. Framework authors
3. I/O library authors

Only groups 2 and 3 are exposed to the concept name, and together they are
vastly smaller than group 1. This cohort is also more skilled and
specialized, so the name is in theory less consequential. Regardless, we are
considering "Cowaitable", though of course that has its own problems. The
ideal time for a rename is before the first Boost release in which the
library appears (assuming it is accepted).
...
IoAwaitable exists to support the guarantee that every task runs
on its executor. This guarantee is useful for making it easier to
reason about code and for writing correct code. It also comes at
a considerable runtime cost. Coroutines get bounced around between
IoAwaitables and executors like a game of ping-pong.
The "bouncing" framing doesn't match how the library is used in practice.
In the primary use case (networking), you launch a coroutine chain on an
executor (a strand, an io_context) and the entire chain stays there. There
is no executor hopping. The executor affinity is the point: you set it once
and stop thinking about it.

The cost you describe only appears when you explicitly choose to switch
executors via `run`. That's a deliberate action, not something that happens
behind your back. The question I'd ask is: why would you be switching
executors frequently enough for the overhead to matter? In I/O code,
executor switches are rare. You launch on a strand or an io_context, and
you stay there.
...
It's not *wrong*, per se, to do it this way. It's a valid approach.
The benefits are worth the costs in many cases. But it's not the
only valid approach. And it seems like a shame that a supposedly
universal protocol like IoAwaitable forces these compromises on
users without escape hatches. So I would like to propose two escape
hatches that don't allow IoAwaitables to be bypassed, but work with
them.
I appreciate the thought you've put into these alternatives. I'll address
each escape hatch.
...
The first is resume_on. I realize that this has already been rejected by
the Capy developers in favor of capy::run, but each capy::run call
requires an extra coroutine frame and an extra executor switch after
the inner coroutine co_returns. resume_on also allows the code
running on the alternate executor to directly use co_return for the
main coroutine, which provides better ergonomics.
The extra coroutine frame exists because `run` creates a new `io_env`
for the child: new executor, inherited stop token and allocator. The
trampoline ensures you return to your original executor when the child
completes. This is the price of correctness.

There are three problems with `resume_on`:

1. The library provides the tools for anyone to implement `resume_on`
themselves. The executor concept, `continuation`, `dispatch`, and `post`
are all public. Nothing prevents a user from building `resume_on` as a
standalone awaitable. We don't have to ship it to enable it.

2. `resume_on` breaks the environment model. When you `resume_on(ex)` and
then `co_await` an IoAwaitable, the `io_env` still points to the *original*
executor. The IoAwaitable dispatches its completion to the wrong executor.
To fix this, you either need to update the `io_env` (but it's `const*` and
shared - who owns the new one?) or allocate a new `io_env` (which is
exactly what `run` already does). So `resume_on` either breaks the
environment or duplicates the machinery of `run`, minus the safety of
scoped lifetime.
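
To make point 2 concrete, here is a toy model of the mismatch. Nothing in it
is Capy code: `toy_env`, `toy_coroutine`, `naive_resume_on`, and `scoped_run`
are hypothetical names, and executors are reduced to strings. It only
illustrates that a switch which cannot touch the shared, read-only
environment leaves completions aimed at the old executor, while a scoped
child environment keeps the two in agreement.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; none of these are Capy types.
struct toy_env {
    std::string executor;    // where awaitables deliver their completions
    std::string stop_token;  // inherited by children
    std::string allocator;   // inherited by children
};

struct toy_coroutine {
    const toy_env* env;      // shared and read-only, like the const* above
    std::string running_on;  // executor the body is currently running on
};

// Naive resume_on: moves the coroutine but cannot update the const env.
void naive_resume_on(toy_coroutine& c, const std::string& ex) {
    c.running_on = ex;
}

// An awaitable completes on whatever executor the environment names.
std::string completion_executor(const toy_coroutine& c) {
    return c.env->executor;
}

// run-style switch: build a child environment that lives exactly as long
// as the child does, with a new executor and inherited token and allocator.
template <class Child>
auto scoped_run(const toy_env& parent, const std::string& ex, Child child) {
    toy_env child_env{ex, parent.stop_token, parent.allocator};
    toy_coroutine c{&child_env, ex};
    return child(c);
}

// After a naive switch, completions still target the original executor.
bool naive_switch_is_consistent() {
    toy_env env{"strand", "token", "alloc"};
    toy_coroutine c{&env, "strand"};
    naive_resume_on(c, "pool");
    return completion_executor(c) == c.running_on;
}

// With a scoped child environment, both agree on the new executor.
bool scoped_switch_is_consistent() {
    toy_env env{"strand", "token", "alloc"};
    return scoped_run(env, "pool", [](const toy_coroutine& c) {
        return completion_executor(c) == c.running_on;
    });
}
```

In this model `naive_switch_is_consistent()` is false and
`scoped_switch_is_consistent()` is true. The child environment in
`scoped_run` plays the role the text assigns to the extra frame.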

3. Coroutines are both easy to use and hard to use. They're more ergonomic
than callbacks, but the C++ committee standardized the machinery in 2020
and then spent six years building P2300 senders instead of providing
library components. The consequence is that the ecosystem has minimal
collective experience with coroutines. Providing `resume_on` as a
first-class API in a Boost library would invite misuse from users who are
still learning the fundamentals. It's an expert feature disguised as a
convenience, with significant sharp edges. There are already enough sharp
edges in Capy.
...
One caveat about resume_on: its effect should be limited to the
coroutine in which it is used. When coroutine A co_awaits coroutine
B, and coroutine B uses resume_on and then co_returns, execution on
coroutine A should always resume on A's original executor, not the
executor that B switched to.
This caveat is exactly what makes `resume_on` as complex as `run`. When
coroutine B calls `resume_on(ex2)` and then `co_return`s, coroutine A must
resume on A's original executor. That means `final_suspend` must know which
executor the parent was on and dispatch back to it. That is exactly what
the trampoline in `run` does. The "simpler" `resume_on` requires the same
machinery as `run` to be correct. The complexity is merely hidden, not
removed.

Implementing this scoping correctly is enormously complex. I'm not even
certain it's feasible without the trampoline mechanism that `run` already
provides.

`run` IS `resume_on`, implemented correctly, with scoped lifetime and automatic
return-to-caller. The extra coroutine frame is the price of correctness.

The existing design rationale is documented in the "Capy and TooManyCooks"
comparison (Section 8 of the documentation). We'll expand that document to
address `resume_on` specifically and explain why `run` is the correct
implementation of the same idea.
...
The second escape hatch is immediate_executor. It looks something
like this:
class immediate_executor {
   public:
     std::coroutine_handle<> dispatch(capy::continuation &c) {
       // Obey the letter of the law by not just returning c.h...
       this->post(c);
       return {};
     }
     void post(capy::continuation &c) {
       // ...but violate the spirit of the law by calling h.resume().
       c.h.resume();
     }
     // ...other functions here...
   };
How do you propose not overflowing the stack?

Your `post` calls `c.h.resume()` directly. That resumes a coroutine from
inside another coroutine's execution context. Each `resume()` adds a stack
frame.

This is the exact problem symmetric transfer was invented to solve.
`await_suspend` returns a `coroutine_handle<>` so the runtime can tail-call
it without growing the stack. The `dispatch` path can inline via symmetric
transfer
(returning `c.h` from `await_suspend`), but `post` cannot - it's called
from contexts where there is no `await_suspend` return value to tail-call
through. An immediate_executor that calls `resume()` directly from `post`
defeats the entire mechanism.

Beyond stack overflow, there are further constraints. `async_mutex` stop
callbacks call `executor.post(cont_)` from arbitrary threads. An
immediate_executor that inlines `post` would resume the coroutine on the
wrong thread, corrupting the thread-local frame allocator. This is,
ironically, exactly the class of bug you asked us to document in condition 4.
...
You might say that it's not Capy's business to educate the users on
basic coroutine safety. I say that Capy made it its business when
it included a basic coroutine primer in its documentation.
We wouldn't say that, and I'm glad you raised it. Educating users is not
just Capy's business, it's our obligation. This follows the tradition I
started with Beast, which included extensive conceptual documentation
precisely because the domain was new to most C++ developers.

Coroutines are in the same position today - arguably worse. The committee
standardized coroutine machinery in C++20 and then provided no library
components for six years. This wasn't an accident. P2300 section 1.9.2
dismissed coroutines as a basis for asynchrony with five paragraphs of text
and no supporting measurements. "Symmetric transfer" - the mechanism that
makes coroutines safe and efficient - does not appear once in eleven
revisions of that paper. The consequence: no standard coroutine library
components, no educational infrastructure, no ecosystem experience. That
gap is why reviewers ask about resume_on and immediate_executor - not
because the ideas are wrong, but because the fundamentals haven't been
taught. Users don't just lack experience with Capy; they lack experience
with coroutines in general. Where else will they go? We are the experts.
Who else but us should write the tutorial? Who else could? Capy is filling
a hole the committee dug.

Many of the concerns raised in this review - executor switching, immediate
resumption, runtime cost - trace back to that knowledge gap. We need to
explain the *why*, not just the *how*. The primer should cover pitfalls
beyond the obvious:

- Dangling references in lambda captures
- Constructing a task without immediately co_awaiting it - the task captures
references to the caller's stack frame at construction time. If you store it
and co_await it later, those references dangle. You don't need a lambda for
this to bite you.
- Why `std::mutex` inside a coroutine running on a thread pool is a deadlock
waiting to happen
- Why you must never call `h.resume()` from a stop_callback - post through
the executor instead
- Why symmetric transfer exists - calling `resume()` directly grows the
stack, which is why executors queue work instead of resuming inline

Your original review is excellent source material for gotchas encountered
by a knowledgeable reviewer. We'll reference it.
...
When ASIO introduced strands, it was a revolutionary alternative to
std::mutex. But the alternative to capy::strand is not std::mutex
but capy::async_mutex, and they are more alike than they are
different. [...] Can they be merged? Should I prefer one over the
other? Can some text be added to the documentation to help me
choose between them?
They serve different granularities and should not be merged. We'll add
a comparison section to the documentation. The key distinctions:

A strand is coarse-grained. All work dispatched through the strand serializes.
Use it when an entire coroutine tree needs exclusive access - the typical
case is per-connection state in a server. You launch the connection handler
on a strand and everything downstream serializes automatically.

An async_mutex is fine-grained. Coroutines run concurrently and only serialize
at lock/unlock points. Use it when most of your coroutine's work can run in
parallel and only specific shared variables need protection.

A strand has no failure mode for acquisition - you're either on it or not.
An async_mutex returns `io_result<>` because lock acquisition can be
canceled via stop token.

A strand supports direct posting (`strand.post(c)`) without `co_await`. An
async_mutex always requires `co_await` because it may need to suspend.

They are often used together. The async_mutex example in the documentation
runs on a strand to satisfy the mutex's single-executor threading
requirement.
...
Apparently locking can fail for mutexes but not for strands?
Or do strands just report failure differently?
Locking an async_mutex can fail because lock acquisition is cancellable. If
the coroutine's stop token fires while it's waiting in the queue, the lock
attempt completes with `error::canceled`. This is a feature, not an
asymmetry - a strand doesn't "lock" in the same sense. You dispatch work to
it, and it runs when the strand is available. There's no queue-entry
cancellation because there's no explicit acquisition step.
...
1. The documentation of IoAwaitable is still broken because it does
not use capy::continuation. I am told that this will be fixed. It
hasn't been fixed yet. It needs to be fixed before Capy is accepted
into Boost.
Agreed. The documentation example uses `std::coroutine_handle<>` where it
should use `continuation`. The header comment in `io_awaitable.hpp` has the
correct example. We will synchronize the doc with the header.
...
Note that every single one of these can potentially be fixed by
just changing the documentation. I haven't tested the
implementation, but I assume it's fine.
To summarize:

1. IoAwaitable documentation: will fix.
2. resume_on - `run` is resume_on, implemented correctly. The extra frame
is the price of scoped lifetime and automatic return-to-caller. We'll
expand the design rationale documentation.
3. immediate_executor - stack overflow, wrong-thread resumption, and frame
allocator corruption prevent a safe general implementation. We'll document
the constraints for expert users.
4. Coroutine safety documentation: yes! Educating users is our
responsibility and we'll expand the primer substantially.
5. Strand vs async_mutex - accepted. We'll add comparison documentation
with usage guidelines.

Vinnie

Vinnie Falco