On Mon, Jun 29, 2026 at 3:25 AM Rainer Deyke via Boost <boost@lists.boost.org> wrote:
This is my second formal review of Capy, meant to replace the original, which I am hereby formally withdrawing.
Thank you for refocusing on the core concepts, Rainer. You're right that the core of Capy is the protocol for coroutine environment propagation, and that's the right lens for evaluation. I'll respond to each section.
First off, the name is wrong. The concept has nothing to do with i/o. I understand the history of the name, that the concept was written first and foremost to support the corosio library as a replacement for ASIO, both of which have io in the name, but it's not a descriptive name. Renaming it is going to be disruptive, which is all the more reason to rename it *now* instead of waiting for the rename to be forced when the library is standardized.
Yes, and there's a little more to it. The name is political. "I/O" is an unclaimed domain in the standards committee, and one where we have evidence that it is the most natural fit for coroutines. I do recognize that the name is imperfect. However, and I will refer back to P4172, there are three audiences:

1. Application authors
2. Framework authors
3. I/O library authors

Only groups 2 and 3 are exposed to the concept name, and together this cohort is vastly smaller. It is also more skilled and specialized, so the name is in theory less consequential. Regardless, we are considering "Cowaitable", though of course that has its own problems. A rename would ideally happen before the first Boost release in which the library appears (assuming it is accepted).
IoAwaitable exists to support the guarantee that every task runs on its executor. This guarantee is useful for making it easier to reason about code and for writing correct code. It also comes at a considerable runtime cost. Coroutines get bounced around between IoAwaitables and executors like a game of ping-pong.
The "bouncing" framing doesn't match how the library is used in practice. In the primary use case (networking), you launch a coroutine chain on an executor (a strand, an io_context) and the entire chain stays there. No executor hopping. The executor affinity is the point: you set it once and stop thinking about it. The cost you describe only appears when you explicitly choose to switch executors via `run`. That's a deliberate action, not something that happens behind your back. The question I'd ask is: why would you be switching executors frequently enough for the overhead to matter? In I/O code, executor switches are rare.
It's not *wrong*, per se, to do it this way. It's a valid approach. The benefits are worth the costs in many cases. But it's not the only valid approach. And it seems like a shame that a supposedly universal protocol like IoAwaitable forces these compromises on users without escape hatches. So I would like to propose two escape hatches that don't allow IoAwaitables to be bypassed, but work with them.
I appreciate the thought you've put into these approaches. I'll address each escape hatch.
The first is resume_on. I realize that this has already been rejected by the Capy developers in favor of capy::run, but each capy::run call requires an extra coroutine frame and an extra executor switch after the inner coroutine co_returns. resume_on also allows the code running on the alternate executor to directly use co_return for the main coroutine, which provides better ergonomics.
The extra coroutine frame exists because `run` creates a new `io_env` for the child: new executor, inherited stop token and allocator. The trampoline ensures you return to your original executor when the child completes. This is the price of correctness. There are three problems with `resume_on`:

1. The library provides the tools for anyone to implement `resume_on` themselves. The executor concept, `continuation`, `dispatch`, and `post` are all public. Nothing prevents a user from building `resume_on` as a standalone awaitable. We don't have to ship it to enable it.

2. `resume_on` breaks the environment model. When you `resume_on(ex)` and then `co_await` an IoAwaitable, the `io_env` still points to the *original* executor. The IoAwaitable dispatches its completion to the wrong executor. To fix this, you either need to update the `io_env` (but it's `const*` and shared - who owns the new one?) or allocate a new `io_env` (which is exactly what `run` already does). So `resume_on` either breaks the environment or duplicates the machinery of `run`, minus the safety of scoped lifetime.

3. Coroutines are both easy to use and hard to use. They're more ergonomic than callbacks, but the C++ committee standardized the machinery in 2020 and then spent six years building P2300 senders instead of providing library components. The consequence is that the ecosystem has minimal collective experience with coroutines. Providing `resume_on` as a first-class API in a Boost library would invite misuse from users who are still learning the fundamentals. It's an expert feature disguised as a convenience, with significant sharp edges. There are already enough sharp edges in Capy.
One caveat about resume_on: its effect should be limited to the coroutine in which it is used. When coroutine A co_awaits coroutine B, and coroutine B uses resume_on and then co_returns, execution on coroutine A should always resume on A's original executor, not the executor that B switched to.
This caveat is exactly what makes `resume_on` as complex as `run`. When coroutine B calls `resume_on(ex2)` and then `co_return`s, coroutine A must resume on A's original executor. That means `final_suspend` must know which executor the parent was on and dispatch back to it. That is exactly what the trampoline in `run` does. The "simpler" `resume_on` requires the same machinery as `run` to be correct. The complexity is merely hidden, not removed.

Implementing this scoping correctly is enormously complex. I'm not even certain it's feasible without the trampoline mechanism that `run` already provides. `run` IS `resume_on`, implemented correctly, with scoped lifetime and automatic return-to-caller. The extra coroutine frame is the price of correctness.

The existing design rationale is documented in the "Capy and TooManyCooks" comparison (Section 8 of the documentation). We'll expand that document to address `resume_on` specifically and explain why `run` is the correct implementation of the same idea.
The second escape hatch is immediate_executor. It looks something like this:
    class immediate_executor {
    public:
      std::coroutine_handle<> dispatch(capy::continuation &c) {
        // Obey the letter of the law by not just returning c.h...
        this->post(c);
        return {};
      }

      void post(capy::continuation &c) {
        // ...but violate the spirit of the law by calling h.resume().
        c.h.resume();
      }
      // ...other functions here...
    };
How do you propose not overflowing the stack? Your `post` calls `c.h.resume()` directly. That resumes a coroutine from inside another coroutine's execution context. Each `resume()` adds a stack frame. This is the exact problem symmetric transfer was invented to solve. `await_suspend` returns a `coroutine_handle<>` so the runtime can tail-call it without growing the stack. The `dispatch` path can inline via symmetric transfer (returning `c.h` from `await_suspend`), but `post` cannot - it's called from contexts where there is no `await_suspend` return value to tail-call through. An immediate_executor that calls `resume()` directly from `post` defeats the entire mechanism.

Beyond stack overflow, there are further constraints. `async_mutex` stop callbacks call `executor.post(cont_)` from arbitrary threads. An immediate_executor that inlines `post` would resume the coroutine on the wrong thread, corrupting the thread-local frame allocator. This is, ironically, exactly the class of bug you asked us to document in condition 4.
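The stack-growth argument can be seen without any coroutine machinery. The sketch below uses plain callables in place of coroutine handles, and `inline_executor`, `queued_executor`, and `max_nesting_depth` are hypothetical names that are not part of Capy. It chains units of work where each unit posts the next, and measures how deeply the calls nest under an inlining `post` versus a queueing `post`.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical executor that runs work inline: post() calls the work
// directly, so every chained post() nests one call deeper on the stack.
struct inline_executor {
    void post(std::function<void()> work) { work(); }
    void run() {}  // nothing is ever queued
};

// Hypothetical executor that queues work: post() only enqueues, and run()
// drains the queue from a flat loop, so chained posts never nest.
struct queued_executor {
    std::deque<std::function<void()>> queue;

    void post(std::function<void()> work) { queue.push_back(std::move(work)); }
    void run() {
        while (!queue.empty()) {
            auto work = std::move(queue.front());
            queue.pop_front();
            work();
        }
    }
};

// Chains `steps` units of work, each of which posts the next one, and
// returns the deepest call nesting observed while they ran.
template <class Executor>
std::size_t max_nesting_depth(std::size_t steps) {
    Executor ex;
    std::size_t depth = 0;
    std::size_t max_depth = 0;
    std::size_t remaining = steps;

    std::function<void()> step = [&] {
        if (remaining == 0) return;  // chain is finished
        --remaining;
        ++depth;
        if (depth > max_depth) max_depth = depth;
        ex.post(step);               // schedule the next link of the chain
        --depth;
    };

    ex.post(step);
    ex.run();
    return max_depth;
}
```

With the inlining executor the nesting depth equals the chain length, so a long enough chain exhausts the stack. With the queueing executor the depth stays at one regardless of chain length, at the cost of a queue operation per step.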
You might say that it's not Capy's business to educate the users on basic coroutine safety. I say that Capy made it its business when it included a basic coroutine primer in its documentation.
We wouldn't say that, and I'm glad you raised it. Educating users is not just Capy's business, it's our obligation. This follows the tradition I started with Beast, which included extensive conceptual documentation precisely because the domain was new to most C++ developers. Coroutines are in the same position today - arguably worse.

The committee standardized coroutine machinery in C++20 and then provided no library components for six years. This wasn't an accident. P2300 section 1.9.2 dismissed coroutines as a basis for asynchrony with five paragraphs of text and no supporting measurements. "Symmetric transfer" - the mechanism that makes coroutines safe and efficient - does not appear once in eleven revisions of that paper. The consequence: no standard coroutine library components, no educational infrastructure, no ecosystem experience. That gap is why reviewers ask about resume_on and immediate_executor - not because the ideas are wrong, but because the fundamentals haven't been taught.

Users don't just lack experience with Capy; they lack experience with coroutines in general. Where else will they go? We are the experts. Who else but us should write the tutorial? Who else could? Capy is filling a hole the committee dug.

Many of the concerns raised in this review (executor switching, immediate resumption, runtime cost) trace back to that knowledge gap. We need to explain the *why*, not just the *how*. The primer should cover pitfalls beyond the obvious:

- Dangling references in lambda captures
- Constructing a task without immediately co_awaiting it - the task captures references to the caller's stack frame at construction time. Store it and co_await it later, and those references dangle. You don't need a lambda for this to bite you.
- Why `std::mutex` inside a coroutine running on a thread pool is a deadlock waiting to happen
- Why you must never call `h.resume()` from a stop_callback - post through the executor instead
- Why symmetric transfer exists - calling `resume()` directly grows the stack, which is why executors queue work instead of resuming inline

Your original review is excellent source material for gotchas encountered by a knowledgeable reviewer. We'll reference it.
When ASIO introduced strands, it was a revolutionary alternative to std::mutex. But the alternative to capy::strand is not std::mutex but capy::async_mutex, and they are more alike than they are different. [...] Can they be merged? Should I prefer one over the other? Can some text be added to the documentation to help me choose between them?
They serve different granularities and should not be merged. We'll add a comparison section to the documentation. The key distinctions:

A strand is coarse-grained. All work dispatched through the strand serializes. Use it when an entire coroutine tree needs exclusive access - the typical case is per-connection state in a server. You launch the connection handler on a strand and everything downstream serializes automatically.

An async_mutex is fine-grained. Coroutines run concurrently and only serialize at lock/unlock points. Use it when most of your coroutine's work can run in parallel and only specific shared variables need protection.

A strand has no failure mode for acquisition - you're either on it or not. An async_mutex returns `io_result<>` because lock acquisition can be canceled via stop token.

A strand supports direct posting (`strand.post(c)`) without `co_await`. An async_mutex always requires `co_await` because it may need to suspend.

They are often used together. The async_mutex example in the documentation runs on a strand to satisfy the mutex's single-executor threading requirement.
Apparently locking can fail for mutexes but not for strands? Or do strands just report failure differently?
Locking an async_mutex can fail because lock acquisition is cancellable. If the coroutine's stop token fires while it's waiting in the queue, the lock attempt completes with `error::canceled`. This is a feature, not an asymmetry - a strand doesn't "lock" in the same sense. You dispatch work to it, and it runs when the strand is available. There's no queue-entry cancellation because there's no explicit acquisition step.
1. The documentation of IoAwaitable is still broken because it does not use capy::continuation. I am told that this will be fixed. It hasn't been fixed yet. It needs to be fixed before Capy is accepted into Boost.
Agreed. The documentation example uses `std::coroutine_handle<>` where it should use `continuation`. The header comment in `io_awaitable.hpp` has the correct example. We will synchronize the doc with the header.
Note that every single one of these can potentially be fixed by just changing the documentation. I haven't tested the implementation, but I assume it's fine.
To summarize:

1. IoAwaitable documentation: will fix.
2. resume_on: `run` is resume_on, implemented correctly. The extra frame is the price of scoped lifetime and automatic return-to-caller. We'll expand the design rationale documentation.
3. immediate_executor: stack overflow, wrong-thread resumption, and frame allocator corruption prevent a safe general implementation. We'll document the constraints for expert users.
4. Coroutine safety documentation: yes! Educating users is our responsibility and we'll expand the primer substantially.
5. Strand vs async_mutex: accepted. We'll add comparison documentation with usage guidelines.

Vinnie