On Sun, Jun 28, 2026 at 7:49 AM Klemens Morgenstern via Boost <boost@lists.boost.org> wrote:
Disclaimer: I have been involved in the development in affiliation with the C++Alliance.
Klemens,

Thank you for taking the time to write this up, and especially for framing it the way you did. Posting your concerns before issuing a review so the authors can address them first is exactly how the process should work. I appreciate it.

Before I go through each point, I want to flag one thing. Your feedback covers Capy's API surface, and I want to address every item below. But Corosio - the networking layer, with headers spanning TCP, UDP, DNS, signal handling, file I/O, and SSL/TLS across four platform backends - hasn't been examined yet. You're one of the few people who could give the platform abstraction layer the scrutiny it deserves. You've shipped Process across POSIX and Windows. You understand what reactor trade-offs look like from the inside. I'd really welcome your analysis of that layer when you're ready.

Now, point by point:

# 2. capy::continuation
However, the `next` member is public and default constructed as null. If this was an implementation detail, why isn't it private?
You're right. The field is used internally by executors as an intrusive list node, and it's also used by authors of coroutine machinery like async_mutex and async_event who repurpose it in their own node-based data structures. But that doesn't mean it should sit there looking like a public API. We'll rename it to `reserved` to communicate "do not touch" to ordinary users while preserving access for machinery authors. Thank you for flagging this.

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

# 3. capy::async_mutex
The mutex is a thing that's not thread-safe. This naming is just bad. And no, it's not enough that it's documented, as this is very counter-intuitive.
I hear you on the initial reaction. But calling an async coordination primitive "mutex" is well-established across the async ecosystem. The term means mutual exclusion among concurrent async work, not necessarily OS-thread blocking. Here's what the rest of the industry does:

cppcoro::async_mutex (Lewis Baker) - the canonical C++ coroutine precedent
https://github.com/lewissbaker/cppcoro#async_mutex

Python asyncio.Lock - stdlib docs literally say "implements a mutex lock" and "Not thread-safe"
https://docs.python.org/3/library/asyncio-sync.html

tokio::sync::Mutex - Rust's major async runtime uses Mutex for cooperative async locking
https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html

libunifex::async_mutex (Meta/Facebook) - the sender/receiver ecosystem
https://github.com/facebookexperimental/libunifex

kotlinx.coroutines.sync.Mutex - Kotlin's official coroutine library
https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlin...

WG21 P3955R0 - proposes async_mutex for std::execution
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p3955r0.pdf

The C++ committee itself is standardizing this usage. I think Capy's naming is consistent with where the ecosystem is heading. That said, I understand the concern and the documentation makes the single-context scope explicit. If you feel strongly after seeing the precedent, I'm open to discussing alternatives.

# 4. capy::DynamicBuffer / capy::read_until
The dynamic_buffer concept comes from asio and I don't think it is needed in a coroutine library. It's one of the "asio did it" features.
That's a fair point, and you're not alone in raising it. Peter Dimov made a similar observation about external usage evidence being thin. This is a legitimate scope question. I'm willing to discuss removing or reducing DynamicBuffer's role if the consensus during review is that it doesn't carry its weight in a coroutine-first library.

Your inline loop example demonstrates the alternative clearly. As someone who has navigated these exact trade-offs in Cobalt, your perspective on what coroutine users actually reach for in practice carries real weight here.

# 10. capy::io_result
`io_result` is not great, because std::get<> is not an extension point. That means I can't use `std::tie`. The code examples use structured binding for the `error_code`, but that means I will need to redeclare `ec` for every operation.
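For anyone following along, here is a minimal sketch of the distinction at issue: a type that opts in to structured bindings through the tuple protocol using an ADL-found get, without touching std::get. Everything in it (the demo namespace, result, fake_read, bytes_or_zero) is a hypothetical stand-in written for this message, not Capy's io_result, and get returns by value purely to keep the sketch short.

```cpp
#include <cstddef>
#include <system_error>
#include <tuple>
#include <type_traits>

namespace demo {  // hypothetical namespace, not boost::capy

// Hypothetical stand-in for an io_result-like type.
struct result {
    std::error_code ec;
    std::size_t n = 0;
};

// Found by argument-dependent lookup when a structured binding is
// initialized. It lives in demo, so it is not an overload of std::get.
template <std::size_t I>
auto get(result const& r) {
    static_assert(I < 2, "result has exactly two elements");
    if constexpr (I == 0) return r.ec;
    else return r.n;
}

}  // namespace demo

// Opting in to the tuple protocol: specializing these two templates for a
// user-defined type is permitted.
namespace std {
template <> struct tuple_size<demo::result> : integral_constant<size_t, 2> {};
template <> struct tuple_element<0, demo::result> { using type = error_code; };
template <> struct tuple_element<1, demo::result> { using type = size_t; };
}  // namespace std

// Stand-in for an operation that would normally be co_awaited.
inline demo::result fake_read() { return {std::error_code{}, 42}; }

inline std::size_t bytes_or_zero() {
    auto [ec, n] = fake_read();  // goes through tuple_size / tuple_element / get
    return ec ? 0 : n;
}

// By contrast, this does not compile for demo::result, because std::tuple's
// assignment operators know nothing about the type:
//     std::error_code ec; std::size_t n;
//     std::tie(ec, n) = fake_read();
```

In this sketch the structured binding works while the std::tie form does not, which is the gap described in the quote above.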
The ergonomic friction you're describing is real, and I wish the solution were as simple as adding std::get specializations.

The reason io_result uses the tuple protocol the way it does is an MSVC code generation bug with aggregate decomposition in coroutines. When a coroutine does structured binding on a co_await result using a plain aggregate, MSVC produces corrupted values. The library becomes completely unusable. Not degraded - unusable.

The workaround is to force structured bindings through the tuple protocol (get<>, tuple_size, tuple_element) instead of aggregate decomposition, which routes MSVC through a different codegen path that works correctly. This is documented in the commit history:

https://github.com/cppalliance/capy/commit/04d0dc196bb039f4fb54c3911fc11862c...

Structured bindings do work correctly: auto [ec, n] = co_await read_some(...) produces the right values. The specific gap is std::tie, because there's no std::get in namespace std - only ADL get in boost::capy. That gap is narrow. The alternative - reverting to plain aggregates - breaks the library on the most widely used C++ compiler on Windows.

This isn't the only MSVC coroutine codegen issue we've had to work around. There's also a symmetric transfer use-after-free that crashes IOCP-based code under load, confirmed unfixed on MSVC 19.44:

https://github.com/cppalliance/capy/issues/180

The io_result design strikes a balance between ergonomics and the library working at all on MSVC. We're happy to revisit this as newer MSVC versions ship. If Microsoft fixes the aggregate decomposition codegen, the constraint lifts and we can reconsider the design. The workaround is pragmatic, not permanent.

The remaining points are ones where the code already addresses the concern raised. I've included links to the specific files and line numbers so you can verify each one directly.

# 1. Comparison to asio / any_stream ownership
Their ownership semantic (construction by pointer is non-owning, construction by value is owning) is very unintuitive. Construction by lvalue ref is non-owning, by rvalue ref is owning would make more sense to me.
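To make the two call-site forms concrete, here is a minimal sketch of a type-erased wrapper that follows the pointer-is-non-owning, value-is-owning convention. It is an illustration only: any_source, counter, and the demo functions are hypothetical names invented for this message, and the real any_read_stream constructors may differ in detail.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical type-erased wrapper; not Capy's any_read_stream.
class any_source {
    struct base {
        virtual ~base() = default;
        virtual int next() = 0;
    };
    template <class S>
    struct owned final : base {  // stores its own S
        S s;
        explicit owned(S v) : s(std::move(v)) {}
        int next() override { return s.next(); }
    };
    template <class S>
    struct borrowed final : base {  // refers to the caller's S
        S* p;
        explicit borrowed(S* q) : p(q) {}
        int next() override { return p->next(); }
    };
    std::unique_ptr<base> impl_;

public:
    // Pointer: non-owning. The caller must keep *p alive.
    template <class S>
    explicit any_source(S* p) : impl_(std::make_unique<borrowed<S>>(p)) {}

    // Value: owning. The wrapper holds its own S.
    template <class S,
              class = std::enable_if_t<!std::is_pointer_v<S> &&
                                       !std::is_same_v<S, any_source>>>
    explicit any_source(S s)
        : impl_(std::make_unique<owned<S>>(std::move(s))) {}

    int next() { return impl_->next(); }
};

// Hypothetical underlying object with observable state.
struct counter {
    int value = 0;
    int next() { return ++value; }
};

inline int demo_borrowed() {
    counter c;
    any_source src(&c);  // the & at the call site signals "not owned"
    src.next();
    src.next();
    return c.value;  // the caller's object was advanced
}

inline int demo_owned() {
    any_source src(counter{});  // the temporary signals "owned by src"
    src.next();
    return src.next();
}
```

In this sketch, demo_borrowed observes the wrapper's calls through the caller's own counter, while demo_owned never has a separate object to observe.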
What Examining The Code Would Have Revealed:

The design rationale isn't spelled out in the header comments, so it's not obvious from just reading the constructors.

The ownership convention follows C++ Core Guidelines R.3: a raw pointer is non-owning. At the call site, any_read_stream(&sock) makes non-ownership visible through the & operator. any_read_stream(socket{ioc}) makes ownership visible through the temporary. The constructors are here:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The alternative you propose - lvalue-ref as non-owning, rvalue-ref as owning - has a subtle problem you'd catch immediately given your template expertise. In a template context, S&& is a forwarding reference, not an rvalue reference, which creates deduction ambiguity. And any_stream(sock) as an lvalue would be visually indistinguishable from a copy or move construction. The current convention gives the caller clear visual ownership signals at every call site.

# 5. capy's non dynamic buffers
However, it seems capy is copying `asios` buffer sequence without reconsidering the following new circumstances ... If I have a function accepting a `span<const_buffer>` and I have a generic buffer sequence I could do this
What Examining The Code Would Have Revealed:

Unless you happen to look at the vtable signatures in the type-erased layer, it's easy to miss that Capy already does exactly what you're proposing.

The type-erased stream types accept spans internally. Here's the any_read_stream vtable - notice the third parameter to construct_awaitable is std::span<mutable_buffer const>:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The templated surface exists for compile-time optimization, and the span conversion happens at the type-erasure boundary. The architecture you describe - concrete types at the boundary, templates above - is the architecture Capy uses.

# 6. capy::any_executor
I don't understand why there's an `executor_ref` and an `any_executor`.
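As a rough illustration of the two shapes in question (a non-owning two-pointer handle versus an owning handle backed by shared_ptr), here is a minimal sketch. All names in it (exec_vtable, exec_ref, any_exec, make_ref, inline_exec, bump) are hypothetical and written for this message; they are not Capy's definitions, and the size check assumes a platform where object pointers all have the same size.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical manual vtable with a single operation.
struct exec_vtable {
    void (*post)(void const* target, void (*fn)());
};

// Non-owning handle: two raw pointers, so a copy is a plain memcpy.
class exec_ref {
    void const* target_;
    exec_vtable const* vt_;

public:
    exec_ref(void const* t, exec_vtable const* vt) noexcept
        : target_(t), vt_(vt) {}
    void post(void (*fn)()) const { vt_->post(target_, fn); }
};

// Builds a reference to an executor the caller keeps alive.
template <class E>
exec_ref make_ref(E const& e) noexcept {
    static const exec_vtable vt{[](void const* t, void (*fn)()) {
        static_cast<E const*>(t)->post(fn);
    }};
    return exec_ref(&e, &vt);
}

// Owning handle: shared_ptr plus virtual dispatch, so each copy
// updates a shared reference count.
class any_exec {
    struct base {
        virtual ~base() = default;
        virtual void post(void (*fn)()) const = 0;
    };
    template <class E>
    struct model final : base {
        E e;
        explicit model(E x) : e(std::move(x)) {}
        void post(void (*fn)()) const override { e.post(fn); }
    };
    std::shared_ptr<base const> p_;

public:
    template <class E,
              class = std::enable_if_t<!std::is_same_v<E, any_exec>>>
    explicit any_exec(E e) : p_(std::make_shared<model<E>>(std::move(e))) {}
    void post(void (*fn)()) const { p_->post(fn); }
    long use_count() const noexcept { return p_.use_count(); }
};

static_assert(std::is_trivially_copyable_v<exec_ref>);
static_assert(sizeof(exec_ref) == 2 * sizeof(void*));
static_assert(!std::is_trivially_copyable_v<any_exec>);

// Hypothetical executor that runs work immediately on the calling thread.
struct inline_exec {
    void post(void (*fn)()) const { fn(); }
};

static int hits = 0;
static void bump() { ++hits; }

inline long copies_bump_refcount() {
    any_exec a{inline_exec{}};
    any_exec b = a;  // copying the owning handle touches the shared count
    return b.use_count();
}
```

The static_asserts capture the difference: the reference type is trivially copyable and pointer-pair sized, while the owning type is not trivially copyable.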
What Examining The Code Would Have Revealed:

The codebase has grown since we last discussed this on Slack, and the performance rationale isn't obvious from the headers alone.

executor_ref is two pointers, trivially copyable:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

any_executor uses shared_ptr with virtual dispatch:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

executor_ref is copied at 32+ production sites on per-operation hot paths: every co_await this_coro::executor, every strand dispatch and post, every when_all/when_any child launch, every delay suspension, every async_mutex lock, every run boundary crossing. Each of those copies is a two-register memcpy.

Merging them into a single type backed by shared_ptr replaces every one of those memcpys with an atomic reference count increment. On architectures where atomics are expensive - ARM, NUMA - that's measurable across 32+ sites per operation. You know from Cobalt what per-operation overhead costs look like at scale. The split is the same trade-off you'd make.

# 7. capy::execution_context
Next, the `capy::execution_context` is odd. It is only used by `corosio`, however it is part of any `executor`.
What Examining The Code Would Have Revealed:

This is spread across several files that aren't obvious from the top-level headers, so it's easy to miss.

Capy's own thread_pool inherits directly from execution_context:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

Nine files in Capy reference it. It's not only used by Corosio.

That said, you're right that the constructor should be protected so execution_context can't be default-constructed on its own. That's a good API observation and we'll make that change. Thank you.

# 8. capy::IoAwaitable
Not every awaitable is asynchronous. That means that not every awaitable will need to dispatch back through the executor. I think restricting `co_await` to just `io_awaitables` is too restrictive.
What Examining The Code Would Have Revealed:

The naming might suggest more restriction than actually exists, which is understandable.

The IoAwaitable concept constrains the await_suspend signature, not the behavior. Here's the entire concept definition - it's nine lines:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

A synchronous awaitable that returns true from await_ready never reaches await_suspend at all. The extra io_env const* parameter is dead code in that path. Cost: zero. And in task.hpp, the if constexpr branch shows how it works:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The concept requires only that the signature accepts the environment pointer. It says nothing about whether the awaitable must use it, must be asynchronous, or must dispatch through the executor.

# 9. capy::async_run
The `async_run` double invocation just looks weird. I get the intention of setting the thread_local memory resource for when the task is created. I do however think that this is a consequence of `asio did it` design. This could be much more intuitive, if the `corosio::io_context` API changed to this:
double res = ctx.run(main_task());
What Examining The Code Would Have Revealed:

This is one of those things that only becomes visible when you trace what run_async actually does versus what ctx.run does - the names make them look interchangeable when they're fundamentally different operations.

run_async is a non-blocking launch mechanism. It dispatches the task through the executor and returns:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

Your proposed ctx.run(main_task()) is a blocking call that fuses task launch with event loop pumping. These are different operations. The non-blocking design enables launching multiple independent task trees before pumping:

    run_async(ex)(accept_connections());
    run_async(ex)(health_check_loop());
    run_async(ex)(metrics_reporter());
    ctx.run();

The blocking version already exists - it's run_blocking in the test utilities:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The "double invocation" you mention - that's the TLS frame allocation mechanism. The run_async_wrapper constructor sets thread-local state before the task argument is evaluated, exploiting C++17 postfix evaluation order so that the task's coroutine frame is allocated with the correct memory resource. Asio has no equivalent mechanism. This is novel to Capy.

To summarize:

I'm grateful for the depth of your engagement here. On continuation::next and execution_context's constructor, you're right and we'll make those changes. On async_mutex naming, the ecosystem precedent is strong but I'm open to discussion. On DynamicBuffer scope, that's a legitimate conversation and your perspective as someone who's built Cobalt matters.

On the remaining points, I'd invite you to check the code at the links above - the library addresses several of these concerns in ways that aren't immediately visible from the API surface.

I'd love to see your analysis of Corosio when you have time. That's where the platform engineering decisions live.
Given our collaboration on P4126R0, I know you understand the depth of what's involved in coroutine library design at this level, and there aren't many people in the C++ ecosystem with your combination of coroutine expertise and cross-platform I/O experience.

Thanks again for doing this the right way.

Vinnie
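
P.S. For anyone following the thread, the evaluation-order point under #9 can be sketched without any coroutine machinery. In the sketch below a plain function stands in for the coroutine whose frame would be allocated, and every name (current_resource, fake_task, make_task, launch_wrapper, launch) is hypothetical, not Capy's run_async implementation. Restoring the previous value in the destructor is also just an assumption made for the illustration. It relies on the C++17 rule that, in a call f(args), the expression naming the callee is sequenced before the arguments.

```cpp
#include <string>
#include <utility>

// Hypothetical thread-local "current memory resource", modeled as a name.
thread_local char const* current_resource = "default";

// Stand-in for a coroutine task: records which resource was current
// at the moment it was created.
struct fake_task {
    std::string allocated_with;
};

inline fake_task make_task() { return fake_task{current_resource}; }

// Hypothetical launcher: the constructor installs the resource, the
// destructor restores the previous one (an assumption of this sketch).
class launch_wrapper {
    char const* saved_;

public:
    explicit launch_wrapper(char const* resource) noexcept
        : saved_(current_resource) {
        current_resource = resource;
    }
    ~launch_wrapper() { current_resource = saved_; }
    launch_wrapper(launch_wrapper const&) = delete;
    launch_wrapper& operator=(launch_wrapper const&) = delete;

    // Second call in the "double invocation": receives the finished task.
    fake_task operator()(fake_task t) const { return t; }
};

// First call: returns the wrapper as a prvalue (guaranteed elision in C++17).
inline launch_wrapper launch(char const* resource) {
    return launch_wrapper(resource);
}

inline std::string resource_seen_via_launch() {
    // C++17: launch("pool") is evaluated before make_task(), so the
    // task is created while "pool" is the current resource.
    return launch("pool")(make_task()).allocated_with;
}
```

Before C++17 the relative order of the callee expression and its arguments was unspecified, which is why the two-step call form only became a reliable way to do this from C++17 onward.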