Disclaimer: I have been involved in the development in affiliation with the C++Alliance.

I wish to first of all express that I really admire the effort of creating a coroutine-first alternative to boost.asio that can optimize run- and compile-time by having only one asynchronous completion mechanism. However, I fear that capy & corosio are not ready.

Below is a list of criticisms & issues that might end up as part of my review. However, I think it is better to allow the authors to address these first. I have discussed some of these with Vinnie on Slack already, but I think it is better to get this on the record for the review. So I hope he doesn't mind reiterating some of the discussion we had.

# 1. Comparison to asio

Asio is incredibly feature-rich and I would not expect capy/corosio to match all the features. However, I would hope it would offer more features or different approaches that take coroutines into account. Instead, the improvements are mainly in compile time and run time, with very little in functionality. Since `capy::task` is functionally equivalent to `asio::awaitable`, the only things I can identify are the type-erased streams (e.g. any_read_stream) and the ability to use different SSL implementations like wolfSSL.

The type-erased stream types, however, are then not even used; the corosio::read_until function is templated. Their ownership semantic (construction by pointer is non-owning, construction by value is owning) is very unintuitive. Construction by lvalue ref being non-owning and by rvalue ref being owning would make more sense to me.

As a side-note: Botan actually offers an SSL stream for asio, so technically other SSL implementations for asio exist.

Other than that there seem to be a lot of "asio did it" design decisions.

## 2. capy::continuation

The continuation type is useful because the executor doesn't need to allocate a list on its own; it uses an intrusive list of which `continuation` is a node.
However, the `next` member is public and default constructed as null. If this was an implementation detail, why isn't it private? Or at least named `next_` or `_next`? It could be public and allow me to submit multiple continuations at once, but as it is, it is an odd API. At the very least I would expect a non-explicit constructor from a `coroutine_handle`. It is an implementation detail leaking out for no apparent reason.

## 3. capy::async_mutex

The mutex is a thing that's not thread-safe. This naming is just bad. And no, it's not enough that it's documented, as this is very counter-intuitive.

## 4. capy::DynamicBuffer / capy::read_until

The dynamic_buffer concept comes from asio and I don't think it is needed in a coroutine library. It's one of the "asio did it" features. The reason asio has those is because of `async_read_until`. In asio, writing an asynchronous algorithm is extremely painful, so using `async_read_until` is a massive help. However, once we have coroutines, this becomes a plain loop.

    std::string buffer;
    auto [ec, n] = co_await read_until(s, capy::dynamic_buffer(buffer), "\n", 2048);

can be written as:

    std::string buffer;
    while (buffer.find('\n') == std::string::npos)
    {
        const auto prefix = buffer.size();
        buffer.resize(prefix + 2048);
        auto buf = capy::make_buffer(buffer);
        buf += prefix;
        auto [ec, n] = co_await s.read_some(buf);
        buffer.resize(prefix + n);
        if (ec)
            co_return ec;
    }

From my experience using asio for years, I can also say that I almost never used read_until. It usually ended up being replaced because the matching function wasn't ideal for many protocols, mainly because the matching condition (as it does in capy's read_until) always rescans the whole buffer. Even in the example above it would be more efficient to pass `prefix` to `find` so that it does not scan through the whole string again.

I would also like to add that beast is providing its own buffer types to use with its http functions.
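The rescanning point can be made concrete with a small sketch. It assumes a hypothetical synchronous `fake_stream` in place of a real socket (so there is no `co_await`) and a hypothetical `read_line` helper; neither is part of capy or asio. The search resumes at the first byte that has not been examined yet, so each byte is scanned once.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a stream: hands out pre-chunked data.
struct fake_stream {
    std::vector<std::string> chunks;
    std::size_t next = 0;

    // Copies up to cap bytes of the current chunk into dst.
    // Returns the number of bytes copied, 0 once all chunks are consumed.
    std::size_t read_some(char* dst, std::size_t cap) {
        if (next == chunks.size()) return 0;
        std::string& c = chunks[next];
        const std::size_t n = std::min(cap, c.size());
        std::copy_n(c.data(), n, dst);
        c.erase(0, n);
        if (c.empty()) ++next;
        return n;
    }
};

// Reads until '\n' is found. Returns its index, or npos at end of input.
std::size_t read_line(fake_stream& s, std::string& buffer,
                      std::size_t chunk = 2048) {
    std::size_t scanned = 0;  // bytes already known not to contain '\n'
    for (;;) {
        const std::size_t pos = buffer.find('\n', scanned);
        if (pos != std::string::npos) return pos;
        scanned = buffer.size();  // next search starts at the new data

        const std::size_t prefix = buffer.size();
        buffer.resize(prefix + chunk);
        const std::size_t n = s.read_some(buffer.data() + prefix, chunk);
        buffer.resize(prefix + n);
        if (n == 0) return std::string::npos;
    }
}
```

For a multi-byte delimiter the resume offset would have to back up by the delimiter length minus one, since a match can straddle two reads.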
Dynamic buffers may be a good idea for certain protocols, but it is not at all evident to me that they make a good generic concept. Rather I think that a dynamic buffer should be optimized for its protocol. But, because we have `read_until` we now replace 10 lines of code with multiple types and concepts that I consider unnecessary. It seems yet another design decision based on "asio did it" that doesn't seem to have reevaluated the changed conditions.

## 5. capy's non dynamic buffers

First of all, I agree with asio's decision to use a pair of `void*` and `size_t`, because memory is untyped. So those two are correct.

However, it seems capy is copying asio's buffer sequence without reconsidering the following new circumstances:

If I have a function accepting a `span<const_buffer>` and I have a generic buffer sequence I could do this:

    some_buffer_sequence bs;
    co_await write(s, as_span(bs));

where `as_span` returns a type that can be converted to the span. Because the `co_await` expression is - as one might guess - an expression, the container referenced by the span is a temporary that is only destroyed at the end of the full expression, i.e. after the coroutine has been resumed. Thereby we could avoid templating the write function over the buffer sequence and reduce compile time and complexity. An IO object could also communicate through its API whether it actually supports vectorized IO or not. The buffer types could also just wrap around the OS buffer types (e.g. iovec) and thus avoid any copying.

If a stream does not get templated on the buffer_sequence, but just takes a span, implementation of other stream types gets much simpler, too. This is important if capy wants to establish its stream concepts as the de facto or actual standard.

To be clear: I am not saying that a `span` is the right solution, but there are simpler solutions that can have a concrete type. Asio's buffer sequences need to work with asynchronous completions that do not get `co_await`ed and thus have to solve lifetime problems that capy doesn't have.
The design of capy does not seem to consider this subtle difference.

## 6. capy::any_executor

I don't understand why there's an `executor_ref` and an `any_executor`. If the `any_stream` supports a non-owning mode, why couldn't the `any_executor` follow the same model, so that we don't need an `executor_ref`? I know that `executor_ref` is used in the tasks and must be cheaply copyable, but why couldn't that use a single any_executor in the structure allocated by `async_run` and then an `any_executor&` in the task? At the very least, there's an inconsistency between `any_executor` and the `any_stream` types.

## 7. capy::execution_context

Next, the `capy::execution_context` is odd. It is only used by `corosio`, however it is part of any `executor`. However, it turns out that the `execution_context` can be default constructed but is completely useless unless constructed as the parent class of corosio::io_context. Why is that? In asio, I can default construct an `execution_context` and it'll spawn off a thread for the reactor and it'll work great. Likewise the `asio::thread_pool` will work, whereas capy::thread_pool will not allow me to create a corosio::tcp_socket.

If the `execution_context` must not be used this way, the constructor should be `protected`. I would very much prefer it behaved like `asio`, because that is required to make the asio bridge I wrote work.

## 8. capy::IoAwaitable

The IO awaitable protocol is an interesting idea of how to solve the executor propagation problem. However, it has one issue: not every awaitable is asynchronous. That means that not every awaitable will need to dispatch back through the executor. I think restricting `co_await` to just `io_awaitables` is too restrictive and puts `capy::task` on the same level as `asio::awaitable`. I don't think that proposing io_awaitable for the standard is an argument either, since at this point regular `awaitable`s are the standard protocol.
I don't know how capy should fix it, but this is a missing feature if `capy` wants to be a generic coroutine library. `asio::awaitable` achieves a similar thing by only allowing `asio::awaitable`s to await each other. That is fine, but it turns a `capy::task` into something more specific, a coroutine type aimed towards use with corosio.

## 9. capy::async_run

The `async_run` double invocation just looks weird. I get the intention of setting the thread_local memory resource for when the task is created. I do however think that this is a consequence of `asio did it` design. This could be much more intuitive if the `corosio::io_context` API changed to this:

    // we use a main task that can use wait_all & wait_any to manage other work
    capy::task<double> main_task();

    {
        corosio::io_context ctx;
        double res = ctx.run(main_task());
    }

We would achieve two things: we would get rid of the callback, which is generally a good idea in a coroutine library, and we could bind the memory_resource to the io_context. This would give us a simpler and much more intuitive API and an implicit work guard with the `main_task`.

But alas, capy/corosio just copies `asio`'s API for `io_context` and requires a call to `run`.

## 10. capy::io_result

`io_result` is not great, because std::get<> is not an extension point. That means I can't use `std::tie`. The code examples use structured binding for the `error_code`, but that means I will need to redeclare `ec` for every operation.

    const auto [ec, nr] = co_await read_some(...);
    const auto [ec2, nw] = co_await write_some(...);

This needs to be solved.
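The `std::tie` limitation can be reproduced without coroutines. The sketch below uses a hypothetical `demo::result` type that supports structured bindings through the tuple protocol with an ADL `get`; it is not capy's `io_result`. The `to_tuple` helper is likewise an assumption, shown as one possible way to reuse an existing `ec` variable, since `std::tie(...) = x` needs `x` to be an actual `std::tuple` (or `std::pair`).

```cpp
#include <cstddef>
#include <system_error>
#include <tuple>
#include <type_traits>

namespace demo {

// Hypothetical result type; not capy's io_result.
struct result {
    std::error_code ec;
    std::size_t n = 0;
};

// Tuple protocol through ADL get: enough for structured bindings,
// but invisible to std::tie, which only understands std::tuple/pair.
template <std::size_t I>
auto get(const result& r) {
    static_assert(I < 2, "index out of range");
    if constexpr (I == 0) return r.ec;
    else return r.n;
}

// Hypothetical helper that bridges to a real std::tuple.
inline std::tuple<std::error_code, std::size_t> to_tuple(const result& r) {
    return {r.ec, r.n};
}

// Stand-in for an I/O operation that transfers n bytes successfully.
inline result fake_read(std::size_t n) { return {std::error_code{}, n}; }

}  // namespace demo

namespace std {
template <> struct tuple_size<demo::result> : integral_constant<size_t, 2> {};
template <> struct tuple_element<0, demo::result> { using type = error_code; };
template <> struct tuple_element<1, demo::result> { using type = size_t; };
}  // namespace std

inline std::size_t total_bytes() {
    std::error_code ec;
    std::size_t n = 0;
    std::size_t total = 0;

    // std::tie(ec, n) = demo::fake_read(3);  // does not compile
    std::tie(ec, n) = demo::to_tuple(demo::fake_read(3));  // reuses ec
    total += n;
    std::tie(ec, n) = demo::to_tuple(demo::fake_read(4));
    total += n;

    auto [ec2, n2] = demo::fake_read(5);  // structured binding needs new names
    return (ec || ec2) ? 0 : total + n2;
}
```

The helper trades one extra call per operation for the ability to keep a single `ec` in scope; whether that is acceptable is exactly the ergonomics question raised above.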
On Sun, Jun 28, 2026 at 7:49 AM Klemens Morgenstern via Boost < boost@lists.boost.org> wrote:
Disclaimer: I have been involved in the development in affiliation with the C++Alliance.
Klemens,

Thank you for taking the time to write this up, and especially for framing it the way you did. Posting your concerns before issuing a review so the authors can address them first is exactly how the process should work. I appreciate it.

Before I go through each point, I want to flag one thing. Your feedback covers Capy's API surface, and I want to address every item below. But Corosio - the networking layer, with headers spanning TCP, UDP, DNS, signal handling, file I/O, and SSL/TLS across four platform backends - hasn't been examined yet. You're one of the few people who could give the platform abstraction layer the scrutiny it deserves. You've shipped Process across POSIX and Windows. You understand what reactor trade-offs look like from the inside. I'd really welcome your analysis of that layer when you're ready.

Now, point by point:

# 2. capy::continuation
However, the `next` member is public and default constructed as null. If this was an implementation detail, why isn't it private?
You're right. The field is used internally by executors as an intrusive list node, and it's also used by authors of coroutine machinery like async_mutex and async_event who repurpose it in their own node-based data structures. But that doesn't mean it should sit there looking like a public API. We'll rename it to `reserved` to communicate "do not touch" to ordinary users while preserving access for machinery authors. Thank you for flagging this.

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

# 3. capy::async_mutex
The mutex is a thing that's not thread-safe. This naming is just bad. And no, it's not enough that it's documented, as this is very counter-intuitive.
I hear you on the initial reaction. But calling an async coordination primitive "mutex" is well-established across the async ecosystem. The term means mutual exclusion among concurrent async work, not necessarily OS-thread blocking. Here's what the rest of the industry does:

cppcoro::async_mutex (Lewis Baker) - the canonical C++ coroutine precedent
https://github.com/lewissbaker/cppcoro#async_mutex

Python asyncio.Lock - stdlib docs literally say "implements a mutex lock" and "Not thread-safe"
https://docs.python.org/3/library/asyncio-sync.html

tokio::sync::Mutex - Rust's major async runtime uses Mutex for cooperative async locking
https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html

libunifex::async_mutex (Meta/Facebook) - the sender/receiver ecosystem
https://github.com/facebookexperimental/libunifex

kotlinx.coroutines.sync.Mutex - Kotlin's official coroutine library
https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlin...

WG21 P3955R0 - proposes async_mutex for std::execution
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p3955r0.pdf

The C++ committee itself is standardizing this usage. I think Capy's naming is consistent with where the ecosystem is heading. That said, I understand the concern and the documentation makes the single-context scope explicit. If you feel strongly after seeing the precedent, I'm open to discussing alternatives.

# 4. capy::DynamicBuffer / capy::read_until
The dynamic_buffer concept comes from asio and I don't think it is needed in a coroutine library. It's one of the "asio did it" features.
That's a fair point, and you're not alone in raising it. Peter Dimov made a similar observation about external usage evidence being thin.

This is a legitimate scope question. I'm willing to discuss removing or reducing DynamicBuffer's role if the consensus during review is that it doesn't carry its weight in a coroutine-first library. Your inline loop example demonstrates the alternative clearly. As someone who has navigated these exact trade-offs in Cobalt, your perspective on what coroutine users actually reach for in practice carries real weight here.

# 10. capy::io_result
`io_result` is not great, because std::get<> is not an extension point. That means I can't use `std::tie`. The code examples use structured binding for the `error_code`, but that means I will need to redeclare `ec` for every operations.
The ergonomic friction you're describing is real, and I wish the solution were as simple as adding std::get specializations.

The reason io_result uses the tuple protocol the way it does is an MSVC code generation bug with aggregate decomposition in coroutines. When a coroutine does structured binding on a co_await result using a plain aggregate, MSVC produces corrupted values. The library becomes completely unusable. Not degraded - unusable. The workaround is to force structured bindings through the tuple protocol (get<>, tuple_size, tuple_element) instead of aggregate decomposition, which routes MSVC through a different codegen path that works correctly. This is documented in the commit history:

https://github.com/cppalliance/capy/commit/04d0dc196bb039f4fb54c3911fc11862c...

Structured bindings do work correctly: auto [ec, n] = co_await read_some(...) produces the right values. The specific gap is std::tie, because there's no std::get in namespace std - only ADL get in boost::capy. That gap is narrow. The alternative - reverting to plain aggregates - breaks the library on the most widely used C++ compiler on Windows.

This isn't the only MSVC coroutine codegen issue we've had to work around. There's also a symmetric transfer use-after-free that crashes IOCP-based code under load, confirmed unfixed on MSVC 19.44:

https://github.com/cppalliance/capy/issues/180

The io_result design strikes a balance between ergonomics and the library working at all on MSVC. We're happy to revisit this as newer MSVC versions ship. If Microsoft fixes the aggregate decomposition codegen, the constraint lifts and we can reconsider the design. The workaround is pragmatic, not permanent.

The remaining points are ones where the code already addresses the concern raised. I've included links to the specific files and line numbers so you can verify each one directly.

# 1. Comparison to asio / any_stream ownership
Their ownership semantic (construction by pointer is non-owning, construction by value is owning) is very unintuitive. Construction by lvalue ref is non-owning, by rvalue ref is owning would make more sense to me.
What Examining The Code Would Have Revealed:

The design rationale isn't spelled out in the header comments, so it's not obvious from just reading the constructors. The ownership convention follows C++ Core Guidelines R.3: a raw pointer is non-owning. At the call site, any_read_stream(&sock) makes non-ownership visible through the & operator. any_read_stream(socket{ioc}) makes ownership visible through the temporary. The constructors are here:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The alternative you propose - lvalue-ref as non-owning, rvalue-ref as owning - has a subtle problem you'd catch immediately given your template expertise. In a template context, S&& is a forwarding reference, not an rvalue reference, which creates deduction ambiguity. And any_stream(sock) as an lvalue would be visually indistinguishable from a copy or move construction. The current convention gives the caller clear visual ownership signals at every call site.

# 5. capy's non dynamic buffers
However, it seems capy is copying `asios` buffer sequence without reconsidering the following new circumstances ... If I have a function accepting a `span<const_buffer>` and I have a generic buffer sequence I could do this
What Examining The Code Would Have Revealed:

Unless you happen to look at the vtable signatures in the type-erased layer, it's easy to miss that Capy already does exactly what you're proposing. The type-erased stream types accept spans internally. Here's the any_read_stream vtable - notice the third parameter to construct_awaitable is std::span<mutable_buffer const>:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The templated surface exists for compile-time optimization, and the span conversion happens at the type-erasure boundary. The architecture you describe - concrete types at the boundary, templates above - is the architecture Capy uses.

# 6. capy::any_executor
I don't understand why there's an `executor_ref` and an `any_executor`.
What Examining The Code Would Have Revealed:

The codebase has grown since we last discussed this on Slack, and the performance rationale isn't obvious from the headers alone.

executor_ref is two pointers, trivially copyable:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

any_executor uses shared_ptr with virtual dispatch:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

executor_ref is copied at 32+ production sites on per-operation hot paths: every co_await this_coro::executor, every strand dispatch and post, every when_all/when_any child launch, every delay suspension, every async_mutex lock, every run boundary crossing. Each of those copies is a two-register memcpy. Merging them into a single type backed by shared_ptr replaces every one of those memcpys with an atomic reference count increment. On architectures where atomics are expensive - ARM, NUMA - that's measurable across 32+ sites per operation.

You know from Cobalt what per-operation overhead costs look like at scale. The split is the same trade-off you'd make.

# 7. capy::execution_context
Next, the `capy::execution_context` is odd. It is only used by `corosio`, however it is part of any `executor`.
What Examining The Code Would Have Revealed:

This is spread across several files that aren't obvious from the top-level headers, so it's easy to miss. Capy's own thread_pool inherits directly from execution_context:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

Nine files in Capy reference it. It's not only used by Corosio.

That said, you're right that the constructor should be protected so execution_context can't be default-constructed on its own. That's a good API observation and we'll make that change. Thank you.

# 8. capy::IoAwaitable
Not every awaitable is asynchronous. That means that not every awaitable will need to dispatch back through the executor. I think restricting `co_await` to just `io_awaitables` is too restrictive.
What Examining The Code Would Have Revealed:

The naming might suggest more restriction than actually exists, which is understandable. The IoAwaitable concept constrains the await_suspend signature, not the behavior. Here's the entire concept definition - it's nine lines:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

A synchronous awaitable that returns true from await_ready never reaches await_suspend at all. The extra io_env const* parameter is dead code in that path. Cost: zero. And in task.hpp, the if constexpr branch shows how it works:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The concept requires only that the signature accepts the environment pointer. It says nothing about whether the awaitable must use it, must be asynchronous, or must dispatch through the executor.

# 9. capy::async_run
The `async_run` double invocation just looks weird. I get the intention of setting the thread_local memory resource for when the task is created. I do however think that this is a consequence of `asio did it` design. This could be much more intuitive, if the `corosio::io_context` API changed to this:
double res = ctx.run(main_task());
What Examining The Code Would Have Revealed:

This is one of those things that only becomes visible when you trace what run_async actually does versus what ctx.run does - the names make them look interchangeable when they're fundamentally different operations. run_async is a non-blocking launch mechanism. It dispatches the task through the executor and returns:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

Your proposed ctx.run(main_task()) is a blocking call that fuses task launch with event loop pumping. These are different operations. The non-blocking design enables launching multiple independent task trees before pumping:

    run_async(ex)(accept_connections());
    run_async(ex)(health_check_loop());
    run_async(ex)(metrics_reporter());
    ctx.run();

The blocking version already exists - it's run_blocking in the test utilities:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The "double invocation" you mention - that's the TLS frame allocation mechanism. The run_async_wrapper constructor sets thread-local state before the task argument is evaluated, exploiting C++17 postfix evaluation order so that the task's coroutine frame is allocated with the correct memory resource. Asio has no equivalent mechanism. This is novel to Capy.

To summarize:

I'm grateful for the depth of your engagement here. On continuation::next and execution_context's constructor, you're right and we'll make those changes. On async_mutex naming, the ecosystem precedent is strong but I'm open to discussion. On DynamicBuffer scope, that's a legitimate conversation and your perspective as someone who's built Cobalt matters. On the remaining points, I'd invite you to check the code at the links above - the library addresses several of these concerns in ways that aren't immediately visible from the API surface.

I'd love to see your analysis of Corosio when you have time. That's where the platform engineering decisions live.
Given our collaboration on P4126R0, I know you understand the depth of what's involved in coroutine library design at this level, and there aren't many people in the C++ ecosystem with your combination of coroutine expertise and cross-platform I/O experience.

Thanks again for doing this the right way.

Vinnie
Klemens Morgenstern wrote:
The dynamic_buffer concept comes from asio and I don't think it is needed in a coroutine library. It's one of the "asio did it" features. ... It seems yet another design decision based on "asio did it" that doesn't seem to have reevaluated the changed conditions. ...
I find myself agreeing with Klemens's general point. There are things in Capy that are copy-pasted from Beast, copy-pasted from cppalliance/buffers, copy-pasted from "Asio has them so we need them too." We have the opportunity to start from a clean slate and only add things on an as-needed basis, with rationale why we added them. And the above three reasons are not that. Capy needs to be foundational but minimal, with each feature there having strong justification for existing in Capy (rather than somewhere else.) Some of the components already meet this condition, others not. What makes this important now, rather than merely a theoretical nitpick, is that once things are released to the world (as part of a Boost release), it becomes hard to take them away. Whereas if we don't ship something now, and then it turns out that we should have done so, we can add it in a later release very easily. So our initial Capy needs to be as small as possible.
participants (3)
- Klemens Morgenstern
- Peter Dimov
- Vinnie Falco