[boost] Re: [capy review] Klemens' Laundry list

28 Jun 2026

On Sun, Jun 28, 2026 at 7:49 AM Klemens Morgenstern via Boost <boost@lists.boost.org> wrote:
...
Disclaimer: I have been involved in the development in affiliation with the
C++Alliance.

Klemens,

Thank you for taking the time to write this up, and especially for framing
it the way you did. Posting your concerns before issuing a review so the
authors can address them first is exactly how the process should work. I
appreciate it.

Before I go through each point, I want to flag one thing. Your feedback
covers Capy's API surface, and I want to address every item below. But
Corosio - the networking layer, with headers spanning TCP, UDP, DNS, signal
handling, file I/O, and SSL/TLS across four platform backends - hasn't been
examined yet. You're one of the few people who could give the platform
abstraction layer the scrutiny it deserves. You've shipped Process across
POSIX and Windows. You understand what reactor trade-offs look like from
the inside. I'd really welcome your analysis of that layer when you're
ready.

Now, point by point:

# 2. capy::continuation
...
However, the `next` member is public and default constructed as null.
If this was an implementation detail, why isn't it private?

You're right. The field is used internally by executors as an intrusive
list node, and it's also used by authors of coroutine machinery like
async_mutex and async_event who repurpose it in their own node-based data
structures. But that doesn't mean it should sit there looking like a public
API. We'll rename it to `reserved` to communicate "do not touch" to
ordinary users while preserving access for machinery authors. Thank you for
flagging this.

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...
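
To make the intrusive-node usage concrete, here is a minimal sketch of an
executor-style run queue that links nodes through the field, using the
`reserved` name from the rename above. It is not Capy's code: the
`continuation` struct here is hypothetical, a function pointer plus context
stands in for the coroutine handle so the sketch stays C++17, and
`intrusive_queue` and `run_all` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type. In Capy the payload would be a coroutine handle;
// a function pointer plus context stands in for it here.
struct continuation {
    void (*resume)(void* ctx) = nullptr;
    void* ctx = nullptr;
    continuation* reserved = nullptr; // intrusive link, used only by whoever queues the node
};

// Intrusive FIFO: nodes are linked through `reserved`, so queuing never allocates.
class intrusive_queue {
    continuation* head_ = nullptr;
    continuation* tail_ = nullptr;

public:
    bool empty() const noexcept { return head_ == nullptr; }

    void push(continuation* c) noexcept {
        c->reserved = nullptr;
        if (tail_)
            tail_->reserved = c;
        else
            head_ = c;
        tail_ = c;
    }

    continuation* pop() noexcept {
        continuation* c = head_;
        if (c) {
            head_ = c->reserved;
            if (!head_)
                tail_ = nullptr;
            c->reserved = nullptr; // hand the link back clean
        }
        return c;
    }
};

// Drain the queue in FIFO order, resuming each node; returns how many ran.
inline std::size_t run_all(intrusive_queue& q) {
    std::size_t n = 0;
    while (continuation* c = q.pop()) {
        c->resume(c->ctx);
        ++n;
    }
    return n;
}
```

Because the link lives inside the node, push and pop never allocate. That is
why machinery like async_mutex wants to reuse the same field for its own
waiter lists.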

# 3. capy::async_mutex
...
The mutex is a thing that's not thread-safe. This naming is just bad.
And no, it's not enough that it's documented, as this is very
counter-intuitive.

I hear you on the initial reaction. But calling an async coordination
primitive "mutex" is well-established across the async ecosystem. The term
means mutual exclusion among concurrent async work, not necessarily
OS-thread blocking. Here's what the rest of the industry does:

cppcoro::async_mutex (Lewis Baker) - the canonical C++ coroutine precedent
https://github.com/lewissbaker/cppcoro#async_mutex

Python asyncio.Lock - stdlib docs literally say "implements a mutex lock"
and "Not thread-safe"
https://docs.python.org/3/library/asyncio-sync.html

tokio::sync::Mutex - Rust's major async runtime uses Mutex for cooperative
async locking
https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html

libunifex::async_mutex (Meta/Facebook) - the sender/receiver ecosystem
https://github.com/facebookexperimental/libunifex

kotlinx.coroutines.sync.Mutex - Kotlin's official coroutine library
https://kotlinlang.org/api/kotlinx.coroutines/kotlinx-coroutines-core/kotlin...

WG21 P3955R0 - proposes async_mutex for std::execution
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2026/p3955r0.pdf

A proposal before the C++ committee (P3955R0) uses the same naming, and I
think Capy's naming is consistent with where the ecosystem is heading. That
said, I understand the concern, and the documentation makes the
single-context scope explicit. If you still feel strongly after seeing the
precedent, I'm open to discussing alternatives.
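
Here is what "mutex" means in the async sense, as a minimal callback-based
sketch with hypothetical names. It is not Capy's async_mutex. It has no
atomics and no OS lock, so it is only correct when every call comes from the
same thread or strand, which is the single-context scope described above. A
coroutine version would queue the suspended coroutine instead of a callback.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical single-context async mutex: at most one piece of async work
// holds it at a time. No atomics, so all calls must come from one thread/strand.
class single_context_async_mutex {
    bool locked_ = false;
    std::deque<std::function<void()>> waiters_;

public:
    // Run `on_acquired` now if the lock is free, otherwise queue it.
    void lock(std::function<void()> on_acquired) {
        if (!locked_) {
            locked_ = true;
            on_acquired();
        } else {
            waiters_.push_back(std::move(on_acquired));
        }
    }

    // Hand ownership directly to the next waiter, or release the lock.
    void unlock() {
        if (waiters_.empty()) {
            locked_ = false;
            return;
        }
        auto next = std::move(waiters_.front());
        waiters_.pop_front();
        next(); // still locked: ownership has passed to `next`
    }

    bool is_locked() const noexcept { return locked_; }
};
```

Ownership passes straight from unlock() to the next waiter, so a later lock()
call cannot barge ahead of work that is already queued.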

# 4. capy::DynamicBuffer / capy::read_until
...
The dynamic_buffer concept comes from asio and I don't think it is
needed in a coroutine library. It's one of the "asio did it" features.

That's a fair point, and you're not alone in raising it. Peter Dimov made a
similar observation about external usage evidence being thin. This is a
legitimate scope question. I'm willing to discuss removing or reducing
DynamicBuffer's role if the consensus during review is that it doesn't
carry its weight in a coroutine-first library. Your inline loop example
demonstrates the alternative clearly. As someone who has navigated these
exact trade-offs in Cobalt, your perspective on what coroutine users
actually reach for in practice carries real weight here.
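
For comparison, here is a minimal synchronous sketch of that kind of loop.
It is not Klemens' original code. The caller's std::string is the buffer and
no DynamicBuffer concept is involved. `fake_stream` and `read_line` are
hypothetical names, and `read_some` here is a plain blocking stand-in rather
than a Capy awaitable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a stream that delivers data in small chunks.
struct fake_stream {
    std::string_view data;
    std::size_t chunk = 4;

    // Copy up to min(chunk, size) bytes into buf; returns 0 at end of data.
    std::size_t read_some(char* buf, std::size_t size) {
        std::size_t n = std::min({chunk, size, data.size()});
        std::copy_n(data.data(), n, buf);
        data.remove_prefix(n);
        return n;
    }
};

// read_until without a DynamicBuffer: append to a std::string until the
// delimiter shows up. Returns bytes up to and including it, or 0 on EOF.
template<class Stream>
std::size_t read_line(Stream& s, std::string& out, char delim = '\n') {
    std::size_t scanned = 0; // don't rescan bytes already checked
    for (;;) {
        auto pos = out.find(delim, scanned);
        if (pos != std::string::npos)
            return pos + 1;
        scanned = out.size();
        char tmp[64];
        std::size_t n = s.read_some(tmp, sizeof(tmp));
        if (n == 0)
            return 0;
        out.append(tmp, n);
    }
}
```

Leftover bytes after the delimiter stay in the string for the next call, so
the caller just erases the consumed prefix.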

# 10. capy::io_result
...
`io_result` is not great, because std::get<> is not an extension
point. That means I can't use `std::tie`. The code examples use
structured binding for the `error_code`, but that means I will need
to redeclare `ec` for every operation.

The ergonomic friction you're describing is real, and I wish the solution
were as simple as adding std::get specializations. The reason io_result
uses the tuple protocol the way it does is an MSVC code generation bug with
aggregate decomposition in coroutines. When a coroutine does structured
binding on a co_await result using a plain aggregate, MSVC produces
corrupted values. The library becomes completely unusable. Not degraded -
unusable.

The workaround is to force structured bindings through the tuple protocol
(get<>, tuple_size, tuple_element) instead of aggregate decomposition,
which routes MSVC through a different codegen path that works correctly.
This is documented in the commit history:

https://github.com/cppalliance/capy/commit/04d0dc196bb039f4fb54c3911fc11862c...

Structured bindings do work correctly: auto [ec, n] = co_await
read_some(...) produces the right values. The specific gap is std::tie,
because there's no std::get overload for io_result in namespace std - only
an ADL-found get in boost::capy.
That gap is narrow. The alternative - reverting to plain aggregates -
breaks the library on the most widely used C++ compiler on Windows.
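
The tuple-protocol route can be sketched as follows. This is a hypothetical
`demo::io_result`, not Capy's type. The member names `ec` and `n` and the
by-value `get` are assumptions made to keep it short. Specializing
tuple_size/tuple_element makes structured bindings call the ADL-found
get<I>() instead of decomposing the aggregate directly.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>
#include <tuple>
#include <type_traits>

namespace demo {

// Hypothetical io_result-like type: an error code plus a byte count.
struct io_result {
    std::error_code ec;
    std::size_t n = 0;
};

// get<I> lives in this namespace and is found by ADL, not in namespace std.
// It returns copies to keep the sketch short.
template<std::size_t I>
auto get(io_result const& r) {
    static_assert(I < 2, "io_result has two elements");
    if constexpr (I == 0)
        return r.ec;
    else
        return r.n;
}

} // namespace demo

// Opting into the tuple protocol: structured bindings now go through get<I>().
namespace std {
template<>
struct tuple_size<demo::io_result> : std::integral_constant<std::size_t, 2> {};
template<>
struct tuple_element<0, demo::io_result> { using type = std::error_code; };
template<>
struct tuple_element<1, demo::io_result> { using type = std::size_t; };
} // namespace std
```

With this layout, code that wants to reuse one `ec` across operations can
still assign from the members directly, for example `ec = r.ec;`.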

This isn't the only MSVC coroutine codegen issue we've had to work around.
There's also a symmetric transfer use-after-free that crashes IOCP-based
code under load, confirmed unfixed on MSVC 19.44:

https://github.com/cppalliance/capy/issues/180

The io_result design strikes a balance between ergonomics and the library
working at all on MSVC. We're happy to revisit this as newer MSVC versions
ship. If Microsoft fixes the aggregate decomposition codegen, the
constraint lifts and we can reconsider the design. The workaround is
pragmatic, not permanent.

The remaining points are ones where the code already addresses the concern
raised. I've included links to the specific files and line numbers so you
can verify each one directly.

# 1. Comparison to asio / any_stream ownership
...
Their ownership semantic (construction by pointer is non-owning,
construction by value is owning) is very unintuitive.
Construction by lvalue ref is non-owning, by rvalue ref is owning
would make more sense to me.

What Examining The Code Would Have Revealed: The design rationale isn't
spelled out in the header comments, so it's not obvious from just reading
the constructors. The ownership convention follows C++ Core Guidelines R.3:
a raw pointer is non-owning. At the call site, any_read_stream(&sock) makes
non-ownership visible through the & operator. any_read_stream(socket{ioc})
makes ownership visible through the temporary. The constructors are here:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The alternative you propose - lvalue-ref as non-owning, rvalue-ref as
owning - has a subtle problem you'd catch immediately given your template
expertise. In a template context, S&& is a forwarding reference, not an
rvalue reference. It binds lvalues too, so an S& overload and an S&&
overload compete for the same arguments, and which one wins depends on
subtle deduction and partial-ordering rules. And any_stream(sock) with an
lvalue would be visually indistinguishable from a copy or move
construction. The current convention gives the caller clear visual
ownership signals at every call site.
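
Here is a minimal sketch of the pointer-borrows / value-owns convention.
`any_reader` and `counting_stream` are hypothetical names, not Capy's
any_read_stream. A single function pointer stands in for a real vtable, and
std::shared_ptr<void> is just a convenient owning holder for the sketch.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stream: counts how many times it has been read.
struct counting_stream {
    int reads = 0;
    int read_some() { return ++reads; }
};

// Type-erased reader: constructing from S* borrows, constructing from S owns.
class any_reader {
    void* obj_ = nullptr;
    int (*read_)(void*) = nullptr;
    std::shared_ptr<void> owned_; // empty when borrowing

    template<class S>
    static int read_impl(void* p) { return static_cast<S*>(p)->read_some(); }

public:
    // Non-owning: the `&` at the call site shows the caller keeps ownership.
    template<class S>
    explicit any_reader(S* s) noexcept : obj_(s), read_(&read_impl<S>) {}

    // Owning: the stream is moved into storage held by the wrapper.
    template<class S, class = std::enable_if_t<!std::is_pointer_v<S>>>
    explicit any_reader(S s) : owned_(std::make_shared<S>(std::move(s))) {
        obj_ = owned_.get();
        read_ = &read_impl<S>;
    }

    int read_some() { return read_(obj_); }
    bool owns() const noexcept { return owned_ != nullptr; }
};
```

At the call site, `any_reader(&s)` borrows and `any_reader(counting_stream{})`
owns, and the difference is visible without looking at the declaration.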

# 5. capy's non dynamic buffers
...
However, it seems capy is copying `asios` buffer sequence without
reconsidering the following new circumstances
...
If I have a function accepting a `span<const_buffer>` and I have a
generic buffer sequence I could do this

What Examining The Code Would Have Revealed: Unless you happen to look at
the vtable signatures in the type-erased layer, it's easy to miss that Capy
already does exactly what you're proposing. The type-erased stream types
accept spans internally. Here's the any_read_stream vtable - notice the
third parameter to construct_awaitable is std::span<mutable_buffer const>:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The templated surface exists for compile-time optimization, and the span
conversion happens at the type-erasure boundary. The architecture you
describe - concrete types at the boundary, templates above - is the
architecture Capy uses.
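
Here is a rough sketch of "templates above, concrete types at the boundary".
The names are hypothetical. The text says Capy's vtable takes
std::span<mutable_buffer const>. A pointer/count pair stands in for it here
so the sketch stays C++17, and the fixed capacity of 16 buffers is an
arbitrary assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical buffer type standing in for capy::mutable_buffer.
struct mutable_buffer {
    void* data = nullptr;
    std::size_t size = 0;
};

// Concrete, non-template boundary: a contiguous array of buffers.
// Compiled once, regardless of which sequence type the caller used.
std::size_t total_size(mutable_buffer const* bufs, std::size_t count) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < count; ++i)
        n += bufs[i].size;
    return n;
}

// Templated surface: accepts any range of buffers and converts it once,
// at the boundary, into a contiguous array.
template<class BufferSequence>
std::size_t total_size(BufferSequence const& seq) {
    std::array<mutable_buffer, 16> tmp{}; // assumed fixed capacity for the sketch
    std::size_t count = 0;
    for (auto const& b : seq) {
        if (count == tmp.size())
            break; // a real library would need to handle longer sequences
        tmp[count++] = b;
    }
    return total_size(tmp.data(), count);
}
```

The template is instantiated per sequence type, but everything past the
boundary exists only once.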

# 6. capy::any_executor
...
I don't understand why there's an `executor_ref` and an
`any_executor`.

What Examining The Code Would Have Revealed: The codebase has grown since
we last discussed this on Slack, and the performance rationale isn't
obvious from the headers alone. executor_ref is two pointers, trivially
copyable:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

any_executor uses shared_ptr with virtual dispatch:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

executor_ref is copied at 32+ production sites on per-operation hot paths:
every co_await this_coro::executor, every strand dispatch and post, every
when_all/when_any child launch, every delay suspension, every async_mutex
lock, every run boundary crossing. Each of those copies is a two-register
memcpy. Merging the two into a single type backed by shared_ptr replaces
every one of those memcpys with an atomic reference count increment, plus a
matching decrement when the copy is destroyed. On platforms where atomic
operations are expensive, such as ARM or NUMA systems, that cost is
measurable when it is paid at 32+ hot-path sites. You know from Cobalt
what per-operation overhead costs look like at scale. The split is the
same trade-off you'd make.
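
Here is the cost difference as a sketch with hypothetical names. `exec_ref`
is two pointers and trivially copyable. `any_exec` owns through shared_ptr
and virtual dispatch, so each copy touches an atomic reference count. Neither
is Capy's actual executor_ref or any_executor, and a real reference type
would point at a table of several operations rather than a single post
function.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical executor: post() just bumps a counter owned by the caller.
struct counting_executor {
    int* posted;
    void post() const { ++*posted; }
};

// executor_ref-style: object pointer + function pointer. Non-owning, so the
// executor must outlive the reference. Copying copies two pointers.
class exec_ref {
    void const* obj_;
    void (*post_)(void const*);

    template<class E>
    static void post_impl(void const* p) { static_cast<E const*>(p)->post(); }

public:
    template<class E>
    exec_ref(E const& e) noexcept : obj_(&e), post_(&post_impl<E>) {}
    void post() const { post_(obj_); }
};

// any_executor-style: owning, shared_ptr + virtual dispatch. Every copy does
// an atomic reference-count increment, and every destruction a decrement.
class any_exec {
    struct base {
        virtual ~base() = default;
        virtual void post() const = 0;
    };
    template<class E>
    struct impl final : base {
        E e;
        explicit impl(E x) : e(std::move(x)) {}
        void post() const override { e.post(); }
    };
    std::shared_ptr<base const> p_;

public:
    template<class E>
    any_exec(E e) : p_(std::make_shared<impl<E>>(std::move(e))) {}
    void post() const { p_->post(); }
    long use_count() const noexcept { return p_.use_count(); }
};

static_assert(std::is_trivially_copyable<exec_ref>::value,
              "exec_ref copies are plain memcpys");
```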

# 7. capy::execution_context
...
Next, the `capy::execution_context` is odd. It is only used by
`corosio`, however it is part of any `executor`.

What Examining The Code Would Have Revealed: This is spread across several
files that aren't obvious from the top-level headers, so it's easy to miss.
Capy's own thread_pool inherits directly from execution_context:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

Nine files in Capy reference it. It's not only used by Corosio.

That said, you're right that the constructor should be protected so
execution_context can't be default-constructed on its own. That's a good
API observation and we'll make that change. Thank you.
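
Here is a minimal sketch of the agreed change, using hypothetical `demo::`
types rather than Capy's headers. With a protected constructor, the base can
only come into existence as part of a derived context such as a thread pool.
Deleting the copy operations so the base can't be sliced out of a derived
object is an extra assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace demo {

// Hypothetical execution_context: protected constructor, so it cannot be
// created on its own. The virtual destructor also keeps it from being an
// aggregate, which would otherwise allow `execution_context{}` in C++17.
class execution_context {
public:
    virtual ~execution_context() = default;
    execution_context(execution_context const&) = delete; // sketch assumption: no slicing
    execution_context& operator=(execution_context const&) = delete;

protected:
    execution_context() = default;
};

// A derived context (standing in for something like thread_pool).
class thread_pool : public execution_context {
public:
    explicit thread_pool(std::size_t threads = 1) : threads_(threads) {}
    std::size_t threads() const noexcept { return threads_; }

private:
    std::size_t threads_;
};

} // namespace demo

static_assert(!std::is_default_constructible<demo::execution_context>::value,
              "the base can no longer be created on its own");
static_assert(std::is_default_constructible<demo::thread_pool>::value,
              "derived contexts are unaffected");
```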

# 8. capy::IoAwaitable
...
Not every awaitable is asynchronous. That means that not every
awaitable will need to dispatch back through the executor. I think
restricting `co_await` to just `io_awaitables` is too restrictive.

What Examining The Code Would Have Revealed: The naming might suggest more
restriction than actually exists, which is understandable. The IoAwaitable
concept constrains the await_suspend signature, not the behavior. Here's
the entire concept definition - it's nine lines:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

A synchronous awaitable that returns true from await_ready never reaches
await_suspend at all, so the extra io_env const* parameter is never touched
on that path. Cost: zero. And in task.hpp, the if constexpr branch shows how
it works:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The concept requires only that the signature accepts the environment
pointer. It says nothing about whether the awaitable must use it, must be
asynchronous, or must dispatch through the executor.
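
Here is a signature-only check in the spirit of IoAwaitable, in C++17 form
rather than Capy's actual concept. The names are hypothetical, `handle`
replaces std::coroutine_handle<>, and `simulate_co_await` is only a rough
model of what a task's co_await does. It shows that a synchronous awaitable
satisfies the check without ever reading the environment pointer.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the environment and the coroutine handle.
struct io_env {
    int id = 0;
};
struct handle {};

// Signature-only check: await_suspend must accept (handle, io_env const*).
// Nothing here requires the awaitable to use the pointer.
template<class A, class = void>
struct is_io_awaitable : std::false_type {};

template<class A>
struct is_io_awaitable<A, std::void_t<
    decltype(std::declval<A&>().await_ready()),
    decltype(std::declval<A&>().await_suspend(std::declval<handle>(),
                                              std::declval<io_env const*>())),
    decltype(std::declval<A&>().await_resume())>> : std::true_type {};

// Synchronous awaitable: always ready, so await_suspend is never called.
struct ready_value {
    int v;
    bool await_ready() const noexcept { return true; }
    void await_suspend(handle, io_env const*) const noexcept {}
    int await_resume() const noexcept { return v; }
};

// Asynchronous awaitable: suspends and reads the environment it is given.
struct records_env {
    io_env const* seen = nullptr;
    bool await_ready() const noexcept { return false; }
    void await_suspend(handle, io_env const* env) noexcept { seen = env; }
    int await_resume() const noexcept { return seen ? seen->id : -1; }
};

// Rough model of `co_await a` in a task carrying an io_env. A real coroutine
// would suspend after await_suspend and be resumed later.
template<class A>
auto simulate_co_await(A& a, io_env const* env, bool& suspended) {
    static_assert(is_io_awaitable<A>::value, "A must accept the io_env pointer");
    suspended = false;
    if (!a.await_ready()) {
        suspended = true;
        a.await_suspend(handle{}, env);
    }
    return a.await_resume();
}
```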

# 9. capy::async_run
...
The `async_run` double invocation just looks weird. I get the
intention of setting the thread_local memory resource for when the
task is created. I do however think that this is a consequence of
`asio did it` design. This could be much more intuitive, if the
`corosio::io_context` API changed to this:

    double res = ctx.run(main_task());

What Examining The Code Would Have Revealed: This is one of those things
that only becomes visible when you trace what run_async actually does
versus what ctx.run does - the names make them look interchangeable when
they're fundamentally different operations. run_async is a non-blocking
launch mechanism. It dispatches the task through the executor and returns:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

Your proposed ctx.run(main_task()) is a blocking call that fuses task
launch with event loop pumping. These are different operations. The
non-blocking design enables launching multiple independent task trees
before pumping:

    run_async(ex)(accept_connections());
    run_async(ex)(health_check_loop());
    run_async(ex)(metrics_reporter());
    ctx.run();

The blocking version already exists - it's run_blocking in the test
utilities:

https://github.com/cppalliance/capy/blob/9144290189fa149b27617c7d9a476c8fbff...

The "double invocation" you mention is the TLS frame allocation mechanism.
The run_async_wrapper constructor sets thread-local state before the task
argument is evaluated. It relies on the C++17 rule that the
postfix-expression of a function call is sequenced before its arguments, so
the task's coroutine frame is allocated with the correct memory resource.
Asio has no equivalent mechanism. This is novel to Capy.
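
Here is the ordering trick in isolation, as a sketch without coroutines. All
names are hypothetical and are not run_async_wrapper's real interface.
`make_task` only records which resource was current when it ran. A coroutine
promise's operator new would consult similar thread-local state at that
point.

```cpp
#include <cassert>

// Hypothetical stand-in for a memory resource; only its identity matters.
struct frame_resource {
    char const* name;
};

frame_resource default_resource{"default"};

// Thread-local slot consulted when a task is created.
inline thread_local frame_resource* current_resource = nullptr;

// Stand-in for a coroutine task: records which resource was current at creation.
struct task {
    frame_resource* allocated_with;
};

inline task make_task() {
    return task{current_resource ? current_resource : &default_resource};
}

// First call: constructs the wrapper, which installs the resource.
// Second call: receives a task that was created after the wrapper existed.
class launcher {
    frame_resource* saved_;

public:
    explicit launcher(frame_resource& mr) : saved_(current_resource) {
        current_resource = &mr;
    }
    ~launcher() { current_resource = saved_; } // restored at end of full-expression
    launcher(launcher const&) = delete;
    launcher& operator=(launcher const&) = delete;

    // A real launcher would hand the task to an executor; this one just
    // reports which resource the task was created with.
    frame_resource* operator()(task t) const { return t.allocated_with; }
};

inline launcher launch_with(frame_resource& mr) { return launcher(mr); }
```

In `launch_with(pool)(make_task())`, C++17 guarantees `launch_with(pool)` runs
first. Before C++17 the order was unspecified, so the task could have been
created before the resource was installed.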

To summarize: I'm grateful for the depth of your engagement here. On
continuation::next and execution_context's constructor, you're right and
we'll make those changes. On async_mutex naming, the ecosystem precedent is
strong but I'm open to discussion. On DynamicBuffer scope, that's a
legitimate conversation and your perspective as someone who's built Cobalt
matters. On the remaining points, I'd invite you to check the code at the
links above - the library addresses several of these concerns in ways that
aren't immediately visible from the API surface.

I'd love to see your analysis of Corosio when you have time. That's where
the platform engineering decisions live. Given our collaboration on
P4126R0, I know you understand the depth of what's involved in coroutine
library design at this level, and there aren't many people in the C++
ecosystem with your combination of coroutine expertise and cross-platform
I/O experience.

Thanks again for doing this the right way.

Vinnie

Vinnie Falco