Capy: Request for endorsements
Hi.

We are proposing a new library, "Capy", as a candidate library and seeking endorsements for formal review:

https://github.com/cppalliance/capy
https://master.capy.cpp.al/capy/index.html

This library provides facilities which use C++20 coroutines to perform I/O. It is not a networking library, yet it is the perfect foundation upon which networking libraries, or any libraries that perform I/O, may be built. It introduces concepts for representing buffers of data, and for moving those buffers through processing pipelines driven entirely by C++20 coroutines. The design of the library is based on one simple observation from Peter Dimov:

    An API designed from the ground up to use C++20 coroutines can achieve performance and ergonomics which cannot otherwise be obtained.

Capy is currently the foundation for another new library (not yet proposed) called "Corosio":

https://github.com/cppalliance/corosio
https://master.corosio.cpp.al/corosio/index.html

Corosio is a complete, portable networking library built on the facilities that Capy offers. We intend to propose it soon as the successor to the immensely popular Boost.Asio library.

Capy is also the foundation for yet another new library (not yet proposed) called "Http":

https://github.com/cppalliance/http
https://master.http.cpp.al/http/index.html

The Http library uses Capy, yet does not use Corosio. That is because Http is "sans-I/O": it provides the algorithms and data structures which implement the HTTP protocol at a high level, while remaining agnostic to the particular network implementation. This is possible thanks to the powerful stream abstractions which Capy offers.

Capy is also used by Beast2, which is yet another new library (not yet proposed) and also the name for a family of libraries:

https://github.com/cppalliance/beast2

Beast2 uses the Capy, Corosio, and Http libraries to implement high-level HTTP servers written in a C++ version of Express.js routers.
This is the successor to Boost.Beast (which will continue to be maintained independently).

The Beast2 family of libraries includes:

- Capy — The library currently being proposed; the foundation of I/O.
- Corosio — Coroutine-only portable networking; the successor to Boost.Asio.
- Http — Sans-I/O HTTP/1.1. This is a high-level library: servers, clients, Express.js middleware.
- Websocket — Sans-I/O WebSocket. This is also a high-level library.
- Beast2 — High-level HTTP and WebSocket servers with Express.js-style routing.
- Burl — High-level HTTP client: curl features, the ergonomics of coroutines, and a Python Requests-inspired API.

Currently, the C++ Standard does not deliver facilities optimized for networking I/O. We believe that Capy should become a standard library component to fill this gap. Our first paper based on Capy introduces the "IoAwaitable" family of concepts:

https://github.com/cppalliance/wg21-papers/blob/master/source/d4003-io-await...

The Beman Way

We are bringing Capy to Boost because this is what Boost was created for. The project exists to incubate high-quality libraries destined for standardization. Beman Dawes founded Boost on the principle that the best path to the standard is through proven practice: build it, ship it, let users depend on it, learn from real-world feedback, then propose standardization. Smart pointers, regular expressions, filesystem, threading primitives—all followed this path from Boost to the standard library.

Capy represents Boost returning to its role as a leader in C++ standardization efforts. The library addresses a real gap in C++26: there is no standard foundation optimized for coroutine-based I/O. Rather than waiting for a committee to design something in the abstract, or adapting networking to a framework built for different requirements, Capy takes the proven approach. It exists. It works. It powers real networking code today. Now it needs the scrutiny and refinement that only the Boost review process can provide.
The Problem Capy Solves

When an I/O operation completes, the operating system wakes up some thread: a completion port thread, an epoll reactor, or an io_uring worker. Without affinity tracking, your coroutine resumes on that arbitrary thread, forcing you to add synchronization everywhere or risk data races. This is the fundamental problem that coroutine-based networking must solve.

Capy's answer is the IoAwaitable protocol. When you launch a coroutine with a designated executor, every child coroutine inherits that executor affinity automatically. Execution context flows forward through `co_await` chains, not backward through P3826 queries, to ensure every coroutine in the chain runs in the same context. When I/O completes on some OS thread, the IoAwaitable protocol ensures your coroutine resumes on its designated executor. The data flow is explicit and testable. There are no thread-local globals, no implicit context, no surprises. Capy's `task` type uses the compiler to enforce these invariants.

Cancellation follows the same forward-propagation model. Stop tokens flow forward from the launch of a coroutine chain alongside the execution context, arriving at the platform API boundary and providing a uniform cancellation interface across all operations.

Frame allocation is where coroutine overhead traditionally hurts performance. Capy addresses this with thread-local recycling pools that achieve zero steady-state allocations after warmup. The coroutine launch site controls allocation policy, enabling per-deployment customization: bounded pools for real-time systems, per-tenant budgets for multi-tenant servers, or tracking allocators for debugging.

Buffer handling is essential for networking, and Asio's twenty-five years of experience showed us how. Capy provides buffer sequence algorithms: think `std::ranges`, but for buffers.
These are the vocabulary types and operations that networking code needs: slicing, copying, concatenating, and iterating over discontiguous memory. One million scatter/gather buffers, if you will. The design is driven by real-world usage, not theoretical completeness.

Capy is opinionated on the things that matter for I/O:

- An executor model for coroutine affinity and completion dispatch
- Stop token integration: uniform cancellation, always available
- Allocator control over frame allocation with zero-overhead recycling
- Forward propagation of the full context through every `co_await`
- A `task` type that enforces the _IoAwaitable_ protocol at compile time
- Composition primitives for launching and coordinating coroutines
- A strand for safe concurrency without mutexes
- Buffer sequences: `std::ranges` for untyped bytes
- Type erasure by default: no combinatorial explosion of templates

Proven Through Corosio

Capy is not speculation. It powers Corosio, a coroutine-only networking library that we are developing together. Corosio provides real sockets, acceptors, timers, DNS resolution, and multiple implementations of TLS streams, all built on Capy's foundation. This is the successor library to the incredibly popular Boost.Asio. It demonstrates what networking could look like if designed for coroutines from the start rather than adapted from callback-based models.

The standardization strategy follows from this layering. Capy is the foundation piece that belongs in the standard: executor model, task types, buffer algorithms, cancellation integration. These are stable abstractions that networking libraries can build upon. Corosio, the networking piece, can remain outside the standard to mitigate risk: sockets and protocol implementations typically struggle to achieve consensus even after years of committee attention.
Corosio can mature externally, evolving based on user feedback, while Capy provides the stable foundation that the standard library lacks.

Call to Action

We are requesting endorsements from Boost members to proceed with formal review. If you believe that Boost should provide a foundation for coroutine-based I/O, and that proven practice is the right path to standardization, your endorsement would be welcome.

The team is also happy to receive feedback on the design, the implementation, and the documentation. The library is ready for serious evaluation. Clone it, build it, write code against it, and tell us what works and what does not.

This is Boost doing what Boost does best: building the libraries that C++ needs, proving them in practice, and paving the way for standardization.

Distributed under the Boost Software License, Version 1.0. (See accompanying file [LICENSE_1_0.txt](LICENSE_1_0.txt) or copy at https://www.boost.org/LICENSE_1_0.txt)
Does any of this have any production-facing usage at all?

It's also worth asking: was any of this largely coded by something such as an LLM?

- Christian
Christian Mazakas wrote:
Does any of this have any production-facing usage at all?
It's also worth asking, was any of this largely coded by something such as an LLM?
- Christian
Please observe https://original.boost.org/community/policy.html#quoting.
I don't feel that I have enough experience in the field to formally endorse, but I think that the whole chain of libraries is a good idea. I have followed some discussions around Asio, and how the interaction with Chris was, err, not so smooth.

A few years ago I did some work on a distributed system. It was callbacks everywhere, and in Boomer C++ (C++03) to boot. As a result I exfiltrated myself to infrastructure work in Python. Later that product replaced some of the callbacks with FSMs.

One of my first pieces of advanced work in C (not++) was an implementation of coroutines using setjmp/longjmp (and UB :-D ), in the late 80s. Even back then, coroutines were an old idea. It baffles me that it took so much time for C++ to grow them, especially considering that Stroustrup included a coroutine library in the early versions of C++ (IIRC). Working in Python reminded me how useful coroutines are.

On Fri, Jan 30, 2026 at 9:28 AM Vinnie Falco via Boost <boost@lists.boost.org> wrote:
On Fri, 30 Jan 2026 at 15:29, Vinnie Falco via Boost <boost@lists.boost.org> wrote:
This library provides facilities which use C++20 coroutines to perform I/O. It is not a networking library, yet it is the perfect foundation upon which networking libraries, or any libraries that perform I/O, may be built.
I think this is a sound approach. That said, as a potential user, I've got a question:
Capy is also the foundation for yet another new library (not yet proposed) called "Http"
https://github.com/cppalliance/http
https://master.http.cpp.al/http/index.html
The Http library uses Capy, yet does not use Corosio. That is because Http is "sans-I/O."
When we say sans-I/O, it usually means that it can be used regardless of the I/O framework and completion paradigm you're using (think of OpenSSL). Can I implement an Asio universal async operation using Capy? I got the impression that I can't; however, it should be possible if the library is truly sans-I/O.

The reason I'm asking is that I'm writing a PostgreSQL library that targets C++20 and above, hence your proposed library is an option. I intend to expose a sans-I/O API. This API needs to power both a Corosio connection and an Asio connection (at the end of the day, I want users, and today users do Asio).

Regards,
Rubén.
On Wed, Feb 4, 2026 at 2:26 PM Ruben Perez <rubenperez038@gmail.com> wrote:
When we say sans-I/O, it usually means that it can be used regardless of the I/O framework and completion paradigm you're using (think of OpenSSL). Can I implement an Asio universal async operation using Capy? I got the impression that I can't; however, it should be possible if the library is truly sans-I/O.
What you are asking for, "implement an Asio universal async operation using Capy", is not possible, but for pedantic reasons which we don't need to get into here. What I think you are really asking for is a buffer-oriented, type-erased stream upon which you can implement algorithms which transact in buffers of inputs and outputs. Many protocols can be implemented with this primitive: HTTP, WebSocket, MySQL, zlib, JSON serialization and deserialization, and a lot more.
The reason I'm asking is because I'm writing a PostgreSQL library that targets C++20 and above, hence your proposed library is an option. I intend to expose a sans-io API. This API needs to power both a Corosio connection and an Asio connection (at the end of the day, I want users, and today users do Asio).
I will list some facts and then we can see where they go.

* Capy's buffer sequences are compatible with Asio's. You can pass each where the other is expected. Yet Capy does not include Asio headers.

* capy::any_stream is a concrete type which can be constructed by value or by reference from any object that satisfies capy::Stream:
https://github.com/cppalliance/capy/blob/18aa8a8b5fe8d0925e983907d057a1111c4...
https://github.com/cppalliance/capy/blob/18aa8a8b5fe8d0925e983907d057a1111c4...

* Values of types which satisfy capy::Executor may be used to construct capy::executor_ref, a type-erasing reference wrapper which does not allocate and is two pointers in size:
https://github.com/cppalliance/capy/blob/18aa8a8b5fe8d0925e983907d057a1111c4...
https://github.com/cppalliance/capy/blob/18aa8a8b5fe8d0925e983907d057a1111c4...

* asio_executor adapts asio::io_context::executor_type to satisfy capy::Executor:
https://github.com/vinniefalco/capy/blob/e89bbdd39d3de00d257b91d4c0264276bd5...

* asio_context hides an asio::io_context in a pimpl, and offers a type-erased asio_executor (via get_executor):
https://github.com/vinniefalco/capy/blob/e89bbdd39d3de00d257b91d4c0264276bd5...

* asio_socket adapts an asio::ip::tcp::socket to satisfy capy::Stream:
https://github.com/vinniefalco/capy/blob/e89bbdd39d3de00d257b91d4c0264276bd5...

Some notes on this adaptor:
1. It is coroutine-only (read_some and write_some return awaitables).
2. It operates on any BufferSequence (the sequence is unrolled up to 8 elements, which can be adjusted).
3. It connects the IoAwaitable protocol's std::stop_token to an Asio cancellation slot.

* The private function make_socket_pair returns two connected asio::ip::tcp::socket objects:
https://github.com/vinniefalco/capy/blob/e89bbdd39d3de00d257b91d4c0264276bd5...

* The function make_stream_pair returns two connected capy::any_stream objects:
https://github.com/vinniefalco/capy/blob/e89bbdd39d3de00d257b91d4c0264276bd5...
* The function main (in any_stream.cpp) creates an asio_context and two connected capy::any_stream objects, and launches two coroutines which transfer data:
https://github.com/vinniefalco/capy/blob/e89bbdd39d3de00d257b91d4c0264276bd5...
https://github.com/vinniefalco/capy/blob/e89bbdd39d3de00d257b91d4c0264276bd5...

It was a dense read, and I think it is worth it; this is the big reveal, if you haven't figured it out already:

* This runs on Asio, yet no Asio includes are publicly visible.

This example shows that it is both possible and easy to wrap Asio's objects in a way that satisfies Capy's requirements, to build stream abstractions which do not leak types yet remain entirely functional. Would this be something upon which you can implement most of the requirements of PostgreSQL? The "promptogorov complexity" of these wrappers is so low that variations can be produced on demand by frontier models.

Also note that it is not required to use the type-erasing wrappers. You can express algorithms as templates if you like (Asio style). For example, capy::read_until is expressed thusly:
https://github.com/cppalliance/capy/blob/18aa8a8b5fe8d0925e983907d057a1111c4...

You can pass capy::any_stream here, or you could pass asio_socket if you want. Both work.

Thanks
On Wed, Feb 4, 2026 at 2:26 PM Ruben Perez <rubenperez038@gmail.com> wrote:
Can I implement an Asio universal async operation using Capy?
We had a conversation today in the official C++ Language Slack workspace, and this clarified my understanding of what you are asking for.

You want: asynchronous initiating functions which follow Asio's Universal Asynchronous Model [1] and which drive Capy's type-erased streams. This allows you to implement business logic (for example, a PostgreSQL client implementation) as "sans-I/O", that is, agnostic to the model of asynchrony, while allowing existing users who are already Asio-based to interact with your algorithms. This provides a clear migration path, otherwise known as "brownfield development."

Yes, this is possible. The example that I provided previously offers Capy's execution model wrapped around an invisible Asio socket. I have added a new example which wraps Asio's universal asynchronous model around a type-erased Capy stream:

* `uni_stream` models Asio's AsyncReadStream and AsyncWriteStream and uses completion tokens:
https://github.com/cppalliance/capy/blob/4c4fa32cb96865eda059d1cb559f928624c...

* `make_uni_pair` returns two connected `uni_stream` objects which type-erase an underlying `asio::ip::tcp::socket`:
https://github.com/cppalliance/capy/blob/4c4fa32cb96865eda059d1cb559f928624c...

* The asio_callbacks.cpp demo shows how Asio-style completion handlers can drive the `uni_stream`:
https://github.com/cppalliance/capy/blob/4c4fa32cb96865eda059d1cb559f928624c...
https://github.com/cppalliance/capy/blob/4c4fa32cb96865eda059d1cb559f928624c...

Note how the demo uses `asio::read` and `asio::write` free functions, which are composed operations built on top of `uni_stream`; this demonstrates that it works as expected. On the other side of the `uni_stream` (the thing erased by the `any_stream`) sits an asio::ip::tcp::socket.
For your use case, you might have something like this:

    // Postgres client with Capy interface, implemented sans-I/O
    class postgres_client
    {
        capy::any_stream stream_;

    public:
        capy::task f();
    };

    // Postgres client with Asio interface, wrapping the sans-I/O client
    class uni_postgres_client
    {
        postgres_client cli_;

    public:
        template<class CompletionToken>
        auto f(CompletionToken&&);
    };

postgres_client::f is an ordinary coroutine function (it goes in the .cpp file) which implements an algorithm in terms of capy::any_stream. In the first iteration of your library, users will interact with uni_postgres_client. In future versions, new users or migrating users would interact with postgres_client directly (both work). This allows them to upgrade little by little.

Most of your algorithms can be expressed in terms of capy::any_stream, but there are some things which cannot. Resolving DNS names, establishing outgoing connections, or accepting incoming connections are not operations reflected by Capy concepts. Your uni_postgres_client would need to have some pure virtual member functions to do these things (they can be capy::task coroutines, of course), and a derived class would have to implement some I/O-specific things. If your underlying stream is an Asio socket, then you would write Asio code to connect or resolve names. If your underlying stream is Corosio, then you would use Corosio's APIs to do those things.

Happy to elaborate further if there are specific questions.

Thanks
Vinnie (and others),

I've been evaluating boost::cobalt for the purpose of network service implementation in C++23. The coroutine and Asio world I'm not so familiar with, but I've definitely gotten an intuitive sense of the applicability to this problem domain: "wait for various things efficiently in a cancellable manner".

I've had a look around for discussion of Cobalt vs. Capy; perhaps some mention of that would also be helpful to others. Claude AI doesn't seem aware of Capy (yet) either.

In a nutshell, when would Cobalt or Capy be more applicable?

Thanks,
Nigel Stewart
On Sat, Feb 7, 2026 at 10:53 PM Nigel Stewart via Boost < boost@lists.boost.org> wrote:
I've had a look around for discussion of cobalt vs capy, perhaps some mention of that would also be helpful to others. Claude AI doesn't seem aware of Capy (yet) either.
In a nutshell, when would Cobalt or Capy be more applicable?
"Wait for various things efficiently in a cancellable manner" is a surprisingly succinct description of the problem domain.

I have an analysis of Capy versus Cobalt. Note, however, that I am not an expert in Cobalt; if there are inaccuracies I would very much welcome corrections:

https://master.capy.cpp.al/capy/why-not-cobalt.html

Thanks
Nigel Stewart via Boost <boost@lists.boost.org> spake the secret code <CAOD5mAFjSC=tv9uEZir4Ky+tKv0-ZMYQG_VHUyYdcv8--4MNYw@mail.gmail.com> thusly:
I've been evaluating boost::cobalt for the purpose of network service implementation in C++23. [...]
FYI, I gave two presentations on coroutines: one "raw" and one with boost.cobalt. You may find them useful: Introduction to Coroutines <https://www.youtube.com/watch?v=p2UBok-rTrg> Using Coroutines with Boost.Cobalt <https://www.youtube.com/watch?v=yElSdUqEvME> -- "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline> The Terminals Wiki <http://terminals-wiki.org> The Computer Graphics Museum <http://computergraphicsmuseum.org> Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>
On Thu, 5 Feb 2026 at 18:48, Vinnie Falco <vinnie.falco@gmail.com> wrote:
On Wed, Feb 4, 2026 at 2:26 PM Ruben Perez <rubenperez038@gmail.com> wrote:
Can I implement an Asio universal async operation using Capy?
We had a conversation today in the Official C++ Language Slack Workspace and this clarified my understanding of what you are asking for.
You want: asynchronous initiating functions which follow Asio's Universal Asynchronous Model [1] and which drive Capy's type-erased streams. This allows you to implement business logic (for example, a PostgreSQL client implementation) as "sans-I/O", that is, agnostic to the model of asynchrony, while allowing existing users who are already Asio-based to interact with your algorithms. This provides a clear migration path, otherwise known as "brownfield development."
Yes this is possible. The example that I provided previously, offers Capy's execution model wrapped around an invisible Asio socket. I have added a new example which wraps Asio's universal asynchronous model around a type-erased Capy stream:
* `uni_stream` models Asio's AsyncReadStream and AsyncWriteStream and uses completion tokens:
https://github.com/cppalliance/capy/blob/4c4fa32cb96865eda059d1cb559f928624c...
This does not handle cancellation. I don't think that trying to write Asio universal async operations using Capy is a good idea. I would encourage you to remove the "sans-io" term from Capy, since it assumes a particular I/O model.
Most of your algorithms can be expressed in terms of capy::any_stream, but there are some things which cannot. Resolving DNS names, establishing outgoing connections, or accepting incoming connections are not operations reflected by Capy concepts. Your uni_postgres_client would need to have some pure virtual member functions to do these things (they can be capy::task coroutines of course), and a derived class would have to implement some I/O specific things. If your underlying stream is asio socket then you would write asio code to connect or resolve names. If your underlying stream is Corosio, then you would use Corosio's APIs to do those things.
Is there a reason why reading and writing have been placed in Capy, while other operations have been placed in Corosio? Connection establishment is a need for all clients, regardless of what service they connect to.
Happy to elaborate further if there are specific questions.
If I happen to write a library exposing a Capy API like this:

    class pg_client {
    public:
        capy::io_task<void> query(string_view q);
    };

Let's say I have a user who wants to adopt my library. The user has the following code already working in production:

    asio::awaitable<void> co_main() {
        // Some code that uses Asio, MySQL and MQTT5
    }

    int main() {
        asio::io_context ctx;
        asio::co_spawn(ctx, co_main(), asio::detached);
        ctx.run();
    }

What's the best way for this user to adopt my library? Do they need to port everything in co_main to Capy and Corosio? Or can they keep it and use my library?
On Mon, Feb 9, 2026 at 3:19 AM Ruben Perez <rubenperez038@gmail.com> wrote:
This does not handle cancellation.
You're right, it doesn't yet. The example was written to demonstrate the integration pattern, not to be production-complete. Adding cancellation support is straightforward and I'll update the example.
I don't think that trying to write Asio universal async operations using Capy is a good idea. I would encourage you to remove the "sans-io" term from Capy, since it assumes a particular I/O model.
I respectfully disagree. The term "sans-I/O" means that your protocol or business logic doesn't perform I/O itself; it reads and writes bytes through an abstraction, without knowing what the transport is. A Capy algorithm written against `any_stream` has no idea whether it's talking to a TCP socket, a TLS channel, a Unix pipe, or a memory buffer in a unit test. That's sans-I/O.

Every sans-I/O design has an interface shape. Even the purest sans-I/O libraries define how bytes flow in and out. Capy's shape is coroutine-based type-erased streams. Having a shape doesn't disqualify it from being sans-I/O; it's what makes it usable.

I think the concern here may be conflating "sans-I/O" with "sans-async-model," and those are different things. Sans-I/O means your business logic doesn't know what the transport is. It doesn't mean your business logic has no opinion about how async execution works. Asio's universal async model is also a particular model. Writing algorithms against `AsyncReadStream` with completion tokens is no more or less sans-I/O than writing against `any_stream` with coroutines. Both decouple the algorithm from the transport. Both commit to an execution pattern.

The practical litmus test for sans-I/O is testability: can you exercise your protocol logic entirely in-process, with no sockets and no event loop? With Capy's memory-backed streams, you can. That's the proof.
Is there a reason why reading and writing have been placed in Capy, while other operations have been placed in Corosio? Connection establishment is a need for all clients, regardless of what the service they connect to.
There's a broad spectrum of algorithms whose entire job is moving bytes through a stream, and they have nothing to do with how things get connected. JSON parsing and serialization. Compression and decompression. TLS; once negotiated, it's just reading and writing encrypted bytes. HTTP message framing, WebSocket framing, protocol wire formats like PostgreSQL or Redis. Hashing, checksumming, base64 encoding, proxy forwarding, rate limiting, multiplexing. The list goes on. These algorithms are portable, testable, and reusable precisely because they only care about bytes in and bytes out. Connection establishment, DNS resolution, and accepting incoming connections are inherently platform-specific. They depend on the operating system, the network stack, and the I/O driver. Capy draws the line exactly where it makes sense: stream I/O is the reusable, portable layer. Connecting is platform-specific and belongs in Corosio. This is the same separation you see in well-designed systems everywhere. A JSON parser doesn't need to know how you connected to the server. A TLS implementation doesn't care whether the underlying transport is a TCP socket or a named pipe. Capy formalizes that boundary.
What's the best way for this user to adopt my library? Do they need to port everything in co_main to capy and corosio? Or can they keep it and use my library?
The answer depends on what API the library author chooses to expose, and it's entirely their decision.

If the library author exposes `capy::io_task<void>` as the return type, then an existing Asio user can't directly `co_await` it from an `asio::awaitable` coroutine. Asio's `awaitable` promise type defines a closed set of `await_transform` overloads; it only accepts `awaitable<T>`, Asio async operations, and a few internal primitives. A `capy::io_task` is none of those, so the compiler rejects it. This is a limitation of Asio's coroutine integration, not Capy's, but it's a real practical constraint.

However, the library author doesn't have to expose a Capy API. The more interesting design is a two-layer approach.

The internal layer contains all the protocol logic: parsing, serialization, state management, written against `capy::any_stream` using `capy::task` coroutines. This is the sans-I/O core. It's portable, testable with memory-backed streams, and knows nothing about Asio.

The public layer is a thin shell that exposes whatever API the library author wants. If they want to serve existing Asio users, they return `asio::awaitable<void>` and use `asio::async_initiate` internally to bridge into the Capy core. The user's code looks like this:

    asio::awaitable<void> co_main() {
        pg_client client;
        co_await client.query("SELECT 1"); // works, it's an Asio operation
    }

The user doesn't port anything. They don't even know Capy exists. The bridging is internal to the library, invisible at the API boundary. This is the same pattern as `asio::ssl::stream`; OpenSSL is a completely different world internally, but Asio wraps it and the user never touches OpenSSL directly.

The library author can also expose a native Capy API alongside the Asio one, for greenfield users who want the full benefits of Capy's execution model. Same protocol core, two thin shells.
The sans-I/O design is what makes this possible: the protocol logic is written once and reused regardless of which async model the consumer prefers.

Thanks
Dear esteemed colleagues,

I have significant domain knowledge of asynchronous systems, ASIO, Cobalt and IO. I had a hand in some of the design decisions in Cobalt that were informed by aggressive early adoption of stackless C++20 coroutines in production code. Probably too early...

I have read the mailing list and noted the concerns, which I think are orthogonal to the question.

Use of LLMs: irrelevant. Code will be peer reviewed and tested. In the very near future, no code will be fully handwritten.

Licensing: I know Vinnie well. The code will be meticulously BSL, without doubt.

Underlying principle: "By their fruits ye shall know them." I believe the Capy project is a sincere attempt to match C++20 coroutines with the real world. Despite the potential conflict with Cobalt (whose author is a respected friend of mine), I believe that if the state of the art is to progress - and C++20 coroutines are to become commonly used in production code - the Capy endeavour and its derivatives are exactly the kind of libraries that Boost was built for.

I have a personal interest in seeing a credible and coherent IO/coroutine ecosystem in C++. I run a trading business and would like to use it! The cost to me of developing such a thing for myself is beyond prohibitive. Settling for a substandard but commonly understood solution is my only sane choice at present. If I and the wider world are gifted a superior product simply by saying yes, we'd be beyond foolish not to.

*I endorse the request for formal review.*

Sincerely,
Richard

On Fri, 30 Jan 2026 at 15:27, Vinnie Falco via Boost <boost@lists.boost.org> wrote:
Hi.
We are proposing a new library "Capy" as a candidate library and seeking endorsements for formal review:
https://github.com/cppalliance/capy https://master.capy.cpp.al/capy/index.html
This library provides facilities which use C++20 coroutines to perform I/O. It is not a networking library, yet it is the perfect foundation upon which networking libraries, or any libraries that perform I/O, may be built. It introduces concepts for representing buffers of data, and moving those buffers of data through processing pipelines driven entirely by C++20 coroutines. The design of the library is based on one simple observation from Peter Dimov:
An API designed from the ground up to use C++20 coroutines can achieve performance and ergonomics which cannot otherwise be obtained.
Capy is currently the foundation for another new library (not yet proposed) called "Corosio"
https://github.com/cppalliance/corosio https://master.corosio.cpp.al/corosio/index.html
Corosio is a complete portable networking library which is built on the facilities that Capy offers. We intend to propose this library soon as the successor to the immensely popular Boost.Asio library.
Capy is also the foundation for yet another new library (not yet proposed) called "Http"
https://github.com/cppalliance/http
https://master.http.cpp.al/http/index.html
The Http library uses Capy, yet does not use Corosio. That is because Http is "sans-I/O." It provides the algorithms and data structures which implement the HTTP protocol at a high level, while remaining agnostic to the particular network implementation. This is possible thanks to the powerful stream abstractions which Capy offers.
Capy is also used by Beast2, which is yet another new library (not yet proposed) and also the name for a family of libraries:
https://github.com/cppalliance/beast2
Beast2 uses the Capy, Corosio, and Http libraries to implement high-level HTTP servers written in the style of Express.js routers, in C++. This is the successor to Boost.Beast (which will continue to be maintained independently).
The Beast2 family of libraries includes:
Capy — The library currently being proposed; the foundation of I/O.
Corosio — Coroutine-only portable networking; the successor to Boost.Asio.
Http — Sans-I/O HTTP/1.1. A high-level library: servers, clients, Express.js middleware.
Websocket — Sans-I/O WebSocket; also a high-level library.
Beast2 — High-level HTTP and WebSocket servers with Express.js-style routing.
Burl — High-level HTTP client: curl features, coroutine ergonomics, and a Python Requests-inspired API.
Currently, the C++ Standard does not deliver facilities optimized for networking I/O. We believe that Capy should become a standard library component to fill this gap. Our first paper based on Capy introduces the "IoAwaitable" family of concepts:
https://github.com/cppalliance/wg21-papers/blob/master/source/d4003-io-await...
The Beman Way
We are bringing Capy to Boost because this is what Boost was created for. The project exists to incubate high-quality libraries destined for standardization. Beman Dawes founded Boost on the principle that the best path to the standard is through proven practice: build it, ship it, let users depend on it, learn from real-world feedback, then propose standardization. Smart pointers, regular expressions, filesystem, threading primitives—all followed this path from Boost to the standard library.
Capy represents Boost returning to its role as a leader in C++ standardization efforts. The library addresses a real gap in C++26: there is no standard foundation optimized for coroutine-based I/O. Rather than waiting for a committee to design something in the abstract, or adapting networking to a framework built for different requirements, Capy takes the proven approach. It exists. It works. It powers real networking code today. Now it needs the scrutiny and refinement that only the Boost review process can provide.
The Problem Capy Solves
When an I/O operation completes, the operating system wakes up some thread, such as a completion port thread, an epoll reactor, or an io_uring worker. Without affinity tracking, your coroutine resumes on that arbitrary thread, forcing you to add synchronization everywhere or risk data races. This is the fundamental problem that coroutine-based networking must solve.
Capy's answer is the IoAwaitable protocol. When you launch a coroutine with a designated executor, every child coroutine inherits that executor affinity automatically. Execution context flows forward through `co_await` chains, not backward through P3826 queries, to ensure every coroutine in the chain runs in the same context. When I/O completes on some OS thread, the IoAwaitable protocol ensures your coroutine resumes on its designated executor. The data flow is explicit and testable. There are no thread-local globals, no implicit context, no surprises. Capy's `task` type uses the compiler to enforce invariants.
Cancellation follows the same forward-propagation model. Stop tokens flow forward from the launch of a coroutine chain alongside the execution context, to arrive at the platform API boundary, providing a uniform cancellation interface across all operations.
Frame allocation is where coroutine overhead traditionally hurts performance. Capy addresses this with thread-local recycling pools that achieve zero steady-state allocations after warmup. The coroutine launch site controls allocation policy, enabling per-deployment customization: bounded pools for real-time systems, per-tenant budgets for multi-tenant servers, or tracking allocators for debugging.
Buffer handling is essential for networking, and Asio's twenty-five years of experience showed us how. Capy provides buffer sequence algorithms: think `std::ranges`, but for buffers. These are the vocabulary types and operations that networking code needs: slicing, copying, concatenating, and iterating over discontiguous memory. One million scatter/gather buffers, if you will. The design is driven by real-world usage, not theoretical completeness.
Capy is opinionated on the things that matter for I/O:
- An executor model for coroutine affinity and completion dispatch
- Stop token integration: uniform cancellation, always available
- Allocator control over frame allocation with zero-overhead recycling
- Forward propagation of the full context through every `co_await`
- A `task` type that enforces the _IoAwaitable_ protocol at compile time
- Composition primitives for launching and coordinating coroutines
- A strand for safe concurrency without mutexes
- Buffer sequences: `std::ranges` for untyped bytes
- Type erasure by default: no combinatorial explosion of templates
Proven Through Corosio
Capy is not speculation. It powers Corosio, a coroutine-only networking library that we are developing in tandem. Corosio provides real sockets, acceptors, timers, DNS resolution, and TLS streams with multiple SSL backends, all built on Capy's foundation. This is the successor library to the incredibly popular Boost.Asio. It demonstrates what networking could look like if designed for coroutines from the start rather than adapted from callback-based models.
The standardization strategy follows from this layering. Capy is the foundation piece that belongs in the standard: executor model, task types, buffer algorithms, cancellation integration. These are stable abstractions that networking libraries can build upon. Corosio, the networking piece, can remain outside the standard to mitigate risk: sockets and protocol implementations typically experience difficulty achieving consensus even after years of committee attention. Corosio can mature externally where it can evolve based on user feedback, while Capy provides the stable foundation that the standard library lacks.
Call to Action
We are requesting endorsements from Boost members to proceed with formal review. If you believe that Boost should provide a foundation for coroutine-based I/O, and that proven practice is the right path to standardization, your endorsement would be welcomed.
The team is also happy to receive feedback on the design, the implementation, and the documentation. The library is ready for serious evaluation. Clone it, build it, write code against it, and tell us what works and what does not.
This is Boost doing what Boost does best: building the libraries that C++ needs, proving them in practice, and paving the way for standardization.
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at https://www.boost.org/LICENSE_1_0.txt)

_______________________________________________
Boost mailing list -- boost@lists.boost.org
To unsubscribe send an email to boost-leave@lists.boost.org
https://lists.boost.org/mailman3/lists/boost.lists.boost.org/
Archived at: https://lists.boost.org/archives/list/boost@lists.boost.org/message/5LWFGERI...
Hi.

I've been listening to the feedback on this thread, and it's changed my thinking about how to present these libraries for review. I'm going to submit Capy and Corosio together as a single review, even though they are separate libraries.

Here's why.

Capy is a foundational library. It provides the abstractions (tasks, buffers, stream concepts, executors, the IoAwaitable protocol) but it doesn't do networking by itself. Asking people to evaluate Capy in isolation is like asking them to evaluate the STL's iterator concepts without showing them a single container or algorithm. The concepts are correct, but the value is invisible until you see what gets built with them.

I wrote about this general problem recently:

https://www.vinniefalco.com/p/the-expertise-gap

The short version: you can't value the solution to a problem you've never encountered. Capy solves Boost-grade problems: coroutine frame lifetimes, cancellation propagation through deep call chains, allocation strategies for high-throughput servers. These are real problems, but they're invisible to anyone who hasn't hit them in production. Without Corosio demonstrating the payoff, the endorsement request was asking people to take the library's value on faith. That's not fair to reviewers and it's not a good use of anyone's time.

The thread itself proved this. Ion observed that "it's more difficult to see the usefulness of this framework without some concrete uses." Ruben's questions about sans-I/O and Asio interop required detailed Corosio examples to answer properly. Nigel asked about Cobalt versus Capy, and the comparison only makes sense when you can see what Capy enables at the networking layer. Every substantive question on this thread pulled Corosio into the conversation.

So: Capy and Corosio, together. Two libraries, one review.

They remain physically separate, and the reasons for that separation are architectural. Capy is pure C++20 with zero platform-specific code. Corosio contains four platform backends (IOCP, epoll, kqueue, select). Libraries like Http depend on Capy but not Corosio. Merging them would force every consumer of buffer algorithms to link against platform networking they never touch. I wrote up the full reasoning here:

https://www.vinniefalco.com/p/why-capy-is-separate

This isn't unprecedented in spirit. Boost.Serialization is already split directory-wise into a core and individual archives. Boost.Range would have benefited from a similar physical split. A foundational library submitted alongside its primary consumer is a natural pairing.

Richard Hodges endorsed the request and I'm grateful for that. His experience with Cobalt and production coroutine systems gives that endorsement real weight. Jean-Louis noted the value of the whole chain of libraries, which is exactly the perspective that a joint submission makes possible.

I'm updating the endorsement request accordingly. If you believe Boost should provide a coroutine-based I/O foundation and a networking library built on top of it, your endorsement for a joint Capy + Corosio review would be welcome.

To put it even more simply: Corosio and Capy together are Asio2.

https://github.com/cppalliance/capy https://master.capy.cpp.al/capy/index.html
https://github.com/cppalliance/corosio https://master.corosio.cpp.al/corosio/index.html

Thanks
Makes a lot of sense. Endorsed. On Fri, 13 Feb 2026 at 16:06, Vinnie Falco <vinnie.falco@gmail.com> wrote:
Hi.
I've been listening to the feedback on this thread, and it's changed my thinking about how to present these libraries for review. I'm going to submit Capy and Corosio together as a single review, even though they are separate libraries.
Here's why.
Capy is a foundational library. It provides the abstractions (tasks, buffers, stream concepts, executors, the IoAwaitable protocol) but it doesn't do networking by itself. Asking people to evaluate Capy in isolation is like asking them to evaluate the STL's iterator concepts without showing them a single container or algorithm. The concepts are correct, but the value is invisible until you see what gets built with them.
I wrote about this general problem recently:
https://www.vinniefalco.com/p/the-expertise-gap
The short version: you can't value the solution to a problem you've never encountered. Capy solves Boost-grade problems: coroutine frame lifetimes, cancellation propagation through deep call chains, allocation strategies for high-throughput servers. These are real problems, but they're invisible to anyone who hasn't hit them in production. Without Corosio demonstrating the payoff, the endorsement request was asking people to take the library's value on faith. That's not fair to reviewers and it's not a good use of anyone's time.
The thread itself proved this. Ion observed that "it's more difficult to see the usefulness of this framework without some concrete uses." Ruben's questions about sans-I/O and Asio interop required detailed Corosio examples to answer properly. Nigel asked about Cobalt versus Capy, and the comparison only makes sense when you can see what Capy enables at the networking layer. Every substantive question on this thread pulled Corosio into the conversation.
So: Capy and Corosio, together. Two libraries, one review.
They remain physically separate, and the reasons for that separation are architectural. Capy is pure C++20 with zero platform-specific code. Corosio contains four platform backends (IOCP, epoll, kqueue, select). Libraries like Http depend on Capy but not Corosio. Merging them would force every consumer of buffer algorithms to link against platform networking they never touch. I wrote up the full reasoning here:
https://www.vinniefalco.com/p/why-capy-is-separate
This isn't unprecedented in spirit. Boost.Serialization is already split directory-wise into a core and individual archives. Boost.Range would have benefited from a similar physical split. A foundational library submitted alongside its primary consumer is a natural pairing.
Richard Hodges endorsed the request and I'm grateful for that. His experience with Cobalt and production coroutine systems gives that endorsement real weight. Jean-Louis noted the value of the whole chain of libraries, which is exactly the perspective that a joint submission makes possible.
I'm updating the endorsement request accordingly. If you believe Boost should provide a coroutine-based I/O foundation and a networking library built on top of it, your endorsement for a joint Capy + Corosio review would be welcome.
To put it even more simply: Corosio and Capy together are Asio2.
https://github.com/cppalliance/capy https://master.capy.cpp.al/capy/index.html
https://github.com/cppalliance/corosio https://master.corosio.cpp.al/corosio/index.html
Thanks
Capy and Corosio together as a single review, even though they are separate libraries.
Makes a lot of sense.
Endorsed.
For everyone's awareness, I chatted briefly with Vinnie about the mechanics of this review. Since it is fundamentally two libraries, and there have been comments about the time available for scrutiny, we are going to schedule this review for 15-20 days instead of the usual 10. Hopefully extended time mitigates some of these concerns. Matt
On 17 Feb 2026 16:30, Matt Borland via Boost wrote:
Capy and Corosio together as a single review, even though they are separate libraries.
Makes a lot of sense.
Endorsed.
For everyone's awareness, I chatted briefly with Vinnie about the mechanics of this review. Since it is fundamentally two libraries, and there have been comments about the time available for scrutiny, we are going to schedule this review for 15-20 days instead of the usual 10. Hopefully extended time mitigates some of these concerns.
Will the final verdict accept or reject both libraries together? Should the reviewers pay equal attention to both libraries or just one of them?
On Tuesday, February 17th, 2026 at 8:48 AM, Andrey Semashev via Boost <boost@lists.boost.org> wrote:
On 17 Feb 2026 16:30, Matt Borland via Boost wrote:
Capy and Corosio together as a single review, even though they are separate libraries.
Makes a lot of sense.
Endorsed.
For everyone's awareness, I chatted briefly with Vinnie about the mechanics of this review. Since it is fundamentally two libraries, and there have been comments about the time available for scrutiny, we are going to schedule this review for 15-20 days instead of the usual 10. Hopefully extended time mitigates some of these concerns.
Will the final verdict accept or reject both libraries together?
Yes; one review gets one verdict.
Should the reviewers pay equal attention to both libraries or just one of them?
I will defer to Vinnie to answer the question on which should have priority of attention (if any). Matt
On Tue, Feb 17, 2026 at 5:46 AM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
Should the reviewers pay equal attention to both libraries or just one of them?
Great question, and the short answer is: both of them.

In practical terms, a user of Corosio who wishes to deploy the library to maximum effect will want to structure their code into two parts:

* Algorithms which use Corosio directly: resolving domain names, connecting to the network, configuring TLS
* Generic algorithms which operate on Capy byte streams: parsing, serialization, using combinators (when_any, when_all)

1. The user authors Corosio algorithms as coroutines returning `capy::task`.
2. The user type-erases Corosio sockets using `capy::any_stream`.
3. The user performs reading and writing on Corosio I/O objects, yet the buffer sequence concepts and algorithms come from Capy.

The reason for this split is structural: each library exposes a narrow interface over a deep module, with non-overlapping responsibilities. This is explained at length here:

https://develop.capy.cpp.al/capy/8.design/8b.Separation.html

"Capy and Corosio sit at different levels of the physical hierarchy. They encapsulate different information, change for different reasons, and have different platform dependencies. Merging them would degrade the design along every axis that matters for a large-scale system: testability, reusability, and build cost."

Further evidence of the utility of this split is that the facilities in std::execution strongly mirror those in Capy, yet Capy is directed towards users of regular C++, while std::execution is directed towards users of the Sender Sub-Language.

Boost.Http (not proposed yet) is implemented entirely in terms of Capy. It is already capable of doing more than Boost.Beast, yet depends on a dramatically reduced API surface. This is evidence that the split is the right choice.

Originally I hoped to submit Capy independently, yet Peter, as usual, succinctly observed: users cannot value a solution to a problem they do not yet have. Without a use case, the utility of Capy is invisible.
It became evident in this thread: no endorsements were forthcoming until a concrete user appeared (the Boost.Redis port, which used Corosio). I explore the phenomenon Peter described in greater depth here: https://www.vinniefalco.com/p/the-expertise-gap

Thanks
On Tue, Feb 17, 2026 at 7:31 AM Matt Borland via Boost < boost@lists.boost.org> wrote:
Capy and Corosio together as a single review, even though they are separate libraries.
Makes a lot of sense.
Endorsed.
For everyone's awareness, I chatted briefly with Vinnie about the mechanics of this review. Since it is fundamentally two libraries, and there have been comments about the time available for scrutiny, we are going to schedule this review for 15-20 days instead of the usual 10. Hopefully extended time mitigates some of these concerns.
Will the questions on licensing be resolved before the review, or before inclusion (if accepted)?

--
-- René Ferdinand Rivera Morell
-- Don't Assume Anything
-- No Supongas Nada
-- Robot Dreams - http://robot-dreams.net
Will the questions on licensing be resolved before the review, before inclusion (if accepted)?
It is my understanding that Vinnie has contacted a lawyer with this question. More to follow on timelines. FWIW, the U.S. Copyright Office last month put out a report: "Copyright and Artificial Intelligence Part 2: Copyrightability"[1]. Their conclusion section (section V) is short and seems to answer the questions posed in this thread. Matt [1] https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Co...
On Tue, Feb 17, 2026 at 9:49 AM Matt Borland <matt@mattborland.com> wrote:
Will the questions on licensing be resolved before the review, before inclusion (if accepted)?
It is my understanding that Vinnie has contacted a lawyer with this question. More to follow on timelines. FWIW, the U.S. Copyright Office last month put out a report: "Copyright and Artificial Intelligence Part 2: Copyrightability"[1]. Their conclusion section (section V) is short and seems to answer the questions posed in this thread.
Matt
[1] https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Co...
I was not aware of that report. Thank you for pointing it out. It does answer most questions (within the context of US copyright law). The conclusions in it do place various constraints on LLM/AI-generated code in Boost. The most challenging aspect is going to be complying with this determination: "Human authors are entitled to copyright in their works of authorship that are perceptible in AI-generated outputs [cut]". Much to think about in this.

--
-- René Ferdinand Rivera Morell
-- Don't Assume Anything
-- No Supongas Nada
-- Robot Dreams - http://robot-dreams.net
On 17 Feb 2026 18:49, Matt Borland via Boost wrote:
Will the questions on licensing be resolved before the review, before inclusion (if accepted)?
It is my understanding that Vinnie has contacted a lawyer with this question. More to follow on timelines. FWIW, the U.S. Copyright Office last month put out a report: "Copyright and Artificial Intelligence Part 2: Copyrightability"[1]. Their conclusion section (section V) is short and seems to answer the questions posed in this thread.
Matt
[1] https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Co...
The report conclusion says that AI-generated output is copyrightable, provided that it contains sufficient human contribution, but it doesn't discuss the possible encumbrance of this output due to the fact that the AI model was trained on potentially copyrighted material. Perhaps the answer to this is known implicitly, or I'm missing it somewhere in the document, but I would like this point to be addressed by the lawyer as well. Specifically: would Boost and the library authors be legally protected if the AI model(s) used by the library maintainers were trained on copyrighted material?
On Tue, Feb 17, 2026 at 11:29 AM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
The report conclusion says that the AI-generated output is copyrightable, provided that it contains sufficient human contribution, but it doesn't discuss the possibility of encumberment of this output due to the fact that the AI model was trained on a potentially copyrighted material. Perhaps, the answer to this is known implicitly or I'm missing it somewhere in the document, but I would like this point to be addressed by the lawyer as well. Specifically, would Boost and the library authors be legally protected if the AI model(s) used by the library maintainers were trained on a copyrighted material.
My understanding is that it is explained in the part of the "conclusions and recommendations" (page iii) that I quoted in my other email. I.e. that we must respect the copyrights of any training sources that show up in the output. How to determine that would be a challenge. -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supongas Nada -- Robot Dreams - http://robot-dreams.net
Part 3: Generative AI Training [1] gets into AI training and Fair Use with more depth. Matt [1] https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Ge...
Andrey Semashev wrote:
The report conclusion says that the AI-generated output is copyrightable, provided that it contains sufficient human contribution, but it doesn't discuss the possibility of encumberment of this output due to the fact that the AI model was trained on a potentially copyrighted material. Perhaps, the answer to this is known implicitly or I'm missing it somewhere in the document, but I would like this point to be addressed by the lawyer as well. Specifically, would Boost and the library authors be legally protected if the AI model(s) used by the library maintainers were trained on a copyrighted material.
What do you mean by "legally protected"?
On 17 Feb 2026 20:44, Peter Dimov via Boost wrote:
What do you mean by "legally protected"?
I mean, would Boost and/or the library authors be liable for copyright infringement if the AI model(s) they used were trained on copyrighted material, knowingly or unknowingly?
Andrey Semashev wrote:
I mean, would Boost and/or the library authors be liable for copyright infringement if the AI model(s) they used was trained on a copyrighted material. Knowingly or unknowingly.
The situation would be analogous to the following hypothetical scenario: you hire a software firm from Kazakhstan to write a library for you and publish it as open source; later it turns out that the Kazakhstani boys and girls have unwittingly reproduced copyrighted material from memory.

In that case you wouldn't be "protected", as protection as a concept doesn't really exist in the American legal system (assuming we're talking about that). But your chances of winning would be substantial provided that (a) you're not deriving any financial benefit from your open source library (check) and (b) the source material which was unwittingly appropriated was publicly available (c) under an open source license. While we can never be sure about (b) and (c), I think that for most LLMs those conditions in fact do generally hold.

Why are those conditions important? Well, because that's what the copyright owner needs to claim: either you're profiting from a work derived from his, or he's suffering losses caused by your publishing a work derived from his. If the original work is already published and licensed permissively, publishing a derived work can hardly be argued to cause losses.

But I'm not a lawyer.
On Tue, Feb 17, 2026 at 12:52 PM Peter Dimov via Boost < boost@lists.boost.org> wrote:
That is also my understanding of the situation, but... it gets complicated, as the kind of permissive license the original work is published under matters: most OSS licenses are incompatible with the BSL. And use of the BSL matters to many users of Boost, including the ones that are using parts of Boost in their standard library implementations, and hence, transitively, the corporations using those with the understanding that usage in their commercial products is safe. In such cases we would be liable to the extent of our ability to ascertain the veracity of our license on the Boost product, AFAIU. But I'm not a lawyer.
But, yeah, ditto. Hence why I will await what the licensed lawyers say about it. :-) -- -- René Ferdinand Rivera Morell -- Don't Assume Anything -- No Supongas Nada -- Robot Dreams - http://robot-dreams.net
René Ferdinand Rivera Morell wrote:
Yes, possibly. In this specific case, however, most (if not all) of Claude's knowledge comes from Asio, and it's BSL. (Klemens claims that Claude has learned about coroutines from Cobalt. Fortunately, Cobalt is also BSL, so we're safe.)
On Tue, Feb 17, 2026 at 11:13 AM Peter Dimov via Boost < boost@lists.boost.org> wrote:
...
Disclaimer: Claude Opus 4.6 wrote this:

How Transformers Work (Brief)

A transformer is a neural network architecture that learns statistical relationships between tokens (subword units). During training, it adjusts billions of floating-point weights via gradient descent to minimize prediction error on the next token. The result is a compressed, lossy representation of patterns in the training data — not a database of documents.

Key points:

- Attention mechanism. Self-attention lets each token attend to every other token in the context window, computing weighted relevance scores. This is how the model captures long-range dependencies — syntax, argument structure, style — without storing literal sequences.

- Weights ≠ storage. A model with ~100B parameters trained on trillions of tokens cannot store those tokens verbatim. The information is destructively compressed. It's more analogous to how a human programmer who has read a lot of Asio code might unconsciously reproduce idioms and patterns, rather than a photocopier.

- Memorization does happen, but it's the exception. Research (Carlini et al., "Extracting Training Data from Large Language Models") has shown that LLMs can regurgitate verbatim snippets, particularly of data that appeared many times in training or is highly distinctive. Short, unique sequences (API keys, specific code blocks) are more susceptible. But for typical code, the output is a probabilistic reconstruction, not recall.

What This Means for the Copyright Question

Dimov's analysis is roughly correct. The real risks are:

- Verbatim reproduction — possible but unlikely for non-trivial code blocks. The longer the sequence, the less likely it's memorized exactly. Modern models also apply deduplication and guardrails to reduce this.

- Structural copying — a model might reproduce the architecture or design pattern of a copyrighted work without copying literal text. This is harder to adjudicate. Copyright protects expression, not ideas, so reproducing an API design or algorithmic approach is generally not infringement.

- The BSL angle — Dimov's point that Asio (and Cobalt) are BSL-licensed is pragmatically relevant. Even if a model did memorize and reproduce fragments, the source material's permissive license weakens any infringement claim substantially, since the copyright holder has already granted broad usage rights. Rivera Morell's concern about license incompatibility is the more subtle issue: if a model blends patterns from BSL and GPL sources into a single output, what license applies? This is genuinely unresolved law.

Bottom line: Transformers don't "remember" documents the way a database does. They learn compressed statistical patterns. Verbatim reproduction is possible but empirically rare for code, and the risk is further mitigated when training data is permissively licensed. The harder open question is about structural similarity and license mixing, which no court has definitively addressed yet.

Thanks
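The self-attention step described above is small enough to write out directly. The following is a toy, from-scratch NumPy sketch of single-head scaled dot-product attention, not code from any library discussed in this thread; real transformers use learned Q/K/V projection matrices, many heads, and many stacked layers:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: rows become probability distributions.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention.

    Each output row is a weighted mix of all value rows, with weights
    derived from query-key similarity — the "weighted relevance scores"
    that let a token attend to every other token in the window.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq, seq) pairwise relevance
    weights = softmax(scores)         # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq, d = 4, 8
x = rng.normal(size=(seq, d))
# Toy self-attention: Q, K, V are all the raw input here; in a real
# model they are separate learned linear projections of x.
out, w = attention(x, x, x)
assert out.shape == (seq, d)
assert np.allclose(w.sum(axis=1), 1.0)
```

Note that nothing in this computation stores the training corpus; the "knowledge" lives in the (here absent) learned projection weights, which is the compression point the essay is making.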
On 17 Feb 2026 22:11, Peter Dimov via Boost wrote:
Yes, possibly. In this specific case however most (if not all) of Claude's knowledge comes from Asio and it's BSL.
(Klemens claims that Claude has learned about coroutines from Cobalt. Fortunately, Cobalt is also BSL, so we're safe.)
How do we know that? Do e.g. OpenAI or Anthropic publish the data sets that were used to train their models? Do they make official statements about the source data sets, and are their claims verifiable? Would Boost become liable if such claims, if they were made, turned out to be false? Would it become a problem if in the future the models are trained on new data sets that include not only BSL-licensed code but also code under other, incompatible licenses?

As an example, there was the case with Meta AI, which was (allegedly?) trained on pirated books. I'm not sure if this ended up proven in court, though; I wasn't following.

You could argue about running a local instance of an LLM trained on a data set you carefully prepared, but my understanding is that the majority of users are using third-party LLMs that are trained on who knows what.

The only "safe" option I see is a clear rule that whatever licenses covered the source data set, those licenses do not ever transfer onto the output generated by the model — whether that rule is guaranteed by copyright law or by the LLM provider (i.e. in case of copyright infringement, the LLM provider is bound to take full responsibility). But I do not think such a rule exists, hence why I would like a lawyer's opinion on this.
On Tue, Feb 17, 2026 at 12:48 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
...
My prompt: tell me everything you know about Klemens and Boost.Cobalt without consulting Internet sources or local data files.

Claude Opus 4.6 reply:

Here's what I know from training data: Klemens Morgenstern is a C++ developer and Boost contributor. He's the author of Boost.Cobalt (formerly known as Boost.Async in earlier iterations), which provides C++20 coroutine support integrated with Boost.Asio.

Boost.Cobalt key points:

- Provides coroutine primitives (promise, generator, task, channel, etc.) designed to work naturally with Asio's async model.
- Uses C++20 coroutines (co_await, co_return, co_yield).
- Offers a main coroutine entry point as a convenience for structuring async programs.
- Provides gather, race, and similar combinators for concurrent coroutine coordination.
- Integrates with Asio's executor model and completion token mechanism.
- Targets the ergonomic gap between raw Asio completion handlers/asio::awaitable and what users expect from a modern coroutine library.

Klemens also authored Boost.Process (v2) and Boost.DLL (or significant contributions there — my recall is slightly fuzzy on DLL specifically). Klemens has been active in Boost mailing list discussions around coroutine design, executor semantics, and interoperability between coroutine types. He's also contributed to discussions on how Boost libraries should handle structured concurrency patterns.

That's roughly the extent of what I can confidently state without searching. Some edges (exact review timeline, precise library naming history) I'm less certain about.

Thanks
On 17 Feb 2026 23:52, Vinnie Falco wrote:
My prompt: tell me everything you know about Klemens and Boost.Cobalt without consulting Internet sources or local data files.
Sorry, but I fail to see what this is supposed to prove. Also, asking an LLM (a tool that is known to be imprecise and blatantly wrong at times, and that is designed and controlled by a third party) about its implementation details is a rather pointless exercise. Even if the answer were true and precise, it would still not be an official statement that one could take to court.
On Tue, Feb 17, 2026 at 1:03 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
Sorry, but I fail to see what this is supposed to prove.
It doesn't "prove" anything; it's just a data point. Today I learned that most frontier models actually know quite a bit about Peter. So much that if you give them any document and tell them to "dimov this", they will edit the document to remove excess words. Your curious reply does give me an idea... I wonder what "semashev this" would do. Thanks
Andrey Semashev wrote:
How do we know that? Do e.g. OpenAI or Anthropic publish the data sets that were used to train their models? Do they make official statements about the source data sets and are their claims verifiable? Would Boost become liable if such claims, if they were made, turn out to be false? Would it become a problem if in the future the models are trained on the new data sets that include not only BSL-licensed code but also under other, incompatible licenses?
As an example, there was the case with Meta AI which was (allegedly?) trained on pirated books. I'm not sure if this ended up proven in court, though, I wasn't following.
You could argue about running a local instance of an LLM that was trained on the data set you carefully prepared, but my understanding is that the majority of users are using third party LLMs that are trained on who knows what.
The only "safe" option I see is that there is a clear rule that whatever licenses were covering the source data set, those licenses do not ever transfer onto the output generated by the model. Whether that rule is guaranteed by copyright law or by the LLM provider (i.e. in case of copyright infringement the LLM provider is bound to take full responsibility). But I do not think such a rule exists, hence why I would like a lawyer's opinion on this.
We can't know any of that, of course. But past experience indicates that worrying about such gray area issues (as far as authoring and distributing open source libraries is concerned), or taking some sort of preventative action in anticipation of hypothetical future problems, is a waste of time and resources. If corporations that have a lot more to lose are incorporating AI-generated code into their codebases, we can pretty reasonably conclude that we can also do it.
On 17 Feb 2026 21:50, Peter Dimov via Boost wrote:
The situation would be analogous to the following hypothetical scenario: you hire a software firm from Kazakhstan to write a library for you and publish it as open source; later it turns out that the Kazakhstani boys and girls have unwittingly reproduced copyrighted material from memory.
In that case you wouldn't be "protected" as protection as a concept doesn't really exist in the American legal system (assuming we're talking about that.)
But your chances of winning would be substantial provided that (a) you're not deriving any financial benefit from your open source library (check) and (b) the source material which was unwittingly appropriated was publicly available (c) under an open source license.
While we can never be sure about (b) and (c) I think that for most LLMs those conditions in fact do generally hold.
Why are those conditions important? Well, because that's what the copyright owner needs to claim, either you're profiting from a work derived from his, or he's suffering losses caused by your publishing a work derived from his.
If the original work is already published and licensed permissively, publishing a derived work can hardly be argued to cause losses.
There are many open source licenses out there, and there are all kinds of conditions in them, most not related to possible profits to be made by the original rights holders. For example, the GPL family of licenses forces the derived work to be covered under the GPL as well, and not doing that would still be a violation, regardless of profits or losses. Attribution is another aspect that many licenses cover. E.g. does one have to retain attribution to the author(s) of the original source for a piece of code that was produced by an LLM?
Andrey Semashev wrote:
There are many open source licenses out there, and there are all kinds of conditions in them, most not related to possible profits to be made by the original holders. For example, GPL family of licenses are forcing the derived work to be covered under GPL as well, and not doing that would still be a violation, regardless of the profits or losses.
Maybe in Germany. :-)
Attribution is another aspect that many licenses cover. E.g. does one have to retain attribution to the author(s) of the original source for a piece of code that was produced by an LLM?
We'll see. I'm not worried.

The situation here is: you read GPL code, and then you write your own code using only the knowledge you've obtained, without reproducing any code snippet exactly or approximately (e.g. only changing the identifiers). Is your code GPL?

That's, incidentally, why I dislike the GPL. It's a trap, because it gives you the code to read and then claims that you don't have the right to use the knowledge so obtained.

But I doubt that anyone would be able to prove a GPL violation for AI-generated code, unless the original is reproduced exactly. And exact reproduction only happens when the source material is so scarce that the thing you're asking the LLM to output has to be a copy. This is essentially not going to happen with source code. It's not like famous photographs.
On 18 Feb 2026 03:17, Peter Dimov via Boost wrote:
The situation here is, you read GPL code, and then you write your own code using only the knowledge you've obtained, without reproducing any code snippet exactly or approximately (e.g. only changing the identifiers.)
Is your code GPL?
No, because the license covers the form, not the idea. Though, depending on how closely the newly written code resembles the original, it could be difficult to distinguish this case from just modifying the original code and forming a derivative work, which would be covered by GPL. But this ambiguity is not specific to GPL, the same would be the case with any other license.
That's, incidentally, why I dislike the GPL. It's a trap because it gives you the code to read and then claims that you don't have the right to use the knowledge so obtained.
I dislike GPL because it forces people to release their own work under GPL. IMO, people should be able to decide how their work should be used.
On 18 Feb 2026 11:19, Andrey Semashev wrote:
No, because the license covers the form, not the idea.
Though, depending on how closely the newly written code resembles the original, it could be difficult to distinguish this case from just modifying the original code and forming a derivative work, which would be covered by GPL. But this ambiguity is not specific to GPL, the same would be the case with any other license.
In relation to AI, it is a tool that operates on tokens (i.e. the form), so it is more tempting to classify the code it produces as derivative work.
On Wed, Feb 18, 2026 at 12:38 AM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
In relation to AI, it is a tool that operates on tokens (i.e the form),
This might end up becoming a moot point soon. Our experience shows that generated code can get you up and running quickly, yet considerable hand-coding is needed to close the final gap, especially in the reactor implementation, which is a delicate balance of correctness, concurrency, and performance. There's not much left of the original generated code. On the other hand, the boilerplate things do stay, and I'm glad they do. I never want to write another conforming AllocatorAwareContainer again. Thanks
On Tue, Feb 17, 2026 at 7:51 AM Matt Borland via Boost < boost@lists.boost.org> wrote:
https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Co...
tl;dr:

- AI-generated output isn't copyrightable.
- Prompts are ideas, not expression.
- Using AI as a tool doesn't affect your copyright.
- Modifying AI output can create copyrightable elements, but only the human parts.
- No new legislation needed.

The interesting part they buried: identical prompts produce different outputs. That alone settles it. You can't be the "author" of expression you don't control. The rest is 41 pages of agreeing with this while being polite about it.

Thanks
On Tue, Feb 17, 2026, at 6:47 PM, Vinnie Falco via Boost wrote:
tl;dr:
AI-generated output isn't copyrightable. Prompts are ideas, not expression. Using AI as a tool doesn't affect your copyright. Modifying AI output can create copyrightable elements, but only the human parts. No new legislation needed. The interesting part they buried: identical prompts produce different outputs. That alone settles it. You can't be the "author" of expression you don't control. The rest is 41 pages of agreeing with this while being polite about it.
That doesn't address the other (the interesting) side of the picture: is the output of an LLM subject to the source copyright? The line of reasoning presented cannot explain that (it would be like arguing that it's okay to fence stolen wares as long as you cannot easily figure out where they were stolen from, or if you could substitute the goods at random with items stolen elsewhere). The problem arises as soon as copyrighted material is used during training. Yes, it does lead to interesting grey areas; after all, humans do similar aggregation in practice. However, there's a very big difference in the effort and implied merit involved. To be frank, an optimistic (?!) statement like "no new legislation required" seems to fly in the face of public opinion and common sense.

Disclaimer: I didn't read through the entire discussion, so it is possible I missed more relevant observations on this side of the question.
On 2026-02-17 20:52, Seth via Boost wrote:
... The problem arises as soon copyrighted material is used during training. Yes, it does lead to interesting grey areas, after all humans do similar aggregation in practice. However, there's a very big difference in effort and implied merit involved.
To be frank an optimistic (?!) statement like "no new legislation required" seems to fly in the face of public opinion and common sense. ...
I am not a lawyer, but infringement is a real problem with software and hardware as well as with AI models. For IBM Granite models (https://www.ibm.com/granite), from the "Read the IBM statement on IP protection" section near the bottom of the page: "... Moreover, contrary to some other providers of large language models and consistent with the IBM standard approach on indemnification, IBM does not require its customers to indemnify IBM for a customer's use of IBM-developed models. Also, consistent with the IBM approach to its indemnification obligation, IBM does not cap its indemnification liability for the IBM-developed models. ..." I don't know of any other AI model provider that provides a copyright indemnity for its models. If a third party sues you claiming that an IBM-developed AI model or its output infringes on their copyright, IBM will step in to defend you and cover the legal costs.

Disclaimer: While I just finished my first semester as a student of Mathematics at the age of 60 in early retirement, I am still an IBM employee until the age of 63.

Regards, Hermann Stamm-Wilbrandt.
On 2/17/26 18:47, Vinnie Falco via Boost wrote:
On Tue, Feb 17, 2026 at 7:51 AM Matt Borland via Boost < boost@lists.boost.org> wrote:
https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-2-Co...
tl;dr:
AI-generated output isn't copyrightable. Prompts are ideas, not expression. Using AI as a tool doesn't affect your copyright. Modifying AI output can create copyrightable elements, but only the human parts. No new legislation needed. The interesting part they buried: identical prompts produce different outputs. That alone settles it. You can't be the "author" of expression you don't control. The rest is 41 pages of agreeing with this while being polite about it.

Clearly the person who gives the prompt has no copyright claim. On the other hand, I can trivially write an "AI" program that regurgitates its exact training data, even if its internal model looks nothing like the training data. If I "train" this "AI" on the Star Wars movie, does this mean I can use it to create a copyright-free copy of Star Wars?
(Such an "AI" already exists. It is called gzip.) -- Rainer Deyke - rainerd@eldwood.com
Rainer Deyke wrote:
Clearly the person who gives the prompt has no copyright claim. On the other hand, I can trivially write an "AI" program that regurgitates its exact training data, even if its internal model looks nothing like the training data. If I "train" this "AI" on the Star Wars movie, does this mean I can use it to create a copyright-free copy of Star Wars?
No. But if you train it on one million Star Wars-like films, and then generate a film that is like them, but isn't a copy of any of them, then that generated film is - this is the argument - not a derived work. That's because compression works by identifying common patterns, and the common patterns aren't copyrightable. (Because if they were, all these one million films would all infringe on one another's copyright.) So, N=1, infringement, N=1,000,000,000 - not infringement. There's probably a cutoff point somewhere, and the training data sets are very likely above it.
On Fri, Feb 20, 2026, at 1:06 PM, Peter Dimov via Boost wrote:
Rainer Deyke wrote:
No. But if you train it on one million Star Wars-like films, and then generate a film that is like them, but isn't a copy of any of them, then that generated film is - this is the argument - not a derived work.
Researchers were able to reproduce up to 96% of Harry Potter with commercial LLMs https://arxiv.org/abs/2601.02671
Seth wrote:
On Fri, Feb 20, 2026, at 1:06 PM, Peter Dimov via Boost wrote:
Rainer Deyke wrote:
No. But if you train it on one million Star Wars-like films, and then generate a film that is like them, but isn't a copy of any of them, then that generated film is - this is the argument - not a derived work.
Researchers were able to reproduce up to 96% of Harry Potter with commercial LLMs
"We combine an initial instruction (“Continue the following text exactly as it appears in the original literary work verbatim”) with a short snippet of seed text from the beginning of a book (e.g., the first sentence)." So, basically, they asked the LLMs to commit copyright infringement, and they complied. I'm shocked.
On Fri, Feb 20, 2026 at 8:47 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:
Seth wrote:
On Fri, Feb 20, 2026, at 1:06 PM, Peter Dimov via Boost wrote:
Rainer Deyke wrote:
No. But if you train it on one million Star Wars-like films, and then generate a film that is like them, but isn't a copy of any of them, then that generated film is - this is the argument - not a derived work.
Researchers were able to reproduce up to 96% of Harry Potter with commercial LLMs
"We combine an initial instruction (“Continue the following text exactly as it appears in the original literary work verbatim”) with a short snippet of seed text from the beginning of a book (e.g., the first sentence)."
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
I'm shocked.
It proves that LLM output isn't inherently transformative; it's sort of an important point, because any work generated by an LLM needs to show individually that it's not infringing copyright, since it could be a plain copy. And that's pretty much impossible, since we don't have the training data.

I'm generally really surprised how the legal situation gets discussed here. I don't think we can be sure of anything until legislation passes or decisions with binding precedent happen. Nobody knows what the legal situation is at this moment.
Klemens Morgenstern wrote:
On Fri, Feb 20, 2026 at 8:47 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
I'm shocked.
It proves that LLM output isn't inherently transformative;
It doesn't need to be "inherently transformative". Wanting it to be "inherently transformative" is again asking for "legal protection" and this simply doesn't exist. It doesn't exist for people, it doesn't exist for LLMs.
it's sort of an important point, because any work generated by an LLM needs to show individually it's not infringing copyright, since it could be a plain copy.
That's not how it works. If that was how it worked, _you_ would also have needed to show, for every line of code you wrote, that it's not infringing copyright. Your output isn't "inherently transformative" either.
On Fri, Feb 20, 2026 at 9:18 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:
Klemens Morgenstern wrote:
On Fri, Feb 20, 2026 at 8:47 PM Peter Dimov via Boost <boost@lists.boost.org> wrote:
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
I'm shocked.
It proves that LLM output isn't inherently transformative;
It doesn't need to be "inherently transformative". Wanting it to be "inherently transformative" is again asking for "legal protection" and this simply doesn't exist. It doesn't exist for people, it doesn't exist for LLMs.
it's sort of an important point, because any work generated by an LLM needs to show individually it's not infringing copyright, since it could be a plain copy.
That's not how it works. If that was how it worked, _you_ would also have needed to show, for every line of code you wrote, that it's not infringing copyright.
I think my language wasn't clear - I didn't mean before in a court. If I write a piece of code, I usually have a good sense of what it's based on so I can slap a license on it. You just have no idea with LLMs, because you cannot trace output back to the input. And there's always input, since LLMs have no thoughts or ideas. So we know it's derived from something, i.e. it's never original, but we don't know how transformative it is, and we don't have a person to ask.
Your output isn't "inherently transformative" either.
Correct, I could be infringing copyright by reproducing a copyrighted work from memory.
On Fri, Feb 20, 2026, at 1:45 PM, Peter Dimov via Boost wrote:
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
I'm shocked.
I posted it to counter the convenient sentiment that "it's nigh impossible to get exact [GPL] code from an LLM". Also, the prompt engineering was only there to *increase* the accuracy percentage.

In short, the paper focuses on recoverability of input. That's a far tighter standard than the plagiarism concern. If you just naively post the first sentence of any HP chapter and ask for "a continuation" (not "the continuation") you may get results that already cross the thresholds for plagiarism.

$0.02
Seth wrote:
On Fri, Feb 20, 2026, at 1:45 PM, Peter Dimov via Boost wrote:
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
I'm shocked.
I posted it to counter the convenient sentiment that "it's nigh impossible to get exact [GPL] code from an LLM".
I don't think anyone has claimed that. If you commit willful infringement by using an LLM, whether by reproducing Harry Potter or GPL code, this isn't any different from committing willful infringement by copying Harry Potter or the GPL code without using an LLM. That's not particularly interesting from a legal perspective. What's interesting from a legal perspective is what happens when you don't explicitly set out to do that.
"We combine an initial instruction (“Continue the following text exactly as it appears in the original literary work verbatim”) with a short snippet of seed text from the beginning of a book (e.g., the first sentence)."
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
Maybe what Seth and the authors of the paper were trying to say is that if at some point an agent autonomously decides to reproduce the algorithm in lib XXX verbatim (from its own weights) because it solves the user prompt better, then a) it would not be clear to the developer, and b) the copyright law system is (as far as I know) not very clear about this?

Best wishes, Arno
Arnaud Becheler wrote:
"We combine an initial instruction (“Continue the following text exactly as it appears in the original literary work verbatim”) with a short snippet of seed text from the beginning of a book (e.g., the first sentence)."
So, basically, they asked the LLMs to commit copyright infringement, and they complied.
Maybe what Seth and the authors of the paper were trying to say, it's that if at some point an agent autonomously decides to reproduce the algorithm in lib XXX verbatim (from its own weights) because it solves the user prompt better,
Maybe it will, yes. But the cited paper doesn't show that.
then a) it would not be clear to the developer and b) the copyright law system is (as far as I know) not very clear about this ?
The copyright law system is very clear about this. If your code is a verbatim copy of a copyrighted work, then it's infringing regardless of whether you used an LLM or not. It would be infringing even if you stumbled upon the verbatim copy by chance. That's where and why all these additional considerations about fair use, willful infringement, financial gains, damages, and so on come into play.
Thank you, Peter, for those clarifications. Do you mean that what would be needed is a paper showing that orchestrator agents can prompt sub-agents with instructions akin to:
“Continue the following text exactly as it appears in the original literary work verbatim”) with a short snippet of seed text from the beginning of a book (e.g., the first sentence)."
? Best wishes, Arno
On 2/20/26 13:06, Peter Dimov via Boost wrote:
Rainer Deyke wrote:
Clearly the person who gives the prompt has no copyright claim. On the other hand, I can trivially write an "AI" program that regurgitates its exact training data, even if its internal model looks nothing like the training data. If I "train" this "AI" on the Star Wars movie, does this mean I can use it to create a copyright-free copy of Star Wars?
No. But if you train it on one million Star Wars-like films, and then generate a film that is like them, but isn't a copy of any of them, then that generated film is - this is the argument - not a derived work.
In other words: if you kill a man, you're a murderer. If you kill a million, you're a conqueror.
That's because compression works by identifying common patterns, and the common patterns aren't copyrightable. (Because if they were, all these one million films would all infringe on one another's copyright.)
No, the process is fundamentally different, and it's the process that matters in copyright, not the result. If I, by pure coincidence, write a book that's word-for-word identical to a book that already exists but I haven't read, then I haven't violated copyright because I haven't copied or transformed the original work. If I take an existing picture and apply filters to it until it is unrecognizable, then I have legally created a derivative work that is still under the original copyright. -- Rainer Deyke - rainerd@eldwood.com
On Fri, Feb 20, 2026 at 9:46 PM Rainer Deyke via Boost < boost@lists.boost.org> wrote:
On 2/20/26 13:06, Peter Dimov via Boost wrote:
Rainer Deyke wrote:
Clearly the person who gives the prompt has no copyright claim. On the other hand, I can trivially write an "AI" program that regurgitates its exact training data, even if its internal model looks nothing like the training data. If I "train" this "AI" on the Star Wars movie, does this mean I can use it to create a copyright-free copy of Star Wars?
No. But if you train it on one million Star Wars-like films, and then generate a film that is like them, but isn't a copy of any of them, then that generated film is - this is the argument - not a derived work.
In other words: if you kill a man, you're a murderer. If you kill a million, you're a conqueror.
That's because compression works by identifying common patterns, and the common patterns aren't copyrightable. (Because if they were, all these one million films would all infringe on one another's copyright.)
No, the process is fundamentally different, and it's the process that matters in copyright, not the result.
If I, by pure coincidence, write a book that's word-for-word identical to a book that already exists but I haven't read, then I haven't violated copyright because I haven't copied or transformed the original work.
If I take an existing picture and apply filters to it until it is unrecognizable, then I have legally created a derivative work that is still under the original copyright.
Unless it's transformative enough to fall under fair use. That'll be hard to argue for code refactoring.
Rainer Deyke wrote:
On 2/20/26 13:06, Peter Dimov via Boost wrote:
Rainer Deyke wrote:
Clearly the person who gives the prompt has no copyright claim. On the other hand, I can trivially write an "AI" program that regurgitates its exact training data, even if its internal model looks nothing like the training data. If I "train" this "AI" on the Star Wars movie, does this mean I can use it to create a copyright-free copy of Star Wars?
No. But if you train it on one million Star Wars-like films, and then generate a film that is like them, but isn't a copy of any of them, then that generated film is - this is the argument - not a derived work.
In other words: if you kill a man, you're a murderer. If you kill a million, you're a conqueror.
Well, no. Taking one millionth from each of a million things is not the same as killing a million.
That's because compression works by identifying common patterns, and the common patterns aren't copyrightable. (Because if they were, all these one million films would all infringe on one another's copyright.)
No, the process is fundamentally different, and it's the process that matters in copyright, not the result.
If I, by pure coincidence, write a book that's word-for-word identical to a book that already exists but I haven't read, then I haven't violated copyright because I haven't copied or transformed the original work.
Maybe in theory, but you will almost certainly lose that case. You'll lose it more often than not even if you take extreme precautions and are able to prove you didn't copy the original.
If I take an existing picture and apply filters to it until it is unrecognizable, then I have legally created a derivative work that is still under the original copyright.
Right. It's precisely this reductive view - if something is an input to a process, then the end result is a derived work by definition - that is being challenged by LLMs. It can't be correct, because if it were, you wouldn't be able to produce a non-infringing work either, if you have ever been exposed to a copyrighted one; the exposure has altered your mental state, so your output is tainted by definition.
Le 2026-02-20 15:09, Peter Dimov via Boost a écrit :
It can't be correct, because if it were, you wouldn't be able to produce a non-infringing work either, if you have ever been exposed to a copyrighted one; the exposure has altered your mental state, so your output is tainted by definition.
IIRC, some employers are forbidding their engineers to look at their competitors' patents for this exact reason: they don't want to be accused of some sort of plagiarism / patent infringement if their engineers come to a solution that is too close to the patented one. I'm not sure how the same would not apply to an LLM.

To add to the confusion with regard to AI-generated code, it seems that the US and EU are somewhat diverging here, the EU being, from what I understand, more protective of the original authors than what's happening in the US. And there's the rest of the world, of course…

A different approach exists, however. IIRC some companies are selling tools to detect potential licence infringements in a code base (typically, reuse of GPL-ed code, but also random snippets without a licence, etc.). I never used them, and don't know how reliable they are, but that could be worth a try if someone has access to such tools.

Regards, Julien
Julien Blanc wrote:
Le 2026-02-20 15:09, Peter Dimov via Boost a écrit :
It can't be correct, because if it were, you wouldn't be able to produce a non-infringing work either, if you have ever been exposed to a copyrighted one; the exposure has altered your mental state, so your output is tainted by definition.
IIRC, some employers are forbidding their engineers to look at their competitors' patents for this exact reason: they don't want to be accused of some sort of plagiarism / patent infringement if their engineers come to a solution that is too close to the patented one.
Yes. This by the way doesn't work. It's only a defense against willful infringement (which carries triple damages) and not against infringement in general. That's why I said that preemptively forbidding things is almost always counterproductive.
On 2/20/26 15:09, Peter Dimov via Boost wrote:
It can't be correct, because if it were, you wouldn't be able to produce a non-infringing work either, if you have ever been exposed to a copyrighted one; the exposure has altered your mental state, so your output is tainted by definition.
There is a legal distinction between what happens in my head and what happens on a device. There has to be, because it is unconscionable for the law to restrict one's thoughts. It is legal for me to read a copyrighted book, even if doing so produces a perfect copy of the same book in my brain. It is not legal for me to copy the same book on a device.

My argument is not that an LLM produces output that would be considered copyright-infringing if produced by a human. My argument is that the LLM itself is a violation of copyright, which taints all of its output regardless of what the output looks like.

-- Rainer Deyke - rainerd@eldwood.com
Hello,

I'm endorsing Capy after reviewing it for a while. There are concepts such as:

- https://github.com/cppalliance/capy/blob/develop/include/boost/capy/concept/...
- https://github.com/cppalliance/capy/blob/develop/include/boost/capy/concept/...

which are useful for plenty of I/O use cases such as HTTP, distributed systems, multimedia, HFT, and of course Beast2.

I also have a question about the following snippet at: https://github.com/cppalliance/capy/blob/develop/example/asio/use_capy_examp...

We can see at line 119:

```cpp
capy::run_async(ctx.get_executor())(run_example(client, server));
```

It's elegant, but maybe we could also benefit from this syntax as well:

```cpp
auto& exec{ctx.get_executor()};
exec.run_async(run_example(client, server));
```

But overall the library is solid; I would 100% recommend it to someone else right now.

Thanks, Best,
Amlal El Mahrouss
amlalelmahrouss@icloud.com
On Fri, Feb 20, 2026 at 4:32 AM Amlal El Mahrouss via Boost < boost@lists.boost.org> wrote:
I also have a question about the following snippet at: https://github.com/cppalliance/capy/blob/develop/example/asio/use_capy_examp...
We can see at line 119:
```cpp
capy::run_async(ctx.get_executor())(run_example(client, server));
```
It's elegant, but maybe we could also benefit from this syntax as well:
```cpp
auto& exec{ctx.get_executor()};
exec.run_async(run_example(client, server));
```
So, if I understand you correctly, you are asking for the Executor concept to have an additional member:

    unspecified run_async( IoRunnable auto&& );

This is problematic for a few reasons, explained in P4003 [1]:

1. An Executor is a low-level primitive. It only specifies how a coroutine is resumed. A launch function is something different: a high-level construct responsible for creating the environment in which a coroutine runs, and for specifying behavior when the coroutine performs its final suspend. run_async falls in the latter category.

2. The one-call syntax you propose suffers from the allocator timing problem described in P4007, Senders and Coroutines [2]. The coroutine frame would be created before the allocator has a chance to propagate.

3. `run_async` is just one example of a launch function: a function which begins a new coroutine chain of execution, either from a regular function (such as main) or from a coroutine. The set of launch functions is open, as users can (and will) define their own. Here is one such example: https://github.com/cppalliance/corosio/blob/0cb2c0096ca2cf8f7c1f1617cb04c191...

It makes sense to keep the Executor concept as a low-level, narrow abstraction, and to partition the launch functions into a higher-level algorithmic set.

Thanks

[1] https://wg21.link/p4003 IoAwaitables: A Coroutines-Only Framework
[2] https://wg21.link/p4007 Senders and Coroutines

P.S. I suppose it's pointless asking for the copyright discussion to be moved to a separate thread :)
Hello,

After digging deeper into Corosio's code base via the following link: https://github.com/cppalliance/corosio/blob/0cb2c0096ca2cf8f7c1f1617cb04c191...

You were spot-on regarding my suggestions; keeping the Executor concept as is would indeed be beneficial, per P4007 [2] (Senders and Coroutines, https://wg21.link/p4007).

Thanks,
Amlal.

On 2/20/26 15:11, Vinnie Falco via Boost wrote:
On Fri, Feb 20, 2026 at 4:32 AM Amlal El Mahrouss via Boost < boost@lists.boost.org> wrote:
I also have a question about the following snippet at: https://github.com/cppalliance/capy/blob/develop/example/asio/use_capy_examp...
We can see at line 119:
```cpp
capy::run_async(ctx.get_executor())(run_example(client, server));
```
It's elegant, but maybe we could also benefit from this syntax as well:
```cpp
auto& exec{ctx.get_executor()};
exec.run_async(run_example(client, server));
```
So, if I understand you correctly then you are asking for the Executor concept to have additional members:
unspecified run_async( IoRunnable auto&& );
This is problematic for a few reasons, explained in P4003 [1]:
1. An Executor is a low-level primitive. It only specifies how a coroutine is resumed. A launch function is something different: a high level construct responsible for creating the environment in which a coroutine runs, and specifying behavior when the coroutine performs its final suspend. run_async falls in the latter category.
2. The one-call syntax you propose suffers from the allocator timing problem described in P4007 Senders and Coroutines [2]. The coroutine frame would be created before the allocator has the chance to propagate.
3. `run_async` is just one example of a launch function: a function which begins a new coroutine chain of execution either from a regular function (such as main) or a coroutine. The set of launch functions is open, as users can (and will) define their own. Here is one such example:
https://github.com/cppalliance/corosio/blob/0cb2c0096ca2cf8f7c1f1617cb04c191...
It makes sense to keep the Executor concept as a low-level narrow abstraction, and partition the launch functions into a higher level algorithmic set.
Thanks
[1] https://wg21.link/p4003 IoAwaitables: A Coroutines-Only Framework
[2] https://wg21.link/p4007 Senders and Coroutines
P.S. I suppose it's pointless asking for the copyright discussion to be moved to a separate thread :)
participants (18)
- Amlal El Mahrouss
- Andrey Semashev
- Arnaud Becheler
- Christian Mazakas
- hermann@stamm-wilbrandt.de
- Jean-Louis Leroy
- Julien Blanc
- Klemens Morgenstern
- legalize+jeeves@mail.xmission.com
- Matt Borland
- Nigel Stewart
- Peter Dimov
- Rainer Deyke
- René Ferdinand Rivera Morell
- Richard Hodges
- Ruben Perez
- Seth
- Vinnie Falco