From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2024-07-16 20:48:52


On 16/07/2024 21:29, Christian Mazakas via Boost wrote:
> On Tue, Jul 16, 2024 at 12:35 PM Niall Douglas via Boost wrote:
>> But ... I don't agree with hard coding in C++ coroutines personally. I
>> think Sender-Receiver (before WG21 corrupted it) is a better design
>> choice here, especially as, if you are within a C++ coroutine, you can
>> co_await it and it'll "just work" without any extra effort.
>>
>
> This is interesting. Asio was developed when there was no standardized
> concurrency primitive in C++. We now have one: C++20 coroutines. To me, the
> universal completion token stuff was a lot of try-hard and template bloat
> for a feature that wasn't worth its weight. But at the time, we didn't know
> better because no one was doing this kind of stuff.
>
> I think in hindsight, the universal completion token was a mistake. Maybe
> Sender, Receiver abuses all that ADL to avoid introducing templates here
> but I'm hesitant to un-hardcode myself from coroutines because being
> realistic, I imagine most C++ users really just wanna
> `co_await some_socket_recv();`.

That's exactly what S&R delivers!

WG21 S&R has very severe template bloat. Some people see compile times
reminiscent of Boost at its worst in the late 2000s. But non-WG21 S&R
can be implemented in a much lighter-weight way. I made mine ABI stable,
and that forces most of the template bloat to not exist.

>> I see in your github repo you are benching against ASIO. What kinds of
>> results did you get?
>>
>
> Pretty alright.
>
> I have benchmarks that attempt to measure both latency and throughput and
> in general, I'm like 1.75x faster than Asio, almost 2x. This includes
> builtin timeouts so I use Beast's tcp_stream for this purpose. I guess this
> affects the latency-based benchmark more but for the throughput one,
> io_uring's batched I/O and handling of it really starts to shine.

You're not using the linked op timeout feature of io_uring?

It's a bit expensive TBH. I've 'cheated' and set a timeout directly on
the socket itself so it errors out after a while. This is nasty, but fast :)
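
The post does not say which socket option is meant. As one possible reading, the sketch below assumes the POSIX `SO_RCVTIMEO` option and shows its effect on a plain blocking `recv()`: with nothing to read, the call fails once the timeout expires. The helper names `set_recv_timeout` and `recv_errno_after_timeout` are made up for illustration, and the sketch does not check whether the option has any effect on receives submitted through io_uring.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: set a receive timeout on fd. Returns true on success.
bool set_recv_timeout(int fd, long millis) {
    assert(millis >= 0);
    timeval tv{};
    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0;
}

// Hypothetical demo: recv() on a connected but idle socket pair.
// Returns the errno seen by recv(), or 0 if recv() did not fail.
int recv_errno_after_timeout(long millis) {
    int fds[2];
    if (::socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return errno;

    int err = 0;
    if (!set_recv_timeout(fds[0], millis)) {
        err = errno;
    } else {
        char buf[16];
        // Nothing is ever written to fds[1], so this blocks until the timeout.
        ssize_t n = ::recv(fds[0], buf, sizeof buf, 0);
        err = (n < 0) ? errno : 0;
    }
    ::close(fds[0]);
    ::close(fds[1]);
    return err;
}
```

On Linux the timed-out `recv()` reports `EAGAIN` (the same value as `EWOULDBLOCK`), so the caller cannot distinguish a timeout from an ordinary would-block result by errno alone.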

> Anything where you can use multishot recv() effectively means you're going
> to shred Asio or other readiness-based models.

That plus the DMA registered buffers support. ASIO could support the
older form which didn't deliver much speedup, but the new form where
io_uring/the NIC allocates the receive buffers for you ... it's Windows
RIO levels of fast. I certainly can saturate a 40 Gbps NIC from a single
kernel thread without much effort now, and a 100 Gbps NIC if you can keep
the i/o granularity big enough. That was expensive Mellanox userspace
TCP type performance a few years ago.

Niall

