
From: Christian Mazakas (christian.mazakas_at_[hidden])
Date: 2024-07-17 17:17:45


On Tue, Jul 16, 2024 at 1:49 PM Niall Douglas via Boost <
boost_at_[hidden]> wrote:

> That's exactly what S&R delivers!
>
> WG21 S&R has very severe template bloat. Some people see compile times
> reminiscent of Boost at its worst in the late 2000s. But non-WG21 S&R
> can be implemented in a much lighter weight way. I made mine ABI stable,
> and that forces most of the template bloat to not exist.
>

Ha ha, this actually makes me even _more_ hesitant to adopt it. For now, I
think the simple coroutines-only scheme I have is sufficient. The code can
always, in theory, be altered later to support different async schemes.

> You're not using the linked op timeout feature of io_uring?
>
> It's a bit expensive TBH. I've 'cheated' and set a timeout directly on
> the socket itself so it errors out after a while. This is nasty, but fast
> :)
>

I use it in places; for example, for controlling connect() timeouts with TCP
sockets.

I'm not sure there are any other spots. But yes, it is quite expensive.

For sends and receives, I instead have a multishot timeout operation that's
created along with the `tcp::stream` object. This timer automatically posts
a CQE periodically, which I then use to check activity on the TCP stream. So
if it's in the middle of an initiated send() operation, I can check its last
activity, and if nothing has happened, I can cancel the operation.

In benchmarks, I actually didn't notice a difference when I toggled this
functionality on or off, so it's relatively lightweight for "realistic"
cases.

> That plus the DMA registered buffers support. ASIO could support the
> older form which didn't deliver much speedup, but the new form where
> io_uring/the NIC allocates the receive buffers for you ... it's Windows
> RIO levels of fast. I certainly can saturate a 40 Gbps NIC from a single
> kernel thread without much effort now, and 100 Gbps NIC if you can keep
> the i/o granularity big enough. That was expensive Mellanox userspace
> TCP type performance a few years ago.
>

I'm not sure I know what you're talking about here, being honest. I know
io_uring has registered buffers for file I/O, and I know that you can also
use a provided-buffers API for multishot recv() and multishot read() (i.e.
`io_uring_register_buffers()` and `io_uring_setup_buf_ring()`).

This is confusing to me because these two functions don't really allocate.
_You_ allocate and then register the buffers with the ring. So I'm curious
about this NIC allocating a receive buffer for me here.

Fwiw, Fiona does actually use multishot TCP recv(), so it does use the
buf_ring stuff. This has interesting API implications, because in the epoll
world users are accustomed to:

    co_await socket.async_recv(my_buffer);

But in Fiona, you instead have:

    auto m_buf_sequence = co_await socket.async_recv();
    return std::move(m_buf_sequence).value();

Ownership of the buffers is inverted here, which actually turns out to be
quite the API break.

Once I get the code into better shape, I'd like to start shilling it, but
who knows if it'll ever catch on.

- Christian


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk