From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2024-07-17 20:31:37


On 17/07/2024 18:17, Christian Mazakas via Boost wrote:

>> That plus the DMA registered buffers support. ASIO could support the
>> older form which didn't deliver much speedup, but the new form where
>> io_uring/the NIC allocates the receive buffers for you ... it's Windows
>> RIO levels of fast. I certainly can saturate a 40 Gbps NIC from a single
>> kernel thread without much effort now, and a 100 Gbps NIC if you can keep
>> the i/o granularity big enough. That was expensive Mellanox userspace
>> TCP type performance a few years ago.
>>
>
> I'm not sure I know what you're talking about here, being honest. I know
> io_uring has registered buffers for file I/O, and I know that you can
> also use a provided-buffers API for multishot recv() and multishot
> read() (i.e. `io_uring_register_buffers()` and
> `io_uring_buf_ring_setup()`).
>
> This is confusing to me because these two functions don't really
> allocate. _You_ allocate and then register them with the ring. So I'm
> curious about this NIC allocating a receive buffer for me here.
>
> Fwiw, Fiona does actually use multishot TCP recv(), so it does use the
> buf_ring stuff. This has interesting API implications because in the
> epoll world, users are accustomed to:
>
> co_await socket.async_recv(my_buffer);
>
> But in Fiona, you instead have:
>
> auto m_buf_sequence = co_await socket.async_recv();
> return std::move(m_buf_sequence).value();
>
> Ownership of the buffers is inverted here, which actually turns out to
> be quite the API break.
>
> Once I get the code into better shape, I'd like to start shilling it but
> who knows if it'll ever catch on.

Yes, you're already using the thing I was referring to: the "ring
provided buffers" feature, via the io_uring_register_buf_ring API.

You're right that its docs present the feature as userspace allocating
pages from the kernel, then giving ownership of those pages to io_uring,
which fills them with received data as it chooses and hands ownership
back to userspace. That's how it appears from userspace, anyway.

If I were the kernel, I'd free the backing store for the pages handed to
me and repoint the virtual memory addresses at pages coming off the
NIC's DMA. It depends on the NIC: some can address all of memory, some a
subset, some barely any at all. High end NICs would be very efficient,
occasional memory copying might be needed for prosumer NICs, and for
cheap and nasty NICs incapable of more than a 64KB window ... well, you
kinda have to copy memory there.

Anyway, the point is that having the kernel tell you which buffers it
filled, instead of you telling it which buffers to fill, is the right
design. Incidentally, this is why LLFIO's read op allows the buffers
returned by a read to be completely different from the buffers supplied.
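In LLFIO terms the shape is roughly this (a sketch only; `fh` is some
already-open file_handle, and the names are illustrative):

    #include "llfio/llfio.hpp"  // header path varies by install

    namespace llfio = LLFIO_V2_NAMESPACE;

    void demo(llfio::file_handle &fh)
    {
      llfio::byte buf[4096];
      llfio::file_handle::buffer_type req{buf, sizeof(buf)};
      // read() returns the buffers actually filled, and those are allowed
      // to refer to completely different memory to what you supplied
      // (e.g. straight into a map), so consume `filled`, not `buf`.
      llfio::file_handle::buffers_type filled =
          fh.read({{&req, 1}, 0}).value();  // offset 0
      for (auto &b : filled)
      {
        // ... use b.data() and b.size() here ...
      }
    }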

Niall

