Boost Users:

From: Alexander Carôt (alexander_carot_at_[hidden])
Date: 2022-12-28 13:55:16


Thanks, Mathias, for these great general insights, and on io_uring in particular!
 
>>Assuming processing times are small you're unlikely to get scheduled out to begin with unless the machine is
>>overloaded.
 
OK, good for now: my use case has to work cross-platform (plus see below), so I think classical Asio is fine.
 
>>If you're looking for low-latency (microsecond scale), you should of course avoid all those context switches (including the
>>interrupts), copies, and do all the thread scheduling yourself rather than rely on the kernel.
 
I am dealing with low-latency audio networking, which is indeed a time-critical application, but I believe not so time-critical that it would require io_uring: the current lowest packet reception interval lies at 1.3 ms (2.6 ms being even more realistic in current consumer home networks), and I can typically accept network and/or scheduling jitter of up to 500 microseconds.
 
In fact, in practice classical Asio has worked out fine so far, but I was wondering where I can potentially improve the system and at which point additional realtime threads might come back to bite me.
 
Thanks again,
best
 
Alex

 
--
http://www.carot.de
Email : Alexander@Carot.de
Tel.: +49 (0)177 5719797
 
 
Sent: Wednesday, 28 December 2022 at 13:44
From: "Mathias Gaunard" <mathias.gaunard@ens-lyon.org>
To: boost-users@lists.boost.org
Cc: "Alexander Carôt" <alexander_carot@gmx.net>
Subject: Re: [Boost-users] receive handler priority
On Wed, 28 Dec 2022, 12:01 Alexander Carôt via Boost-users, <boost-users@lists.boost.org> wrote:
Hello all,

I have a classical receive handling structure for an asynchronous UDP socket:

void receiver::receive(){
    mySocket->async_receive_from(
        boost::asio::buffer(input, maxInputLength), senderEndpoint,
        boost::bind(&receiver::handleReceiveFrom, this,
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred));
}

void receiver::handleReceiveFrom(const boost::system::error_code &error,
                                 size_t bytes_recvd) {

    // SPECIFIC CODE HERE AND FINAL CALL OF RECEIVE FUNCTION

    this->receive();
}

Besides this, my app runs several realtime threads to which I have assigned maximum priority via:

// promote the std::thread to SCHED_FIFO at the highest priority
pthread_t threadID = (pthread_t) sendThread.native_handle();
struct sched_param param;
int policy = SCHED_FIFO;
int max = sched_get_priority_max(policy);
param.sched_priority = max;
if (pthread_setschedparam(threadID, policy, &param) != 0) {
    // typically fails without CAP_SYS_NICE / root
}

Now I wonder to what extent these realtime threads can have an impact on the performance of my receiver process: so far I have assumed that the receive handler code is executed immediately when a packet arrives at the NIC, but I am not sure whether other realtime threads might delay this when they are scheduled first.

In other words: if we think of the receive handling as a thread that is triggered by incoming packets, does this thread also have realtime capabilities that can suffer from competing processes?

Thanks in advance for clarification,
best
 
When data is received on the NIC (i.e. the full frame has been received and the FCS is correct), it emits an interrupt which triggers a switch into kernel mode, signalling that a DMA transfer into the kernel's internal buffers has completed and must be acted on. The kernel then processes those buffers to reconstruct a queue of UDP packets, mapping the IP destination to registered file descriptors, and switches to the thread that was waiting for such a packet. The thread then instructs the kernel to copy the data from its internal buffers to the userland buffer, which causes another two context switches.
 
Your thread is going to sleep whenever you are waiting for data (unless you use some kind of poll or poll_one call), so your scheduling only affects things while you are processing the packet. Assuming processing times are small, you're unlikely to get scheduled out to begin with unless the machine is overloaded.
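
As an illustration of that alternative, here is a minimal sketch (names like ioContext and running are assumptions, not taken from your code): instead of blocking in io_context::run(), a dedicated thread keeps calling poll(), which executes whatever handlers are ready and returns immediately, so the thread never sleeps while waiting for data.

#include <boost/asio.hpp>

// Busy-poll variant: the thread spins instead of sleeping between packets.
void pollLoop(boost::asio::io_context &ioContext, const bool &running) {
    while (running) {
        // poll() runs all handlers that are currently ready and returns
        // immediately, unlike run(), which blocks until work arrives.
        ioContext.poll();
    }
}

This trades a fully busy core for lower wake-up latency, so it only makes sense if the thread is pinned to a core that has nothing else to do.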
 
If you're looking for low latency (microsecond scale), you should of course avoid all those context switches (including the interrupts) and copies, and do all the thread scheduling yourself rather than relying on the kernel.
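
To sketch what that direction looks like (illustration only, Linux-specific, using a raw non-blocking BSD socket rather than Asio; the function and parameter names are invented for the example): pin a thread to a dedicated core and spin on recvfrom so that no sleep/wake-up cycle is involved at all.

#include <pthread.h>
#include <sched.h>
#include <sys/socket.h>
#include <cerrno>

// Busy-poll a non-blocking UDP socket from a thread pinned to a single core.
// 'fd' is assumed to have been created with SOCK_NONBLOCK.
void spinReceive(int fd, int core, char *buf, size_t len, volatile bool &running) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    while (running) {
        ssize_t n = recvfrom(fd, buf, len, 0, nullptr, nullptr);
        if (n >= 0) {
            // process the datagram
        } else if (errno != EAGAIN && errno != EWOULDBLOCK) {
            break; // real error
        }
        // no data yet: keep spinning instead of going to sleep
    }
}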
 
But Asio isn't a particularly good fit for anything that low-level or Linux-specific. While it can use io_uring for example (so long as you're willing to be extremely careful about ODR), it is still considerably more limited than what you can achieve by using that API directly; it will at least avoid the redundant context switches at the end, since the kernel writes directly into your buffers.
Using io_uring directly would, however, allow you to control the affinity and scheduling of the real-time kernel thread processing your data as it arrives from the NIC, and to switch from interrupts to a pure busy-polling method, independently on the kernel side and in userland.
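
For completeness, two hedged sketches of what that can look like. First, as far as I know the io_uring backend in recent Boost.Asio versions is enabled by macros that have to be identical in every translation unit (that is the ODR trap mentioned above), roughly like this, plus linking against liburing:

#define BOOST_ASIO_HAS_IO_URING   // opt into the io_uring backend
#define BOOST_ASIO_DISABLE_EPOLL  // use io_uring for sockets too, not only files
#include <boost/asio.hpp>

Second, a rough sketch of receiving directly with liburing, busy-polling the completion queue in userland (socket setup omitted, names invented for the example; for kernel-side polling you would additionally set up the ring with IORING_SETUP_SQPOLL, and recvmsg would be needed if you also want the sender address):

#include <liburing.h>

// One outstanding recv at a time; completions are busy-polled, never waited on.
void uringReceive(int fd, char *buf, size_t len, volatile bool &running) {
    struct io_uring ring;
    io_uring_queue_init(256, &ring, 0); // 256-entry queues, default interrupt mode

    while (running) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_recv(sqe, fd, buf, len, 0);
        io_uring_submit(&ring);

        struct io_uring_cqe *cqe = nullptr;
        // spin on the completion queue instead of blocking in io_uring_wait_cqe()
        while (running && io_uring_peek_cqe(&ring, &cqe) == -EAGAIN) { /* spin */ }
        if (cqe != nullptr) {
            if (cqe->res > 0) {
                // cqe->res bytes of the datagram are now in 'buf'
            }
            io_uring_cqe_seen(&ring, cqe);
        }
    }
    io_uring_queue_exit(&ring);
}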
 
Bypassing the kernel entirely is of course potentially even better.
