Boost mailing list
From: Vinícius dos Santos Oliveira (vini.ipsmaker_at_[hidden])
Date: 2025-02-11 18:17:25
On Mon, Dec 30, 2024 at 11:10, Vinnie Falco via Boost
<boost_at_[hidden]> wrote:
> On Mon, Dec 30, 2024 at 2:04 AM Richard Hodges via Boost <
> boost_at_[hidden]> wrote:
>
> > ...execution order is not part of the contract
> >
>
> Yes, it is:
>
> https://www.boost.org/doc/libs/1_87_0/doc/html/boost_asio/reference/io_context__strand.html#boost_asio.reference.io_context__strand.order_of_handler_invocation
I think I've found a violation of these rules (which I depend on).
However, I've been unable to produce a minimal test case to send a
proper bug report. I've exhausted my ideas for the time being, so I've
come here to ask for help and new ideas that I could try.
Since I've failed to produce a minimal test case, I'll have to point
you guys to a larger code base.
So here I call strand.post(a):
https://gitlab.com/emilua/emilua/-/blob/v0.11.0/src/actor.ypp#L1038
And here I call strand.post(b):
https://gitlab.com/emilua/emilua/-/blob/v0.11.0/include/emilua/core.hpp#L1201
strand.post(a) happens before strand.post(b) (I even inserted printf()
statements locally just to make sure it really does). Therefore a()
should happen before b(), but that's not what I've been observing. I've
observed b() happening before a() on Windows and on Linux (with both the
epoll and io_uring backends). On FreeBSD, a() always happens before b();
I don't know what ASIO does differently on FreeBSD. Sometimes I get the
desired behavior on Linux as well, but almost always I get the undesired
behavior. I think that when the cache is hot I always get the undesired
behavior. This is the minimal test case I wrote:
#include <boost/asio.hpp>
#include <iostream>
#include <thread>
#include <memory>

namespace asio = boost::asio;

// Mimics an Emilua actor: a strand plus a work guard that keeps the
// io_context running until every sender has delivered its message.
struct actor
{
    actor(asio::io_context& ioc, int nsenders)
        : work_guard{ioc.get_executor()}
        , s{ioc}
        , nsenders{nsenders}
    {}

    const asio::io_context::strand& strand()
    {
        return s;
    }

    asio::executor_work_guard<asio::io_context::executor_type> work_guard;
    asio::io_context::strand s;
    int nsenders;
};

int main()
{
    std::thread t;
    std::shared_ptr<actor> a;
    {
        auto ioc = std::make_shared<asio::io_context>();
        a = std::make_shared<actor>(*ioc, 2);
        // The io_context runs on its own thread; main() only posts.
        t = std::thread{[ioc]() mutable {
            ioc->run();
            ioc.reset();
        }};
    }

    // First post to the strand: this handler must run first.
    std::cout << "1\n";
    a->strand().post([a]{
        std::cout << "2\n";
        if (--a->nsenders == 0) {
            a->work_guard.reset();
        }
    }, std::allocator<void>{});

    // Second post to the same strand: this handler must run second.
    std::cout << "a\n";
    a->strand().post([a]{
        std::cout << "b\n";
        if (--a->nsenders == 0) {
            a->work_guard.reset();
        }
    }, std::allocator<void>{});

    a.reset();
    t.join();
}
That's the same algorithm I use in Emilua, but with it I cannot observe
the undesired result. I've tried inserting sleep_for() calls in a few
spots to mimic the delays/overhead of LuaJIT, but that was not enough
to reproduce the behavior I observed in Emilua. So... any ideas on how
I can make this minimal test case exercise more code paths in ASIO?
If you want to reproduce the problem locally, you can try the Lua
code below:
if _CONTEXT ~= 'main' then
    local inbox = require 'inbox'
    print(inbox:receive())
    return
end

local actor2 = spawn_vm{
    module = '.',
    -- comment/remove inherit_context=false to make the code work
    inherit_context = false
}
actor2:send('hello')
Just run the program with:

    emilua path/to/program.lua

The desired output is the message "hello" printed to stdout (which
happens very rarely on Linux, and every time on FreeBSD). The undesired
output is something along the lines of:
    Main fiber from VM 0x796707f86380 panicked: 'Broadcast the address before attempting to receive on it'
    stack traceback:
        [string "?"]: in function 'receive'
        /home/vinipsmaker/t5.lua:3: in main chunk
        [C]: in function ''
        [string "?"]: in function <[string "?"]:0>

-- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/