Subject: [Boost-bugs] [Boost C++ Libraries] #12474: Using two different resolver instances on the same io_service causes a race condition
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2016-09-21 09:47:04
#12474: Using two different resolver instances on the same io_service causes a race
condition
---------------------------------------+----------------------------
Reporter: michele.de.stefano@… | Owner: chris_kohlhoff
Type: Bugs | Status: new
Milestone: To Be Determined | Component: asio
Version: Boost 1.61.0 | Severity: Problem
Keywords: race condition, data race |
---------------------------------------+----------------------------
Dear developer (or developers) of Boost Asio,[[BR]]
[[BR]]
I think I've found a bug in Asio. If I instantiate two different TCP
resolvers on the same io_service and use both of them to resolve the same
endpoint, a race condition occurs.[[BR]]
[[BR]]
I have attached two sources (C++11 is required) that reproduce the issue.
Unfortunately you need to run the programs many times to experience a
failure ... I have bash scripts that allow me to run these programs
thousands of times, each time using a different, random, free port on the
local host.[[BR]]
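[[BR]]
The scripts themselves are not part of this report. As a minimal sketch of
how a free local port could be chosen for each run, assuming a POSIX
system, one can bind to port 0 and ask the kernel which port it assigned.
`pick_free_port` is a hypothetical helper, not something taken from the
attached sources.[[BR]]
[[BR]]
```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper (not from the attached sources): asks the kernel for
// an unused TCP port on 127.0.0.1. Returns the port, or -1 on failure.
int pick_free_port()
{
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // port 0 lets the kernel choose a free port

    socklen_t len = sizeof(addr);
    int port = -1;
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
        assert(addr.sin_family == AF_INET);
        port = ntohs(addr.sin_port);  // convert from network byte order
    }
    ::close(fd);  // release the port so the test server can bind to it
    return port;
}
```
[[BR]]
The returned value would then be passed as `<port>` to both the server and
the client. Another process may grab the port between `close()` and the
server start, so a driver built on this should retry on bind failure.[[BR]]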
[[BR]]
Quick instructions for running the tests:[[BR]]
[[BR]]
`test_engine_client_dbg_x -s -p <port>`[[BR]]
[[BR]]
runs the server.[[BR]]
[[BR]]
`test_engine_client_dbg_x -c -p <port>`[[BR]]
[[BR]]
runs the client.[[BR]]
[[BR]]
`test_engine_client_dbg_2.cpp` is the source that fails (at line 364)
because, for no apparent reason, the `mEngineClient` pointer is sometimes
`NULL`. I've also verified that if I insert a `while` loop in this code
that waits until `mEngineClient.get() != nullptr`, the code proceeds
without error (meaning that, at some point, the pointer is set back to the
correct value). But, I repeat, there is no reason why
`mEngineClient.get()` should be `NULL` at this point.[[BR]]
[[BR]]
`test_engine_client_dbg_4.cpp` is the source that works. Notice that this
time I'm not instantiating a second resolver within the `EngineClient`
class, but I'm simply passing the already-resolved endpoint iterator to
the `EngineClient`'s constructor. With this change, we never lose the
`mEngineClient` pointer.[[BR]]
[[BR]]
This race condition occurs only if `io_service.run()` is called from
multiple threads. I've verified that it does not happen with a single
`io_service.run()` call.[[BR]]
[[BR]]
Also, I think it is difficult to reproduce, because I've experienced it
only on one specific machine that may have different timing from my other
machines (because of its hardware). On this machine (Fedora 23, kernel
4.7.3, gcc 5.3.1) I've also tried rebuilding with gcc 4.8.5, but the issue
is the same (so this is not a compiler bug). I've also tried, on the same
machine, Fedora 24 (yes ... I re-installed the OS), which let me test with
gcc 6.1, and the issue shows up again. On other machines (including a
CentOS 6 machine with 8 cores and gcc 4.8.5) I was not able to reproduce
this issue even when running the tests thousands of times, while on the
Fedora machine it almost always happens within 1000 runs.[[BR]]
[[BR]]
In summary, stress testing is the only reliable way to trigger this bug
(although it sometimes shows up on the first run).
--
Ticket URL: <https://svn.boost.org/trac/boost/ticket/12474>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.