[Boost-bugs] [Boost C++ Libraries] #12474: Using two different resolver instances on the same io_service causes a race condition

From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2016-09-21 09:47:04


#12474: Using two different resolver instances on the same io_service causes a race
condition
---------------------------------------+----------------------------
 Reporter: michele.de.stefano@… | Owner: chris_kohlhoff
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: asio
  Version: Boost 1.61.0 | Severity: Problem
 Keywords: race condition, data race |
---------------------------------------+----------------------------
 Dear developer (or developers) of Boost Asio,[[BR]]
 [[BR]]
 I think I've found a bug in Asio. If I instantiate two different TCP
 resolvers on the same io_service and use these two distinct resolvers to
 resolve the same endpoint, a race condition occurs.[[BR]]
 [[BR]]
 I have attached two source files (C++11 is required) that reproduce the
 issue. Unfortunately, the programs must be run many times before a failure
 shows up. I use bash scripts that run them thousands of times, each time
 on a different, randomly chosen free port on localhost.[[BR]]
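 [[BR]]
 The bash scripts mentioned above are not attached to this ticket. As a
 hypothetical illustration only, a comparable stress driver could be
 written in C++ (POSIX-only). It picks a free loopback port by binding a
 socket to port 0 and reading back the kernel-assigned port with
 `getsockname()`, then runs a command with `-p <port>` appended and counts
 non-zero exit statuses. The names `pick_free_port` and `run_stress` are
 invented for this sketch.

```cpp
// Hypothetical stress driver (not the reporter's bash scripts). POSIX-only.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>
#include <string>

// Ask the kernel for a free TCP port on 127.0.0.1 by binding to port 0.
// Returns -1 if the socket cannot be created or bound.
int pick_free_port() {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // 0 = let the kernel choose an ephemeral port

    socklen_t len = sizeof(addr);
    int port = -1;
    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0 &&
        ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) == 0) {
        port = ntohs(addr.sin_port);
    }
    ::close(fd);  // the port is released here, before the test uses it
    return port;
}

// Run "<command> -p <port>" `runs` times, each time on a fresh free port.
// Any non-zero status from std::system() counts as a failure.
int run_stress(const std::string& command, int runs) {
    int failures = 0;
    for (int i = 0; i < runs; ++i) {
        int port = pick_free_port();
        if (port <= 0) {
            ++failures;
            continue;
        }
        std::string cmd = command + " -p " + std::to_string(port);
        if (std::system(cmd.c_str()) != 0) ++failures;
    }
    return failures;
}
```

 In practice the command would be a (hypothetical) wrapper that starts
 `test_engine_client_dbg_x -s` in the background and then runs
 `test_engine_client_dbg_x -c` on the same port. Because the probe socket
 is closed before the test binds the port, another process could take it
 in between; for a stress test that small window is usually tolerable.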
 [[BR]]
 Quick instructions for running the tests:[[BR]]
 [[BR]]
 `test_engine_client_dbg_x -s -p <port>`[[BR]]
 [[BR]]
 Runs the server.[[BR]]
 [[BR]]
 `test_engine_client_dbg_x -c -p <port>`[[BR]]
 [[BR]]
 Runs the client.[[BR]]
 [[BR]]
 `test_engine_client_dbg_2.cpp` is the source that fails (at line 364):
 for no apparent reason, the `mEngineClient` pointer is sometimes found to
 be `NULL`. I've also verified that if I insert into this code a `while`
 loop that waits until `mEngineClient.get() != nullptr`, the program
 proceeds without error (meaning that, at some point, the pointer does take
 the correct value). But, I repeat, there is no reason why
 `mEngineClient.get()` should be `NULL` at this point.[[BR]]
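 [[BR]]
 A side note on the diagnostic `while` loop: if `mEngineClient` is
 assigned by a handler on one thread while another thread polls
 `mEngineClient.get()` without any synchronization, that loop is itself a
 data race under the C++11 memory model, so it cannot reliably show when
 the pointer was actually set. A race-free way to wait for the pointer is
 sketched here with a mutex and a condition variable. `EngineClient` is a
 hypothetical stand-in and `ClientSlot` is an invented helper; neither is
 taken from the attached sources.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the ticket's EngineClient class.
struct EngineClient {
    int id = 0;
};

// Holds a shared_ptr that one thread publishes and other threads wait for.
// Every read and write of client_ happens under mutex_, so waiting here is
// free of the data race that an unsynchronized polling loop would have.
class ClientSlot {
public:
    void publish(std::shared_ptr<EngineClient> client) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            client_ = std::move(client);
        }
        ready_.notify_all();  // wake every thread blocked in wait()
    }

    // Blocks until publish() has stored a non-null pointer, then returns it.
    std::shared_ptr<EngineClient> wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return client_ != nullptr; });
        return client_;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::shared_ptr<EngineClient> client_;
};
```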
 [[BR]]
 `test_engine_client_dbg_4.cpp` is the source that works. Note that here I
 do not instantiate a second resolver inside the `EngineClient` class;
 instead, I simply pass the already-resolved endpoint iterator to the
 `EngineClient` constructor. With this change, the `mEngineClient` pointer
 is never lost.[[BR]]
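 [[BR]]
 The workaround in `test_engine_client_dbg_4.cpp` amounts to resolving
 once and injecting the result. The following sketch shows that shape
 without Boost: POSIX `getaddrinfo()` stands in for `tcp::resolver`, and a
 hypothetical `EngineClient` receives the resolved addresses in its
 constructor instead of owning a second resolver. `resolve_once` is an
 invented helper, and none of this is the code from the attached file.

```cpp
#include <netdb.h>
#include <sys/socket.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Resolve host/port exactly once, up front, and return copies of the
// resulting socket addresses so the addrinfo list can be freed immediately.
std::vector<sockaddr_storage> resolve_once(const std::string& host,
                                           const std::string& port) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // TCP

    addrinfo* res = nullptr;
    int rc = ::getaddrinfo(host.c_str(), port.c_str(), &hints, &res);
    if (rc != 0) throw std::runtime_error(::gai_strerror(rc));

    std::vector<sockaddr_storage> out;
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        sockaddr_storage ss{};
        std::memcpy(&ss, p->ai_addr, p->ai_addrlen);
        out.push_back(ss);
    }
    ::freeaddrinfo(res);
    return out;
}

// Hypothetical client: it is handed already-resolved endpoints and never
// performs name resolution itself.
class EngineClient {
public:
    explicit EngineClient(std::vector<sockaddr_storage> endpoints)
        : endpoints_(std::move(endpoints)) {}

    std::size_t endpoint_count() const { return endpoints_.size(); }

private:
    std::vector<sockaddr_storage> endpoints_;
};
```

 With this structure only one component ever resolves the address, which
 mirrors the change the reporter describes between the two test programs.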
 [[BR]]
 This race condition appears only when `io_service.run()` is called from
 multiple threads. I've verified that it does not occur when
 `io_service.run()` is called from a single thread.[[BR]]
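 [[BR]]
 The multi-threaded condition matters because when several threads call
 `run()` on the same `io_service`, completion handlers can be invoked
 concurrently on any of those threads (unless they are serialized, for
 example through a strand), so state shared between handlers needs
 synchronization; with a single `run()` thread, handlers execute one at a
 time. The toy queue below is a standard-library analogue of that
 behavior, not Asio itself, and `ToyService` is an invented name.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy analogue of an io_service: a queue of handlers that any number of
// threads can drain by calling run(). For illustration only; not Asio.
class ToyService {
public:
    void post(std::function<void()> handler) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push_back(std::move(handler));
    }

    // Executes queued handlers until the queue is empty, then returns,
    // loosely like io_service::run() returning when it runs out of work.
    void run() {
        for (;;) {
            std::function<void()> handler;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (queue_.empty()) return;
                handler = std::move(queue_.front());
                queue_.pop_front();
            }
            handler();  // runs outside the lock, possibly concurrently
        }
    }

private:
    std::mutex mutex_;
    std::deque<std::function<void()>> queue_;
};
```

 In a usage test with four threads calling `run()`, the posted handlers
 should increment a `std::atomic<int>` rather than a plain `int`: they run
 concurrently, so plain increments would race.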
 [[BR]]
 Also, I think it is difficult to reproduce, because I've experienced it
 only on one specific machine, which perhaps has different timing from the
 others I have (because of its hardware). On this machine (Fedora 23,
 kernel 4.7.3, gcc 5.3.1) I also tried rebuilding with gcc 4.8.5, and the
 issue persists (so this is not a compiler bug). On the same machine I also
 tried Fedora 24 (yes ... I re-installed the OS), which let me test with
 gcc 6.1, and the issue appears again. On other machines (including a
 CentOS 6 machine with 8 cores and gcc 4.8.5) I was not able to reproduce
 the issue even after running the tests thousands of times, whereas on the
 Fedora machine it occurs almost certainly within 1000 runs.[[BR]]
 [[BR]]
 In summary, stress testing is the only reliable way to trigger this bug
 (although sometimes it also shows up on the very first run).

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/12474>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
