[Boost-bugs] [Boost C++ Libraries] #9079: Condition variable will wait forever for some timepoint values.

Subject: [Boost-bugs] [Boost C++ Libraries] #9079: Condition variable will wait forever for some timepoint values.
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-09-03 13:45:32


#9079: Condition variable will wait forever for some timepoint values.
------------------------------+----------------------
 Reporter: mjklaim@… | Owner: anthonyw
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: thread
  Version: Boost 1.54.0 | Severity: Problem
 Keywords: |
------------------------------+----------------------
 Original discussion, including test cases and a quick fix:
 boost.2283326.n4.nabble.com/thread-Can-boost-this-thread-sleep-for-
 totally-stop-the-thread-td4650070.html

 Summary: In some conditions, reproduced in the following repro cases,
 boost::condition_variable::wait_until/wait_for will wait forever.


 ----------

 I'm using Boost 1.54, Visual Studio 2012 Update 3.

 Yesterday night I decided to try to build a repro case and now I think I
 see at least one of the problems but I need confirmation.
 I managed to build 2 repro cases at the end of this email, one very simple
 (A) and one a bit more complex and closer to what I actually do (B).

 Unfortunately, I am not sure if the problem in A, that I can understand,
 is
 really exactly the same as the problem in B.

 In A, the thread stops immediately, without even hitting the "TICK" log.
 Apparently the problem, even with the initial case which used
 this_thread::sleep_until, is related to condition_variable::wait_until.
 Basically, if the time given is 1 millisecond past the "now" timepoint
 used
 into the wait_until function, the thread will wait forever.

 Here is the implementation of wait_until in Boost 1.54:

 {{{

 template <class Clock, class Duration>
         cv_status
         wait_until(
                 unique_lock<mutex>& lock,
                 const chrono::time_point<Clock, Duration>& t)
         {
           using namespace chrono;
           do_wait(lock, ceil<milliseconds>(t-Clock::now()).count());
           return Clock::now() < t ? cv_status::no_timeout :
                                              cv_status::timeout;
         }

 }}}

 My current hypothesis is that the ceil() call in this particular case
 doesn't do what is intended.
 Here is the implementation of ceil():

 {{{

 /**
      * rounds up
      */
     template <class To, class Rep, class Period>
     To ceil(const duration<Rep, Period>& d)
     {
         To t = duration_cast<To>(d);
         if (t < d)
             ++t;
         return t;
     }

 }}}

 Assuming the call to Clock::now() in wait_until() gives exactly t + 1ms,
 then ceil, if I'm correct will return 0.
 The call do_wait( 0 ) seems to wait forever, but I don't understand the
 details of the implementation.
 The same exact problem occurs with wait_for() (by the way, why does it
 always have to use steady_clock? see the implementation)

 Suggestion: add a check so that do_wait() returns immediately if the given
 time is <= 0.
 I'm assuming that the time given to the wait_until() call is always based
 on int-kind of value... if that's not the case, it might be harder to get
 right.


 However, in B, it is a bit more complex: the endless wait don't occur
 immediately but I managed to make it occur very
 quickly in my current setup at least.
 It seems that the problem is similar, but I couldnt' reproduce the
 behaviour of B in A by modifying the time to wait until, only (now - 1ms)
 seems to work.

 In B, we first acquire the current time point, then do some work, then
 call
 wait_until().
 I believe it is certainly a variation of the bug in A but this is what
 happen in my application.
 It takes much more time to trigger the endless wait in my application but
 you can tweak the values to get
 the same result, for example by setting MIN_TIME_TASKS_SPAWN_FREQUENCY to
 125 milliseconds (it's the value in my app).

 Basically, the problem seems to be that I introduce delay by working
 between the acquisition of the timepoint and the wait_until() call,
 which makes possible to have a duration value calculated into wait_until
 which would be ceild to 0, which would trigger the endless wait.

 First, please tell me if you can reproduce the problem.
 Second, I'll try to see if adding the (duration <= 0) check in do_wait()
 does fix the problem for me.
 I can reproduce it easily now so I should be able to figure it out today.


 ------------------------------

 {{{

 // A

 #include <boost/log/trivial.hpp>
 #include <boost/chrono.hpp>
 #include <boost/thread.hpp>
 #include <boost/thread/condition_variable.hpp>

 typedef boost::chrono::high_resolution_clock Clock;
 typedef Clock::time_point TimePoint;

 inline TimePoint real_time_now() { return Clock::now(); }

 int main()
 {
 using namespace boost::chrono;

 boost::condition_variable m_task_spawn_condition;

 boost::mutex main_thread_mutex;
 boost::unique_lock<boost::mutex> main_thread_lock( main_thread_mutex );

 BOOST_LOG_TRIVIAL(info) << "[TaskScheduler::run_and_wait] Scheduling loop
 -
 BEGIN";

 while( true )
 {
 static const auto TIME_BACK = milliseconds(1);
 m_task_spawn_condition.wait_until( main_thread_lock, real_time_now() -
 TIME_BACK ); // wait forever
 //m_task_spawn_condition.wait_for( main_thread_lock, - TIME_BACK ); //
 same
 problem
 BOOST_LOG_TRIVIAL(trace) << "TICK";
 }

 }

 }}}

 ------------------------------

 {{{

 // B

 #include <atomic>
 #include <boost/log/trivial.hpp>
 #include <boost/chrono.hpp>
 #include <boost/thread.hpp>
 #include <boost/thread/condition_variable.hpp>

 typedef boost::chrono::high_resolution_clock Clock;
 typedef Clock::time_point TimePoint;

 inline TimePoint real_time_now() { return Clock::now(); }


 class Foo
 {
 std::atomic<bool> m_is_exiting;
 TimePoint m_next_tick_time;

 public:

 bool is_exiting() const { return m_is_exiting; }

 TimePoint spawn_tasks() // note that in my app, this call takes more time
 than here
 {
 using namespace boost::chrono;
 const auto now = real_time_now();

 if( m_next_tick_time < now )
 {
 m_next_tick_time = now + seconds(1);
 BOOST_LOG_TRIVIAL(info) << "TICK!";
 }

 return m_next_tick_time;
 }

 };


 int main()
 {
 using namespace boost::chrono;
 static const auto MIN_TIME_TASKS_SPAWN_FREQUENCY = milliseconds(1);
 //microseconds(1); // THE SHORTER THE QUICKER TO REPRODUCE THE BUG

 boost::condition_variable m_task_spawn_condition;
 Foo foo;

 boost::mutex main_thread_mutex;
 boost::unique_lock<boost::mutex> main_thread_lock( main_thread_mutex );

 BOOST_LOG_TRIVIAL(info) << "[TaskScheduler::run_and_wait] Scheduling loop
 -
 BEGIN";

 while( !foo.is_exiting() )
 {
 const auto next_task_spawn_time = foo.spawn_tasks();

 const auto now = real_time_now();
 const auto next_minimum_spawn_time = now + MIN_TIME_TASKS_SPAWN_FREQUENCY;
  const auto next_spawn_time =
 next_task_spawn_time > TimePoint() && next_task_spawn_time <
 next_minimum_spawn_time ?
 next_task_spawn_time : next_minimum_spawn_time;

 const auto wait_time = next_spawn_time - now;
 if( wait_time > wait_time.zero() )
 {
 BOOST_LOG_TRIVIAL(trace) << "WAIT TIME: " << wait_time; // UNCOMMENT THIS:
 MAKES IT WORKS. WAT??????
 auto reason = m_task_spawn_condition.wait_until( main_thread_lock,
 next_spawn_time ); // DON'T WORK: WILL WAIT IF next_spawn_time is too
 close!
 }

 }


 }

 }}}


 ---------

 Diving into the code, I just realize the real source of the problem:


 {{{
  struct BOOST_SYMBOL_VISIBLE timeout
         {
             win32::ticks_type start;
             uintmax_t milliseconds; // HERE
             bool relative;
             boost::system_time abs_time;

             static unsigned long const max_non_infinite_wait=0xfffffffe;

             timeout(uintmax_t milliseconds_): // HERE

 }}}

 do_wait() takses a timeout which will use a unsined int as milliseconds,
 which will make any negative number very big.
 Assuming ceil() returns a negative number (I might have made a mistake in
 my precedent mail), then we get a very big timeout....

 ----------

 I tried replacing uintmax_t by long long in this quoted code, as a quick-
 fix, and it seems to work in my repro-cases.

 ...

 It fixes my application too.

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/9079>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.

This archive was generated by hypermail 2.1.7 : 2017-02-16 18:50:14 UTC