Subject: [boost] [thread] Timed waits in Boost.Thread potentially fundamentally broken on Windows (possibly rest of Boost too)
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2015-01-23 07:08:38
Dear all,
CC: Stephan @ Microsoft - Stephan I'd love to know what the MSVC STL
does below so we have the option of matching your behaviour.
While investigating this bug report for Boost.Thread
(https://svn.boost.org/trac/boost/ticket/9856) I have discovered a
very worrying situation: it would appear that potentially all timed
waits in Boost.Thread, and potentially in other parts of Boost, are
broken on Windows Vista and later and have been for some years.
The problem is in correct handling of timeouts. If one does this:
mutex mtx;
condition_variable cond;
unique_lock<mutex> lk(mtx);
assert(cv_status::timeout == cond.wait_for(lk, chrono::seconds(1)));
... one would reasonably expect occasional failures on POSIX due to
spurious wakeups. It turns out that this also spuriously fails on
Windows, which is a surprise probably to many as Windows hides signal
handling (actually APCs) inside its Win32 APIs and automatically
restarts the operation after interruption. There is, therefore, the
potential that quite a lot of code written to use Boost.Thread on
Windows makes the hard assumption that the assert above will never
fail.
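For contrast, here is a minimal sketch of a wait that does not depend on the returned status at all. It uses the std:: equivalents rather than Boost.Thread, and wait_ready and the ready flag are hypothetical names introduced purely for illustration. The caller learns only whether the condition became true, so an early return with a timeout status cannot be mistaken for anything else; it makes no promise about how long the call actually blocked.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper (not part of Boost.Thread or the standard library).
// Waits up to `timeout` for `ready` to become true and reports the state of
// the predicate, not the cv_status of the underlying wait.
bool wait_ready(std::mutex& mtx,
                std::condition_variable& cond,
                const bool& ready,
                std::chrono::milliseconds timeout)
{
    std::unique_lock<std::mutex> lk(mtx);
    // The predicate overload re-checks `ready` after every wakeup and
    // returns its final value, whatever the reason for the wakeup was.
    return cond.wait_for(lk, timeout, [&ready] { return ready; });
}
```

Code written this way asks "did the condition become true?" rather than "did exactly one second pass?", which is the only question the status code cannot answer reliably.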
The reason why Windows spuriously fails above isn't due to spurious
wakeups; it is in fact due to changes in the Vista kernel scheduler
as documented at
https://technet.microsoft.com/en-us/magazine/2007.02.vistakernel.aspx.
In essence, if you now ask Windows to go to sleep for X milliseconds,
Windows Vista onwards will in fact sleep for anywhere between zero
and X+N milliseconds where N is some arbitrarily long value. In other
words, timeouts in Windows are purely advisory, and are freely
ignored by the Windows kernel from Vista onwards. You can test this
for yourself using this little program which reduces the #9856 bug
report to its Win32 API essentials:
#include <windows.h>
#include <stdio.h>
#include <chrono>

int main(void)
{
    ULONG ulDelay_ms = 20;
    // Semaphore that is never released, so every wait should time out.
    HANDLE hSemaphoreDelay = CreateSemaphore(NULL, 0, 1, NULL);
    for (size_t n = 0; n < 50; n++)
    {
        // Repeat until one wait lasts at least the requested interval.
        while (1) {
            auto begin = std::chrono::high_resolution_clock::now();
            DWORD hr = WaitForSingleObject(hSemaphoreDelay, ulDelay_ms);
            auto end = std::chrono::high_resolution_clock::now();
            auto diff = end - begin;
            if (hr == WAIT_ABANDONED)
                printf("Wait Abandoned ");
            else if (hr == WAIT_TIMEOUT)
                printf("Timed out ");
            else
                printf("Signaled ");
            DWORD lTDelta = (DWORD)
                std::chrono::duration_cast<std::chrono::milliseconds>(diff).count();
            // The last value is the raw tick count of the clock's duration type.
            printf("Target Wait Interval: %lu Real Wait Interval: %lu (%lld)\n",
                   ulDelay_ms, lTDelta, (long long) diff.count());
            if (lTDelta >= ulDelay_ms) break;
        }
        printf("\n");
    }
    CloseHandle(hSemaphoreDelay);
    return 0;
}
As you'll see, actual time waited is anywhere between zero and 20 + N
milliseconds, where actual time waited is sometimes a whole integer
multiple of the Windows kernel granularity (15 ms) or some fraction
of that granularity. Most of the time it tries to hit what you asked
for, but you get no guarantees.
More detail about how and why Windows Vista onwards does this can be
found at
http://forum.sysinternals.com/bug-in-waitable-timers_topic16229.html.
This raises the question about what to do with Boost.Thread. We have
the following options:
Option 1: Timed waits are allowed to spuriously fail by the standard,
so we mark this as wontfix and move on. Anyone using the predicate
timed waits has never seen a problem here anyway.
Option 2: We loop waiting until steady_clock (really
QueryPerformanceCounter under Boost) shows the requested timeout has
passed. Problem: This wastes battery power and generates needless
wakeups. A more intelligent implementation would ask Windows for the
thread quantum and transform timeouts to match the Vista kernel
scheduler in combination with always using deadline scheduling, but
this would slow down the timed waits implementation.
Option 3: We adjust Boost.Thread to return timeouts when Windows
returns a timed out status code, even if the actual time waited is
considerably lower than the time requested. Problem: some code
written for POSIX where when you ask for a timeout you always get it
may misbehave in this situation.
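To make Option 2 concrete, here is a rough sketch of the naive loop, written against the std:: equivalents rather than Boost.Thread internals. The name wait_for_at_least is hypothetical, and this is not how Boost.Thread is or would be implemented; it only shows the shape of "do not report a timeout until steady_clock agrees".

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper illustrating Option 2 only.
// Reports cv_status::timeout only once steady_clock confirms that the
// deadline has really passed; an early "timed out" result is retried.
std::cv_status wait_for_at_least(std::condition_variable& cond,
                                 std::unique_lock<std::mutex>& lk,
                                 std::chrono::steady_clock::duration timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (cond.wait_until(lk, deadline) == std::cv_status::timeout) {
        if (std::chrono::steady_clock::now() >= deadline)
            return std::cv_status::timeout;
        // The wait claimed a timeout before the deadline: wait again for
        // the remainder. Each extra pass is one of the needless wakeups
        // mentioned under Option 2.
    }
    // Notified, or woken spuriously; the caller must re-check its condition.
    return std::cv_status::no_timeout;
}
```

The cost is visible in the loop body: every early return from the underlying wait becomes another trip through the kernel, which is where the battery and wakeup concerns come from.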
Boost community, I turn it over to you for advice!
Niall
-- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/