From: Matthew Hurd (matt_at_[hidden])
Date: 2003-03-24 17:59:38
> -----Original Message-----
> On Behalf Of Beman Dawes
> Sent: Tuesday, 25 March 2003 1:15 AM
> Be careful. At least with some older versions of Windows, the execution
> time for some of the Windows time related API's was so large that the
> useful resolution was nowhere near the apparent claimed resolution.
> If a function that is supposed to measure time in microseconds takes
> several milliseconds to execute, it seems to me the useful resolution is
> really milliseconds rather than microseconds.
Quite right. This was related to QueryPerformanceCounter() using the
8254-compatible timer chip, which could take several thousand cycles to read.
The HAL for Pentiums and above should use Intel's RDTSC (Read Time Stamp
Counter) and not suffer from this problem.
ACE's ACE_High_Res_Timer has more info if you'd like; the relevant header
comment is below, FYI. It suggests RDTSC takes 16 to 32 cycles, so just add
call overhead and beware of multiprocessor issues. In fact, ACE supports a few
platforms whose code you could pilfer thanks to its open license.
* @class ACE_High_Res_Timer
* @brief A high resolution timer class wrapper that encapsulates
* OS-specific high-resolution timers, such as those found on
* Solaris, AIX, Win32/Pentium, and VxWorks.
* Most of the member functions don't return values. The only
* reason that one would fail is if high-resolution time isn't
* supported on the platform. To avoid impacting performance
* and complicating the interface, in that case,
* <ACE_OS::gettimeofday> is used instead.
* The global scale factor is required for platforms that have
* high-resolution timers that return units other than
* microseconds, such as clock ticks. It is represented as a
* static u_long, can only be accessed through static methods,
* and is used by all instances of High Res Timer. The member
* functions that return or print times use the global scale
* factor. They divide the "time" that they get from
* <ACE_OS::gethrtime> by global_scale_factor_ to obtain the
* time in microseconds. Its units are therefore 1/microsecond.
* On Windows the global_scale_factor_ units are 1/millisecond.
* There's a macro <ACE_HR_SCALE_CONVERSION> which gives the
* units/second. Because it's possible that the units/second
* changes in the future, it's recommended to use it instead
* of a "hard coded" solution.
* Depending on the platform and the class members used, there's a
* maximum elapsed period before overflow (which is not checked).
* Look at the documentation of the individual member functions.
* On some (most?) implementations it's not recommended to measure
* "long" time periods, because the errors can accumulate fast.
* This is probably not a problem when profiling code, but could be
* one if the high resolution timer class is used to initiate
* actions after a "long" timeout.
* On Solaris, a scale factor of 1000 should be used because its
* high-resolution timer returns nanoseconds. However, on Intel
* platforms, we use RDTSC which returns the number of clock
* ticks since system boot. For a 200MHz cpu, each clock tick
* is 1/200 of a microsecond; the global_scale_factor_ should
* therefore be 200 or 200000 if it's in unit/millisecond.
* On Windows ::QueryPerformanceCounter() is used, whose
* implementation depends on the Windows HAL in use
* (Hardware Abstraction Layer). On some it uses the PC "timer chip"
* while it uses RDTSC on others.
* NOTE: the elapsed time calculations in the print methods use
* ACE_hrtime_t values. Those methods do _not_ check for overflow!
* NOTE: Gabe <begeddov_at_[hidden]> raises this issue regarding
* <ACE_OS::gethrtime>: on multi-processors, the processor that
* you query for your <timer.stop> value might not be the one
* you queried for <timer.start>. It's not clear how much
* divergence there would be, if any.
* This issue is not mentioned in the Solaris 2.5.1 gethrtime
* man page.
* A RDTSC NOTE: RDTSC is the Intel Pentium read-time stamp counter
* and is actually a 64-bit clock cycle counter, which is incremented
* with every cycle. It has a low overhead and can be read within
* 16 (Pentium) or 32 (Pentium II, III, ...) cycles, but it doesn't
* serialize the processor, which could give wrong timings when
* profiling very short code fragments.
* A problem is that some power-sensitive devices
* (laptops for example, but probably also embedded devices)
* change the cycle rate while running.
* Some Pentiums can run at (at least) two clock frequencies.
* Another problem arises with multiprocessor computers: there
* are reports that the different RDTSCs are not always kept
* in sync.
* A Windows "timer chip" NOTE: (8254-compatible real-time clock)
* When ::QueryPerformanceCounter() uses the 8254 it has a
* frequency of about 1.193 MHz (or sometimes 3.579 MHz?) and
* reading it requires some time (several thousand cycles).
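
To make the two main points above concrete, here is a rough C++ sketch, not anything taken from ACE. It shows the scale-factor arithmetic the header describes (raw ticks divided by ticks per microsecond) and one simple way to estimate what a single timer read costs, which bounds the useful resolution. The names ticks_to_usec and mean_clock_read_ns are made up for illustration, and std::chrono::steady_clock is only a portable stand-in for whatever platform timer you would really be wrapping.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: convert raw counter ticks to whole microseconds.
// ticks_per_usec plays the role of the header's global scale factor,
// e.g. 200 for a 200MHz cycle counter. The result is truncated.
std::uint64_t ticks_to_usec(std::uint64_t ticks, std::uint64_t ticks_per_usec)
{
    assert(ticks_per_usec != 0);
    return ticks / ticks_per_usec;
}

// Hypothetical helper: estimate the mean cost, in nanoseconds, of one
// clock read by timing a run of back-to-back reads. If the result is in
// the microsecond range, the clock is not usefully a microsecond timer,
// whatever its nominal resolution. The figure includes loop overhead and
// is limited by the clock's own granularity, so treat it as a rough bound.
double mean_clock_read_ns(int samples)
{
    assert(samples > 0);
    using clock = std::chrono::steady_clock;

    const clock::time_point start = clock::now();
    clock::time_point last = start;
    for (int i = 0; i < samples; ++i) {
        last = clock::now();  // the call whose cost is being estimated
    }

    const std::chrono::duration<double, std::nano> total = last - start;
    return total.count() / samples;
}
```

For example, ticks_to_usec(200000000, 200) gives 1000000, i.e. one second's worth of ticks on the 200MHz machine from the header. Calling mean_clock_read_ns with a large sample count (say a million) and comparing the result with the timer's claimed resolution is a quick sanity check before trusting any short measurements.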