Boost logo

Boost-Commit :

From: jurko.gospodnetic_at_[hidden]
Date: 2008-04-07 05:55:19


Author: jurko
Date: 2008-04-07 05:55:18 EDT (Mon, 07 Apr 2008)
New Revision: 44087
URL: http://svn.boost.org/trac/boost/changeset/44087

Log:
  Solved the problem with child process returning the value 259 (Windows constant STILL_ACTIVE) causing bjam never to detect that it exited and therefore keep running in an endless loop. Done by relying on the Windows WaitForMultipleObjects() function to detect when a process has exited instead of GetExitCodeProcess(). The later function's MSDN article (http://msdn2.microsoft.com/en-us/library/ms683189(VS.85).aspx) warns about this problem.

  Solved the problem with bjam going into an active wait state, hogging up processor resources, when waiting for one of its child processes to terminate while not all of its available child process slots are being used. To see this bug in action, try compiling a simple C++ program on a 2 processor PC with the -j 2 command-line option and watching how much processor resources bjam uses while linking. Was caused by treating unused process slots as used in the try_wait() function, causing the WaitForMultipleObjects() Windows API call to exit instantly with an error which was then getting incorrectly interpreted as a timeout, starting the whole cycle anew.

  Solved a race condition between bjam's output reading/child process termination detection and the child process's output generation/termination which could have caused bjam not to collect the terminated process's final output.

  Extracted all GetExitCodeProcess() API calls into one location so it no longer gets called in three separate places.

  Minor comment changes in code touched by previous fixes.

Text files modified:
   trunk/tools/jam/src/execnt.c | 61 +++++++++++++++++++--------------------
   1 files changed, 29 insertions(+), 32 deletions(-)

Modified: trunk/tools/jam/src/execnt.c
==============================================================================
--- trunk/tools/jam/src/execnt.c (original)
+++ trunk/tools/jam/src/execnt.c 2008-04-07 05:55:18 EDT (Mon, 07 Apr 2008)
@@ -479,12 +479,12 @@
     /* wait for a command to complete, while snarfing up any output */
     do
     {
+ /* check for a complete command, briefly */
+ i = try_wait(500);
         /* read in the output of all running commands */
         read_output();
         /* close out pending debug style dialogs */
         close_alerts();
- /* check for a complete command, briefly */
- if ( i < 0 ) i = try_wait(500);
         /* check if a command ran out of time */
         if ( i < 0 ) i = try_kill_one();
     }
@@ -499,7 +499,7 @@
         /* the time data for the command */
         record_times(cmdtab[i].pi.hProcess, &time);
 
- /* Clear the temp file */
+ /* clear the temp file */
         if ( cmdtab[i].tempfile_bat )
         {
             unlink( cmdtab[ i ].tempfile_bat );
@@ -507,6 +507,9 @@
             cmdtab[i].tempfile_bat = NULL;
         }
 
+ /* find out the process exit code */
+ GetExitCodeProcess( cmdtab[i].pi.hProcess, &cmdtab[i].exitcode );
+
         /* the dispossition of the command */
         if( intr )
             rstat = EXEC_CMD_INTR;
@@ -884,43 +887,39 @@
     }
 }
 
-/* waits for a single child process command to complete, or the
- timeout, whichever is first. returns the index of the completed
- command, or -1. */
+/* Waits for a single child process command to complete, or the timeout,
+ whichever comes first. Returns the index of the completed command in the
+ cmdtab array, or -1. */
 static int try_wait(int timeoutMillis)
 {
- int i, num_active, waiting;
+ int i;
+ int num_active;
+ int wait_api_result;
     HANDLE active_handles[MAXJOBS];
     int active_procs[MAXJOBS];
 
- for ( waiting = 1; waiting; )
+ /* prepare a list of all active processes to wait for */
+ for ( num_active = 0, i = 0; i < globs.jobs; ++i )
     {
- /* find the first completed child process */
- for ( num_active = 0, i = 0; i < globs.jobs; ++i )
+ if ( cmdtab[i].pi.hProcess )
         {
- /* if we have an already dead process, return it. */
- cmdtab[i].exitcode = 0;
- if ( GetExitCodeProcess( cmdtab[i].pi.hProcess, &cmdtab[i].exitcode ) )
- {
- if ( STILL_ACTIVE != cmdtab[i].exitcode )
- {
- return i;
- }
- }
- /* it's running, add it to the list to watch for */
             active_handles[num_active] = cmdtab[i].pi.hProcess;
             active_procs[num_active] = i;
- num_active += 1;
- }
-
- /* wait for a child to complete, or for our timeout window to expire */
- if ( waiting )
- {
- WaitForMultipleObjects( num_active, active_handles, FALSE, timeoutMillis );
- waiting = 0;
+ ++num_active;
         }
     }
-
+
+ /* wait for a child to complete, or for our timeout window to expire */
+ wait_api_result = WaitForMultipleObjects( num_active, active_handles,
+ FALSE, timeoutMillis );
+ if ( ( WAIT_OBJECT_0 <= wait_api_result ) &&
+ ( wait_api_result < WAIT_OBJECT_0 + num_active ) )
+ {
+ /* terminated process detected - return its index */
+ return active_procs[ wait_api_result - WAIT_OBJECT_0 ];
+ }
+
+ /* timeout */
     return -1;
 }
 
@@ -941,9 +940,7 @@
                 close_alert(cmdtab[i].pi.hProcess);
                 /* we have a "runaway" job, kill it */
                 kill_process_tree(0,cmdtab[i].pi.hProcess);
- /* and return it as complete, with the failure code */
- GetExitCodeProcess( cmdtab[i].pi.hProcess, &cmdtab[i].exitcode );
- /* mark it as a timeout */
+ /* and return it marked as a timeout */
                 cmdtab[i].exit_reason = EXIT_TIMEOUT;
                 return i;
             }


Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk