Boost logo

Boost-Build :

From: Jurko Gospodnetiæ (jurko.gospodnetic_at_[hidden])
Date: 2008-01-20 15:38:36


   Hi.

> I am attaching a patch solving both of these problems so please review
> it and tell me if it is OK to go ahead and commit it. It uses no Windows
> APIs other then those already used so there should be little to no risk
> of backward compatibility problems with older OSs.

   Woops, a slight bug sneaked passed me in that last patch and it did
not collect final action output after the action terminated. Here's an
updated patch. Just needed to make output be processed after process
termination has been detected instead of before.

   Also noticed that the original version had another bug with a race
condition that could have caused the same effect as my first patch. It
would happen if bjam would collect available output and then the action
would produce some more output and terminate just before bjam detects
that the action terminated.

   Updated patch description:

-------------------------------------------------------------------

   Solved the problem with child process returning the value 259 causing
bjam never to detect that it exited and therefore keep running in an
endless loop. Done by relying on the Windows WaitForMultipleObjects()
function to detect when a process has exited instead of
GetExitCodeProcess(). The later function's MSDN article warns about this
problem.

   Solved the problem with bjam going into an active wait state, hogging
up processor resources, when waiting for one of its child processes to
terminate while not all of its available child process slots are being
used. To see this bug in action, try compiling a simple C++ program on a
2 processor PC with the -j 2 command-line option and watching how much
processor resources bjam uses while linking. Was caused by treating
unused process slots as used in the try_wait() function, causing the
WaitForMultipleObjects() Windows API call to exit instantly with an
error which was then getting incorrectly interpreted as a timeout,
starting the whole cycle anew.

   Solved a race condition between bjam's output reading/child process
termination detection and the child process's output
generation/termination which could have caused bjam not to collect the
terminated process's final output

   Extracted all GetExitCodeProcess() API calls into one location so it
no longer gets called in three separate places.

   Minor comment changes in code touched by previous fixes.

-------------------------------------------------------------------

   Hope this helps.

   Best regards,
     Jurko Gospodnetiæ

Index: execnt.c
===================================================================
--- execnt.c (revision 42883)
+++ execnt.c (working copy)
@@ -479,12 +479,12 @@
     /* wait for a command to complete, while snarfing up any output */
     do
     {
+ /* check for a complete command, briefly */
+ i = try_wait(500);
         /* read in the output of all running commands */
         read_output();
         /* close out pending debug style dialogs */
         close_alerts();
- /* check for a complete command, briefly */
- if ( i < 0 ) i = try_wait(500);
         /* check if a command ran out of time */
         if ( i < 0 ) i = try_kill_one();
     }
@@ -499,7 +499,7 @@
         /* the time data for the command */
         record_times(cmdtab[i].pi.hProcess, &time);
 
- /* Clear the temp file */
+ /* clear the temp file */
         if ( cmdtab[i].tempfile_bat )
         {
             unlink( cmdtab[ i ].tempfile_bat );
@@ -507,6 +507,9 @@
             cmdtab[i].tempfile_bat = NULL;
         }
 
+ /* find out the process exit code */
+ GetExitCodeProcess( cmdtab[i].pi.hProcess, &cmdtab[i].exitcode );
+
         /* the dispossition of the command */
         if( intr )
             rstat = EXEC_CMD_INTR;
@@ -884,43 +887,41 @@
     }
 }
 
-/* waits for a single child process command to complete, or the
- timeout, whichever is first. returns the index of the completed
- command, or -1. */
+/* waits for a single child process command to complete, or the timeout,
+ whichever comes first. returns the index of the completed command, or -1. */
 static int try_wait(int timeoutMillis)
 {
- int i, num_active, waiting;
+ int i;
+ int num_active;
+ int wait_api_result;
     HANDLE active_handles[MAXJOBS];
     int active_procs[MAXJOBS];
 
- for ( waiting = 1; waiting; )
+ /* prepare a list of all active processes to wait for */
+ for ( num_active = 0, i = 0; i < globs.jobs; ++i )
     {
- /* find the first completed child process */
- for ( num_active = 0, i = 0; i < globs.jobs; ++i )
+ if ( cmdtab[i].pi.hProcess )
         {
- /* if we have an already dead process, return it. */
- cmdtab[i].exitcode = 0;
- if ( GetExitCodeProcess( cmdtab[i].pi.hProcess, &cmdtab[i].exitcode ) )
- {
- if ( STILL_ACTIVE != cmdtab[i].exitcode )
- {
- return i;
- }
- }
- /* it's running, add it to the list to watch for */
             active_handles[num_active] = cmdtab[i].pi.hProcess;
             active_procs[num_active] = i;
- num_active += 1;
+ ++num_active;
         }
-
- /* wait for a child to complete, or for our timeout window to expire */
- if ( waiting )
- {
- WaitForMultipleObjects( num_active, active_handles, FALSE, timeoutMillis );
- waiting = 0;
- }
     }
-
+
+ /* wait for a child to complete, or for our timeout window to expire */
+ wait_api_result = WaitForMultipleObjects( num_active, active_handles,
+ FALSE, timeoutMillis );
+ if
+ (
+ ( WAIT_OBJECT_0 <= wait_api_result ) &&
+ ( wait_api_result < WAIT_OBJECT_0 + num_active )
+ )
+ {
+ /* terminated process detected - return its index */
+ return active_procs[ wait_api_result - WAIT_OBJECT_0 ];
+ }
+
+ /* timeout */
     return -1;
 }
 
@@ -941,9 +942,7 @@
                 close_alert(cmdtab[i].pi.hProcess);
                 /* we have a "runaway" job, kill it */
                 kill_process_tree(0,cmdtab[i].pi.hProcess);
- /* and return it as complete, with the failure code */
- GetExitCodeProcess( cmdtab[i].pi.hProcess, &cmdtab[i].exitcode );
- /* mark it as a timeout */
+ /* and return it marked as a timeout */
                 cmdtab[i].exit_reason = EXIT_TIMEOUT;
                 return i;
             }


Boost-Build list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk