
Boost-Build :

From: K. Noel Belcourt (kbelco_at_[hidden])
Date: 2006-10-07 14:48:47


On Oct 7, 2006, at 12:34 PM, Vladimir Prus wrote:

> On Saturday 07 October 2006 20:46, K. Noel Belcourt wrote:
>> I've noticed some inconsistent behavior with the build system that I
>> want to show you. Our nightly regression builds some 30+
>> applications from some 100+ libraries and most nights a couple of
>> source files fail to produce an object file even though there's no
>> error message indicating problems. Note that we're writing to a
>> local disk (not NFS mounted) although we do read from NFS. From our
>> build log, we've found that bjam queues up a source file to compile
>> it, but there is no error message in the log file and no object file
>> is produced.
> ....
>> Note that if I just reissue the build command, these two files
>> compile fine, the library links and the executables link without
>> problem. I can envision build system and disk failures. We've
>> scanned the disk we're writing to and found no problems. The problem
>> surfaces randomly: some nights there are no failures with any toolset;
>> other nights a small number of files (1-5) fail for no reproducible
>> reason.
>>
>> Any ideas?
>
> Is any -j option involved?

Yes, anywhere from -j2 to -j8. And I admit I've never seen the
problem with a serial build. I've also never seen it on Sun, IBM,
or SGI, or on Linux with the PGI (5.2, 6.1) or Intel (8.1, 9.0)
compilers, but I have seen it on Linux with gcc (3.4.3 and 4.0) and
on Darwin with gcc (4.0).

> I recall there was a bug where bjam would not even
> run a command for a file.
>
> However, you say that the command is issued but does not produce a
> file, so it looks like a different problem.
>
> The first step is to verify that bjam invokes the g++ command. Try
> running with -d+2 to show the command, just in case it gives a clue.
> Try writing a script called "g++" that forwards to the real g++ but
> stores the command in a log file. If the real g++ is actually run but
> produces nothing, it's some g++/OS issue.
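
A minimal sketch of the logging wrapper suggested above, written in
Python. It assumes the real compiler lives at /usr/bin/g++ and that the
log goes to /tmp/gxx-wrapper.log; both paths, the REAL_GXX and
GXX_WRAPPER_LOG environment variables, and the function names are
hypothetical and not part of bjam or g++.

```python
#!/usr/bin/env python3
"""Hypothetical g++ wrapper: forward to the real compiler and log the call."""
import os
import shlex
import subprocess
import sys
import time

# Assumed locations; adjust for the build machine.
REAL_GXX = os.environ.get("REAL_GXX", "/usr/bin/g++")
LOG_FILE = os.environ.get("GXX_WRAPPER_LOG", "/tmp/gxx-wrapper.log")


def format_entry(args, pid, exit_status):
    """Return one log line describing a finished compiler invocation."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    command = " ".join(shlex.quote(a) for a in args)
    return f"{stamp} pid={pid} status={exit_status} {command}\n"


def run_logged(args, compiler=REAL_GXX, log_file=LOG_FILE):
    """Run the compiler with args, append a log line, return the exit status."""
    status = subprocess.call([compiler] + list(args))
    # Append mode, so parallel (-j) jobs all add to one shared log
    with open(log_file, "a") as log:
        log.write(format_entry(args, os.getpid(), status))
    return status


# With no compiler arguments there is nothing to forward or log.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_logged(sys.argv[1:]))
```

To use it, the script would be saved as "g++" in a directory that comes
first in PATH, with REAL_GXX set to an absolute path so the wrapper does
not call itself. Comparing the log against the build log should show
whether every queued compile actually reached the compiler and what
status it returned.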

My nightmare scenario. Our users will blame bjam for an undiagnosed
OS/g++ problem.

> Try making a wrapper script over g++ that runs g++ as:
>
> strace -o unique-log-file -f g++ .......
>
> The script should also look at the value of the -o option from the
> g++ invocation and check whether the file is created when g++ returns
> a zero exit status. If it is not, the log file produced by strace can
> be examined for any OS-level issues.
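
A rough sketch of the strace wrapper described above, again in Python.
It assumes strace is installed (Linux only), that it passes through the
exit status of the traced command, and that the real compiler is at
/usr/bin/g++. The paths, the REAL_GXX and GXX_TRACE_DIR variables, and
all function names are hypothetical. Deleting the trace after an
unsuspicious run is a design choice of this sketch, not something the
suggestion above requires.

```python
#!/usr/bin/env python3
"""Hypothetical g++ wrapper: trace with strace, flag silently missing output."""
import os
import subprocess
import sys
import time

REAL_GXX = os.environ.get("REAL_GXX", "/usr/bin/g++")          # assumed path
TRACE_DIR = os.environ.get("GXX_TRACE_DIR", "/tmp/gxx-trace")  # assumed path


def output_path(args):
    """Return the file named by -o ("-o file" or "-ofile"), or None."""
    for i, arg in enumerate(args):
        if arg == "-o":
            return args[i + 1] if i + 1 < len(args) else None
        if arg.startswith("-o") and len(arg) > 2:
            return arg[2:]
    return None


def strace_command(args, trace_log, compiler=REAL_GXX):
    """Build the command line: strace -o <log> -f <compiler> <args...>."""
    return ["strace", "-o", trace_log, "-f", compiler] + list(args)


def output_missing(args, status):
    """True when the compiler reported success but the -o file is absent."""
    target = output_path(args)
    return status == 0 and target is not None and not os.path.exists(target)


def main(args):
    os.makedirs(TRACE_DIR, exist_ok=True)
    # PID plus timestamp keeps log names unique across parallel (-j) jobs
    name = f"strace-{os.getpid()}-{time.time_ns()}.log"
    trace_log = os.path.join(TRACE_DIR, name)
    status = subprocess.call(strace_command(args, trace_log))
    if output_missing(args, status):
        print(f"gxx-wrapper: status 0 but no output file; see {trace_log}",
              file=sys.stderr)
    elif os.path.exists(trace_log):
        os.remove(trace_log)  # keep traces only for suspect runs
    return status


# With no compiler arguments there is nothing to trace.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Any trace left behind in the trace directory then corresponds to a
compile that claimed success without writing its object file, which is
the case worth examining for OS-level errors.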

Cool, I'll try that.

> One idea I have is that g++ dies with an out-of-memory error, or some
> filesystem error, but zero status is somehow returned to bjam.

This is the scenario that worries me most.

Thanks for the help.

-- Noel

