Boost Users :
Subject: Re: [Boost-users] [EXTERNAL] bjam hangs on select (in develop branch)
From: Belcourt, Kenneth (kbelco_at_[hidden])
Date: 2014-10-24 20:14:09
On Oct 24, 2014, at 4:52 PM, Belcourt, Kenneth <kbelco_at_[hidden]> wrote:
>
> On Oct 24, 2014, at 4:43 PM, Belcourt, Kenneth <kbelco_at_[hidden]> wrote:
>
>> On Oct 24, 2014, at 4:33 PM, Belcourt, Kenneth <kbelco_at_[hidden]> wrote:
>>
>>> On Oct 24, 2014, at 7:56 AM, Alain Miniussi <alain.miniussi_at_oca.eu> wrote:
>>>
>>>> On 24/10/2014 15:33, Alain Miniussi wrote:
>>>>> I did a gnu/openmpi 1.8.2 build on ubuntu which exhibits the same problem.
>>>> It did not, just forgot to edit a field in project-config.jam. Only the intel mpiexec/run hangs.
>>>>>
>>>>> Can the fact that the setpgid system call fails be an issue?
>>>
>>> Perhaps. We make the forked child process its own process group leader so that if it's an MPI job and it dies, all the MPI ranks are cleaned up as well. We've been using this syntax for a number of years on multiple platforms without issues, so I'm a little surprised it fails on Ubuntu with OpenMPI 1.8.2. That said, it's possible that there's a race condition that you're able to tickle.
>>>
>>> For example, we fork the child process and, right before we exec the child process, we set the child process group. We also set the child process group in the parent process. Perhaps we should only do this once, not twice (i.e. only in the child or only in the parent, not both). Or perhaps there's a race if the child's and parent's calls to setpgid run concurrently.
>>
>> Just pushed this commit, 7bcbc5ac31ab1, to develop which adds checks to the setpgid calls and, if they fail, indicates whether it was the parent or child process who called. Can you give this a try and let me know which call is failing?
>
> Well, I'll be danged. I was just testing this change on my Mac and found this in the output:
>
> setpgid (parent): Permission denied
>
> So it seems we've been ignoring this problem for some time and didn't know it. That would be my bad. Let me work on a fix (will probably remove the duplicate call in the parent process).
I left both setpgid checks in, but removed the call to exit() so we'll see the failed call to setpgid without killing b2.
commit 156bc5c42ec3 in develop.
Noel
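
For readers following the thread, here is a minimal sketch of the pattern being discussed: fork, call setpgid from both the child and the parent, and report a failure without exiting. It is not the b2 source; the helper name spawn_in_own_group and the message tags are assumptions made for illustration. One plausible reason for the "Permission denied" seen in the parent is that POSIX requires setpgid to fail with EACCES when the target is a child that has already called exec, so the parent's call can simply lose the race.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not b2's actual code): start argv[0] in its own
 * process group and return the child's pid, or -1 if fork fails. */
pid_t spawn_in_own_group(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: become leader of a new process group before exec. */
        if (setpgid(0, 0) != 0)
            perror("setpgid (child)");
        execvp(argv[0], argv);
        /* Only reached if exec itself failed. */
        perror("execvp");
        _exit(127);
    }

    /* Parent: make the same request on the child's behalf. If the child
     * has already exec'd, this fails with EACCES ("Permission denied").
     * The child's own call has taken effect by then, so we report the
     * failure and carry on rather than exit. */
    if (setpgid(pid, pid) != 0)
        perror("setpgid (parent)");

    return pid;
}
```

With the child as its own group leader, a later kill(-pid, SIGTERM) would signal every process in that group, which is the cleanup behaviour described above for MPI ranks. Whether to keep the parent-side call at all is the design question raised in the thread; this sketch keeps both calls and only logs the failure.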
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net