Subject: Re: [Boost-users] [EXTERNAL] bjam hangs on select (in develop branch)
From: Belcourt, Kenneth (kbelco_at_[hidden])
Date: 2014-10-24 18:33:51


Hi Alain,

On Oct 24, 2014, at 7:56 AM, Alain Miniussi <alain.miniussi_at_oca.eu> wrote:

> On 24/10/2014 15:33, Alain Miniussi wrote:
>> I did a gnu/openmpi 1.8.2 build on ubuntu which exhibits the same problem.
> It did not, just forgot to edit a field in project-config.jam. Only the intel mpiexec/run hangs.
>>
>> Can the fact that the setpgid system calls fail be an issue?

Perhaps. We make the forked child process its own process group leader so that if it’s an MPI job and it dies, all the MPI ranks are cleaned up as well. We’ve been using this syntax for a number of years on multiple platforms without issues, so I’m a little surprised it fails on Ubuntu with OpenMPI 1.8.2. That said, it’s possible that there’s a race condition that you’re able to tickle.
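To illustrate why the process group matters for cleanup, here is a minimal sketch, not the actual b2 code. The helper name `kill_group` is made up for this example, and it assumes the child was already made leader of its own group, so its pid doubles as the group id.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the b2 source): send SIGKILL to every
 * process in the group whose id is pgid. Returns 0 on success, -1 on error. */
int kill_group(pid_t pgid)
{
    /* Refuse ids <= 1: kill(0, ...) targets our own group and
     * kill(-1, ...) targets every process we may signal. */
    if (pgid <= 1) {
        errno = EINVAL;
        return -1;
    }
    /* A negative pid argument makes kill() address the whole group,
     * so any MPI ranks started by the child are signalled with it. */
    return kill(-pgid, SIGKILL);
}
```

Processes that move themselves into a different group are not reached by this call, which is one reason a launcher wrapper can leave stragglers behind.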

For example, we fork the child process and, right before we exec it, we set the child process group. We also set the child process group in the parent process. Perhaps we should only do this once, not twice (i.e. only in the child or only in the parent, not both). Or perhaps there’s a race if the child’s and the parent’s calls to setpgid run concurrently.
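The pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the b2 source: `spawn_in_own_group` is a made-up name, and the error handling is only one possible choice. Unlike the code under discussion, the sketch checks the return code of setpgid. On Linux, errno 13 is EACCES, which POSIX documents for setpgid when the target child has already called exec, so the parent treats that case as harmless.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the b2 source): fork, request a new
 * process group from both sides of the fork, then exec argv in the child.
 * Returns the child's pid, or -1 if fork() fails. */
pid_t spawn_in_own_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become leader of a new group right before exec. */
        if (setpgid(0, 0) != 0)
            perror("setpgid (child)");
        execvp(argv[0], argv);
        perror("execvp");   /* only reached if exec failed */
        _exit(127);
    }

    /* Parent: make the same request, so the group is set up regardless
     * of whether the parent or the child gets scheduled first. */
    if (setpgid(pid, pid) != 0 && errno != EACCES) {
        /* EACCES is skipped: it means the child has already exec'd,
         * and the child requests its own group before it execs. */
        perror("setpgid (parent)");
    }
    return pid;
}
```

Calling setpgid on both sides is a common way to close the window in which the parent might signal the group before the child has created it. Dropping either call would reopen that window for one of the two processes.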

I’m still looking at this.

— Noel

>> I noticed they are among the few system calls whose return code is not tested (under gdb, I noticed they return 13 (PERM issue)).
>>
>> Alain
>>
>> On 21/10/2014 15:56, Belcourt, Kenneth wrote:
>>> On Oct 21, 2014, at 6:35 AM, Alain Miniussi <alain.miniussi_at_oca.eu> wrote:
>>>
>>>> Sorry, the problem is still here:
>>>> 6817 alainm 20 0 S 0.0 0:01.74 4517 b2
>>>> 6870 alainm 20 0 T 0.0 0:00.00 6817 sh
>>>> 6871 alainm 20 0 T 0.0 0:00.00 6870 mpirun
>>>> 6876 alainm 20 0 Z 0.0 0:00.00 6871 mpiexec.hydra <defunct>
>>>>
>>>> bottom of b2 strace:
>>>>
>>>> lseek(4, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
>>>> rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
>>>> select(5, [4], NULL, NULL, NULL
>>> Okay, that’s helpful. Let me try a couple of other things. Thanks Alain.
>>>
>>> — Noel
>>>
>>> _______________________________________________
>>> Boost-users mailing list
>>> Boost-users_at_[hidden]
>>> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>>
>>
>
>
> --
> ---
> Alain
>

