I am having some issues with the boost::mpi::communicator object.

I am trying to use the communicator constructor that is supposed to be equivalent to MPI_Comm_create, and I am experiencing deadlock. The plan is to build a vector of communicators. If I have p^2 processors arranged in a p x p grid, I will build a communicator for every row and every column. In addition, within each row/column of p processors, I will build p-2 more communicators, corresponding to successively removing the leftmost processor. For example, with a 4 x 4 grid I want to have:
0 1 2 3 <-- one of these for every row/column
  1 2 3 <--
    2 3 <-- plus these sub-communicators, ignoring the trivial ones
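
To make the layout concrete, here is a minimal sketch (plain C++, no MPI calls) of the rank lists I have in mind. It assumes ranks are numbered row-major, i.e. rank = row * p + col, and the helper name suffix_groups is just made up for illustration. It lists all rows first, then all columns, and for each one emits the full line followed by the groups obtained by dropping the leftmost rank, stopping before the single-rank group.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Boost.MPI): enumerate the rank groups
// for a p x p grid, assuming row-major numbering (rank = row * p + col).
// Order: every row first, then every column. For each line, the full
// line comes first, followed by the groups left after dropping the
// leftmost rank one at a time; the single-rank group is skipped.
std::vector<std::vector<int>> suffix_groups(int p) {
    assert(p >= 1);
    std::vector<std::vector<int>> groups;
    for (int pass = 0; pass < 2; ++pass) {            // 0 = rows, 1 = columns
        for (int line = 0; line < p; ++line) {
            for (int start = 0; start + 1 < p; ++start) {
                std::vector<int> g;
                for (int k = start; k < p; ++k) {
                    // Row pass: fixed row, varying column.
                    // Column pass: fixed column, varying row.
                    g.push_back(pass == 0 ? line * p + k : k * p + line);
                }
                groups.push_back(g);
            }
        }
    }
    return groups;
}
```

Each line of the grid yields p-1 groups (the full line plus the p-2 shortened ones), so a 4 x 4 grid gives 24 groups in total. Every processor would then walk this list in the same order and make one constructor call per entry, or only per entry it belongs to, which is exactly the part I am unsure about.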

Originally I just had each processor build all of the groups it would belong to, making sure that all processors build row communicators before column communicators, and that all processors in the same row/column build their communicators in the same order. This results in a strange deadlock. All processors enter the correct constructor call. For example, processors 2, 5, 8, representing one column, attempt to build a sub-communicator of the WORLD communicator, and each of them has a group listing 2, 5, 8 in the same sorted order, but they deadlock and never leave the call.

When consulting the manual page, http://www.open-mpi.org/doc/v1.6/man3/MPI_Comm_create.3.php, it is not clear whether all processors in WORLD must execute this call or only the ones within the group. When I try to have all processors execute the call, I get a segmentation fault in the destructor of the invalid communicator.

I'm hoping someone can explain to me how to do this properly.