Hi,

I apologize if this is bad etiquette, but I first posted this on StackOverflow at: http://stackoverflow.com/questions/3103255/boostinterprocess-shared-memory-bus-error

I'm using CentOS 5.4 x86_64 and Boost 1.42.0 on a cluster that uses Open-MPI 1.3.3. I'm writing a shared library that uses shared memory to store large amounts of data for multiple processes to use. There's also a loader application that will read in the data from the files and load them into the shared memory.

When I run the loader application, it determines exactly how much memory it needs to store the data, then adds 25% for overhead. For just about every file, that comes to over 2 gigs of data. When I make the memory request using Boost.Interprocess, it reports that it has successfully reserved the requested amount of memory. But when I start to use that memory, I get a "Bus error". From what I can tell, the bus error is the result of accessing memory outside the range that is actually available for the memory segment.

So I started looking into how shared memory is configured on Linux and what to check to make sure my system allows that much shared memory.

  1. I looked at the "files" at /proc/sys/kernel/shm*:
  2. I called the ipcs -lm command:
    ------ Shared Memory Limits --------
    max number of segments = 4096
    max seg size (kbytes) = 67108864
    max total shared memory (kbytes) = 17179869184
    min seg size (bytes) = 1
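As a sanity check on the arithmetic, here is a minimal sketch that converts the kilobyte limits printed above into bytes and compares them with a request size. The helper names are made up for illustration. Note the assumption baked in: `ipcs -lm` reports the System V shared memory limits, and I have not confirmed that these are the limits that govern the segment Boost.Interprocess creates.

```cpp
#include <cassert>
#include <stdint.h>

// Convert a limit reported in kilobytes (as printed by `ipcs -lm`) to bytes.
// Hypothetical helper, not part of Boost or any system API.
inline uint64_t kib_to_bytes(uint64_t kib) {
    // Guard against overflow of the multiplication below.
    assert(kib <= (~uint64_t(0)) / 1024);
    return kib * 1024ULL;
}

// True if a request of `requested_bytes` fits under both the per-segment
// limit and the system-wide limit, each given in kilobytes.
inline bool fits_sysv_limits(uint64_t requested_bytes,
                             uint64_t max_seg_kib,
                             uint64_t max_total_kib) {
    return requested_bytes <= kib_to_bytes(max_seg_kib) &&
           requested_bytes <= kib_to_bytes(max_total_kib);
}
```

With the values above, `kib_to_bytes(67108864)` is 68719476736 bytes (64 GiB), so `fits_sysv_limits(2352000000ULL, 67108864ULL, 17179869184ULL)` is true for the 2352000000-byte request used in the test program.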

From what I can tell, those settings indicate that I should be able to allocate enough shared memory for my purposes. So I created a stripped down program that created large amounts of data in shared memory:

#include <iostream>

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>

namespace bip = boost::interprocess;

typedef bip::managed_shared_memory::segment_manager segment_manager_t;
typedef bip::allocator<long, segment_manager_t> long_allocator;
typedef bip::vector<long, long_allocator> long_vector;

int main(int argc, char ** argv) {
    // Remove any stale segment on startup and clean up again on exit.
    struct shm_remove {
        shm_remove()  { bip::shared_memory_object::remove("ShmTest"); }
        ~shm_remove() { bip::shared_memory_object::remove("ShmTest"); }
    } remover;

    // Size the segment for the vector's data plus 5% headroom.
    size_t szLength    = 280000000;
    size_t szRequired  = szLength * sizeof(long);
    size_t szRequested = (size_t) (szRequired * 1.05);
    bip::managed_shared_memory segment(bip::create_only, "ShmTest", szRequested);

    std::cout <<
        "Length:       " << szLength << "\n" <<
        "sizeof(long): " << sizeof(long) << "\n" <<
        "Required:     " << szRequired << "\n" <<
        "Requested:    " << szRequested << "\n" <<
        "Allocated:    " << segment.get_size() << "\n" <<
        "Overhead:     " << segment.get_size() - segment.get_free_memory() << "\n" <<
        "Free:         " << segment.get_free_memory() << "\n\n";

    long_allocator alloc(segment.get_segment_manager());
    long_vector vector(alloc);

    // Any command-line argument switches on preallocation of the vector.
    if (argc > 1) {
        std::cout << "Reserving Length of " << szLength << "\n";
        vector.reserve(szLength);
        std::cout << "Vector Capacity: " << vector.capacity() << "\tFree: " << segment.get_free_memory() << "\n\n";
    }

    // Fill the vector, printing progress every 1% of the way.
    for (size_t i = 0; i < szLength; i++) {
        if ((i % (szLength / 100)) == 0) {
            std::cout << i << ": " << "\tVector Capacity: " << vector.capacity() << "\tFree: " << segment.get_free_memory() << "\n";
        }
        vector.push_back(i);
    }
    std::cout << "end: " << "\tVector Capacity: " << vector.capacity() << "\tFree: " << segment.get_free_memory() << "\n";

    return 0;
}

I compiled it with the following command:

g++ ShmTest.cpp -lboost_system -lrt

Then ran it with the following output (edited to make it smaller):

Length:       280000000
sizeof(long): 8
Required:     2240000000
Requested:    2352000000
Allocated:    2352000000
Overhead:     224
Free:         2351999776

0: Vector Capacity: 0 Free: 2351999776
2800000: Vector Capacity: 3343205 Free: 2325254128
5600000: Vector Capacity: 8558607 Free: 2283530912
8400000: Vector Capacity: 8558607 Free: 2283530912
11200000: Vector Capacity: 13693771 Free: 2242449600
14000000: Vector Capacity: 21910035 Free: 2176719488
...
19600000: Vector Capacity: 21910035 Free: 2176719488
22400000: Vector Capacity: 35056057 Free: 2071551312
...
33600000: Vector Capacity: 35056057 Free: 2071551312
36400000: Vector Capacity: 56089691 Free: 1903282240
...
56000000: Vector Capacity: 56089691 Free: 1903282240
58800000: Vector Capacity: 89743507 Free: 1634051712
...
89600000: Vector Capacity: 89743507 Free: 1634051712
92400000: Vector Capacity: 143589611 Free: 1203282880
...
142800000: Vector Capacity: 143589611 Free: 1203282880
145600000: Vector Capacity: 215384417 Free: 628924432
...
212800000: Vector Capacity: 215384417 Free: 628924432
215600000: Vector Capacity: 293999969 Free: 16
...
260400000: Vector Capacity: 293999969 Free: 16
Bus error

If you run the program with a parameter (any value will work; it just needs to increase argc), it preallocates the vector, but it still results in a bus error at the same array index.
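Since reserving the vector up front changes nothing, one possibility (an assumption, not something confirmed here) is that the segment's length is recorded when it is created but the pages behind it are only provided when they are first touched. A minimal sketch of a probe for that, using plain POSIX calls rather than Boost: it sets the length of an open descriptor and then asks the filesystem to really reserve the space, so a shortfall would come back as an error code instead of a signal. `size_and_reserve` is a hypothetical helper, and how to obtain the descriptor behind a `managed_shared_memory` is left open.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Size the object behind `fd` to `bytes`, then ask the filesystem to
// actually reserve that much space.
// Returns 0 on success, or an error number such as ENOSPC on failure.
// `fd` is assumed to be an open, writable descriptor (hypothetical usage:
// a descriptor for the file backing the shared memory object).
inline int size_and_reserve(int fd, off_t bytes) {
    assert(fd >= 0 && bytes > 0);
    if (ftruncate(fd, bytes) != 0) {
        return errno;  // could not even set the length
    }
    // posix_fallocate returns the error number directly; it does not set errno.
    return posix_fallocate(fd, 0, bytes);
}
```

If a call such as `size_and_reserve(fd, 2352000000)` returned a nonzero value for the backing file, that would suggest the requested size cannot be fully backed, even though creating the segment reported success.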

I checked the size of the "files" at /dev/shm using the ls -ash /dev/shm command:

total 2.0G
0 . 0 .. 2.0G ShmTest

And just like with my original application, the size of the allocated shared memory is capped at 2 gigs. Given that it "successfully" allocated 2352000000 bytes of memory, the file should be 2.19 GB (using 1024*1024*1024 bytes per gigabyte).
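To compare the listing against what the filesystem can hold, a small sketch along these lines could report the total and available space of whatever is mounted at /dev/shm, together with the byte-to-gigabyte conversion used above. It relies only on the POSIX `statvfs` call; the struct and function names are made up for illustration, and whether the size of that filesystem is what caps the segment is an assumption I have not verified.

```cpp
#include <cassert>
#include <stdint.h>
#include <sys/statvfs.h>

// Total and currently available space of a filesystem, in bytes.
struct fs_space {
    uint64_t total_bytes;
    uint64_t available_bytes;
};

// Query the filesystem that contains `path` (for example "/dev/shm").
// Returns false if statvfs fails, e.g. because the path does not exist.
inline bool query_fs_space(const char* path, fs_space& out) {
    assert(path != 0);
    struct statvfs info;
    if (statvfs(path, &info) != 0) {
        return false;
    }
    out.total_bytes     = uint64_t(info.f_blocks) * info.f_frsize;
    out.available_bytes = uint64_t(info.f_bavail) * info.f_frsize;
    return true;
}

// Convert bytes to gigabytes using 1024*1024*1024 bytes per gigabyte.
inline double bytes_to_gib(uint64_t bytes) {
    return double(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

For example, calling `query_fs_space("/dev/shm", space)` and comparing `space.total_bytes` with the 2352000000 bytes requested would show whether the request can fit at all; `bytes_to_gib(2352000000ULL)` gives roughly 2.19.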

When I run my actual program to load data using MPI, I get this error output:

Requested: 2808771120
Recieved: 2808771120

[c1-master:13894] *** Process received signal ***
[c1-master:13894] Signal: Bus error (7)
[c1-master:13894] Signal code: (2)
[c1-master:13894] Failing at address: 0x2b3190157000
[c1-master:13894] [ 0] /lib64/libpthread.so.0 [0x3a64e0e7c0]
[c1-master:13894] [ 1] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost12interprocess26uninitialized_copy_or_moveINS0_10offset_ptrIlEEPlEET0_T_S6_S5_PNS_10disable_ifINS0_11move_detail16is_move_iteratorIS6_EEvE4typeE+0x218) [0x2b310dcf3fb8]
[c1-master:13894] [ 2] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost9container6vectorIlNS_12interprocess9allocatorIlNS2_15segment_managerIcNS2_15rbtree_best_fitINS2_12mutex_familyENS2_10offset_ptrIvEELm0EEENS2_10iset_indexEEEEEE15priv_assign_auxINS7_IlEEEEvT_SG_St20forward_iterator_tag+0xa75) [0x2b310dd0a335]
[c1-master:13894] [ 3] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost9container17containers_detail25advanced_insert_aux_proxyINS0_6vectorIlNS_12interprocess9allocatorIlNS4_15segment_managerIcNS4_15rbtree_best_fitINS4_12mutex_familyENS4_10offset_ptrIvEELm0EEENS4_10iset_indexEEEEEEENS0_17constant_iteratorISF_lEEPSF_E25uninitialized_copy_all_toESI_+0x1d7) [0x2b310dd0b817]
[c1-master:13894] [ 4] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost9container6vectorINS1_IlNS_12interprocess9allocatorIlNS2_15segment_managerIcNS2_15rbtree_best_fitINS2_12mutex_familyENS2_10offset_ptrIvEELm0EEENS2_10iset_indexEEEEEEENS3_ISD_SB_EEE17priv_range_insertENS7_ISD_EEmRNS0_17containers_detail23advanced_insert_aux_intISD_PSD_EE+0x771) [0x2b310dd0d521]
[c1-master:13894] [ 5] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost12interprocess6detail8Ctor3ArgINS_9container6vectorINS4_IlNS0_9allocatorIlNS0_15segment_managerIcNS0_15rbtree_best_fitINS0_12mutex_familyENS0_10offset_ptrIvEELm0EEENS0_10iset_indexEEEEEEENS5_ISF_SD_EEEELb0EiSF_NS5_IvSD_EEE11construct_nEPvmRm+0x157) [0x2b310dd0d9a7]
[c1-master:13894] [ 6] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost12interprocess15segment_managerIcNS0_15rbtree_best_fitINS0_12mutex_familyENS0_10offset_ptrIvEELm0EEENS0_10iset_indexEE28priv_generic_named_constructIcEEPvmPKT_mbbRNS0_6detail18in_place_interfaceERNS7_INSE_12index_configISB_S6_EEEENSE_5bool_ILb1EEE+0x6fd) [0x2b310dd0c85d]
[c1-master:13894] [ 7] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost12interprocess15segment_managerIcNS0_15rbtree_best_fitINS0_12mutex_familyENS0_10offset_ptrIvEELm0EEENS0_10iset_indexEE22priv_generic_constructEPKcmbbRNS0_6detail18in_place_interfaceE+0xf8) [0x2b310dd0dd58]
[c1-master:13894] [ 8] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN7POP_LTL16ExportPopulation22InitializeSharedMemoryEPKc+0x1609) [0x2b310dceea99]
[c1-master:13894] [ 9] ../LookupPopulationLib/Release/libLookupPopulation.so(_ZN7POP_LTL10InitializeEPKc+0x349) [0x2b310dd0ebb9]
[c1-master:13894] [10] MPI_Release/LookupPopulation.MpiLoader(main+0x372) [0x4205d2]
[c1-master:13894] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3a6461d994]
[c1-master:13894] [12] MPI_Release/LookupPopulation.MpiLoader(__gxx_personality_v0+0x239) [0x420009]
[c1-master:13894] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 13894 on node c1-master exited on signal 7 (Bus error).
--------------------------------------------------------------------------

I'm really not sure where to go with this. Does anyone have any suggestions of what to try?

Thank you!
Matt Cupp