Subject: [Boost-users] [Interprocess] Shared Memory Bus Error
From: Matt Cupp (cuppmatt_at_[hidden])
Date: 2010-06-23 12:13:31


Hi,

I apologize if this is bad etiquette, but I posted this first on
StackOverflow at:
http://stackoverflow.com/questions/3103255/boostinterprocess-shared-memory-bus-error

I'm using CentOS 5.4 x86_64 and Boost 1.42.0 on a cluster that uses Open-MPI
1.3.3. I'm writing a shared library that uses shared memory to store large
amounts of data for multiple processes to use. There's also a loader
application that will read in the data from the files and load them into the
shared memory.

When I run the loader application, it determines exactly how much memory it
needs to store the data, then adds 25% for overhead. For just about every
file, that comes to over 2 gigs of data. When I make the memory request using
Boost.Interprocess, it says it has successfully reserved the requested amount
of memory. But when I start to use it, I get a "Bus error". From what I can
tell, the bus error is a result of accessing memory outside the range that is
available for the memory segment.

So I started looking into how shared memory is configured on Linux and what
to check to make sure my system is correctly set up to allow that large an
amount of shared memory.

   1. I looked at the "files" at /proc/sys/kernel/shm*:
      - shmall - 4294967296 (a page count, not bytes)
      - shmmax - 68719476736 (68 Gb)
      - shmmni - 4096

   2. I called the ipcs -lm command:

   ------ Shared Memory Limits --------
   max number of segments = 4096
   max seg size (kbytes) = 67108864
   max total shared memory (kbytes) = 17179869184
   min seg size (bytes) = 1
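
The two listings above can be cross-checked with a little arithmetic. The
sketch below is only an illustration: the helper names are hypothetical, and
it assumes that shmall is a page count and that the page size is 4096 bytes
(the usual value on x86_64; sysconf(_SC_PAGESIZE) would give the real one).
Under those assumptions the /proc values reproduce the kbytes figures that
ipcs -lm prints.

```cpp
// Hypothetical helpers for cross-checking /proc/sys/kernel/shm* against
// the kbytes figures printed by "ipcs -lm".

// Assumption: shmall is expressed in pages, so it must be multiplied by
// the page size before it can be compared with a byte or kbyte figure.
unsigned long long pages_to_kbytes(unsigned long long pages,
                                   unsigned long long page_size_bytes) {
    return pages * page_size_bytes / 1024ULL;
}

// shmmax is expressed in bytes, so only a unit conversion is needed.
unsigned long long bytes_to_kbytes(unsigned long long bytes) {
    return bytes / 1024ULL;
}
```

With the values from this system, pages_to_kbytes(4294967296ULL, 4096ULL)
gives 17179869184 (the "max total shared memory" line) and
bytes_to_kbytes(68719476736ULL) gives 67108864 (the "max seg size" line).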

From what I can tell, those settings indicate that I should be able to
allocate enough shared memory for my purposes. So I created a stripped down
program that created large amounts of data in shared memory:

#include <iostream>

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>

namespace bip = boost::interprocess;

typedef bip::managed_shared_memory::segment_manager segment_manager_t;
typedef bip::allocator<long, segment_manager_t> long_allocator;
typedef bip::vector<long, long_allocator> long_vector;

int main(int argc, char ** argv) {
    struct shm_remove {
        shm_remove() { bip::shared_memory_object::remove("ShmTest"); }
        ~shm_remove() { bip::shared_memory_object::remove("ShmTest"); }
    } remover;

    size_t szLength = 280000000;
    size_t szRequired = szLength * sizeof(long);
    size_t szRequested = (size_t) (szRequired * 1.05);
    bip::managed_shared_memory segment(bip::create_only, "ShmTest",
                                       szRequested);

    std::cout <<
        "Length: " << szLength << "\n" <<
        "sizeof(long): " << sizeof(long) << "\n" <<
        "Required: " << szRequired << "\n" <<
        "Requested: " << szRequested << "\n" <<
        "Allocated: " << segment.get_size() << "\n" <<
        "Overhead: " << segment.get_size() - segment.get_free_memory() << "\n" <<
        "Free: " << segment.get_free_memory() << "\n\n";

    long_allocator alloc(segment.get_segment_manager());
    long_vector vector(alloc);

    if (argc > 1) {
        std::cout << "Reserving Length of " << szLength << "\n";
        vector.reserve(szLength);
        std::cout << "Vector Capacity: " << vector.capacity()
                  << "\tFree: " << segment.get_free_memory() << "\n\n";
    }

    for (size_t i = 0; i < szLength; i++) {
        if ((i % (szLength / 100)) == 0) {
            std::cout << i << ": " << "\tVector Capacity: " << vector.capacity()
                      << "\tFree: " << segment.get_free_memory() << "\n";
        }
        vector.push_back(i);
    }
    std::cout << "end: " << "\tVector Capacity: " << vector.capacity()
              << "\tFree: " << segment.get_free_memory() << "\n";

    return 0;
}

Compiled it with the line:

g++ ShmTest.cpp -lboost_system -lrt

Then ran it with the following output (edited to make it smaller):

Length: 280000000
sizeof(long): 8
Required: 2240000000
Requested: 2352000000
Allocated: 2352000000
Overhead: 224
Free: 2351999776

0: Vector Capacity: 0 Free: 2351999776
2800000: Vector Capacity: 3343205 Free: 2325254128
5600000: Vector Capacity: 8558607 Free: 2283530912
8400000: Vector Capacity: 8558607 Free: 2283530912
11200000: Vector Capacity: 13693771 Free: 2242449600
14000000: Vector Capacity: 21910035 Free: 2176719488
...
19600000: Vector Capacity: 21910035 Free: 2176719488
22400000: Vector Capacity: 35056057 Free: 2071551312
...
33600000: Vector Capacity: 35056057 Free: 2071551312
36400000: Vector Capacity: 56089691 Free: 1903282240
...
56000000: Vector Capacity: 56089691 Free: 1903282240
58800000: Vector Capacity: 89743507 Free: 1634051712
...
89600000: Vector Capacity: 89743507 Free: 1634051712
92400000: Vector Capacity: 143589611 Free: 1203282880
...
142800000: Vector Capacity: 143589611 Free: 1203282880
145600000: Vector Capacity: 215384417 Free: 628924432
...
212800000: Vector Capacity: 215384417 Free: 628924432
215600000: Vector Capacity: 293999969 Free: 16
...
260400000: Vector Capacity: 293999969 Free: 16
Bus error

If you run the program with a parameter (any will work, it just needs to
increase argc), it preallocates the vector but still results in a bus
error at the same array index.

I checked the size of the "files" at /dev/shm using the ls -ash /dev/shm command:

total 2.0G
   0 . 0 .. 2.0G ShmTest

And just like with my original application, the size of the allocated
shared memory is capped at 2 gigs. Given that it "successfully" allocated
2352000000 bytes of memory, it should be 2.19 GB (using 1024*1024*1024
bytes per gigabyte).
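
A minimal sketch for cross-checking these sizes from code rather than from
ls. It is not part of the test program above, and the helper and struct names
are hypothetical. It converts a byte count to gigabytes using 1024*1024*1024,
reads a file's apparent and allocated size with stat(), and reads a
filesystem's total and available space with statvfs(). Pointing it at
/dev/shm/ShmTest and /dev/shm assumes the segment is backed by a file there,
as the ls output suggests. One thing it could help rule out (an assumption,
not something confirmed here) is that the filesystem mounted at /dev/shm is
smaller than the requested segment.

```cpp
#include <sys/stat.h>
#include <sys/statvfs.h>

// Hypothetical helper: bytes to gigabytes, using 1024*1024*1024.
double bytes_to_gib(unsigned long long bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}

// Apparent size (what was requested) versus allocated size (what is
// actually backed by storage) of a file.
struct FileSizes {
    bool ok;                       // false if stat() failed
    unsigned long long apparent;   // st_size, in bytes
    unsigned long long allocated;  // st_blocks * 512, in bytes
};

FileSizes query_file_sizes(const char* path) {
    FileSizes result = { false, 0ULL, 0ULL };
    struct stat st;
    if (stat(path, &st) == 0) {
        result.ok = true;
        result.apparent = static_cast<unsigned long long>(st.st_size);
        // On Linux, st_blocks is counted in 512-byte units.
        result.allocated = static_cast<unsigned long long>(st.st_blocks) * 512ULL;
    }
    return result;
}

// Total and currently available space of the filesystem containing a path.
struct FsSpace {
    bool ok;                       // false if statvfs() failed
    unsigned long long total;      // f_blocks * f_frsize, in bytes
    unsigned long long available;  // f_bavail * f_frsize, in bytes
};

FsSpace query_fs_space(const char* path) {
    FsSpace result = { false, 0ULL, 0ULL };
    struct statvfs vfs;
    if (statvfs(path, &vfs) == 0) {
        result.ok = true;
        result.total = static_cast<unsigned long long>(vfs.f_blocks) * vfs.f_frsize;
        result.available = static_cast<unsigned long long>(vfs.f_bavail) * vfs.f_frsize;
    }
    return result;
}
```

For example, bytes_to_gib(2352000000ULL) comes out at roughly 2.19, while
query_file_sizes("/dev/shm/ShmTest") and query_fs_space("/dev/shm") could be
called while the segment exists to compare the requested size with what the
filesystem can actually hold.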

When I run my actual program to load data using MPI, I get this error
output:

Requested: 2808771120
Recieved: 2808771120

[c1-master:13894] *** Process received signal ***
[c1-master:13894] Signal: Bus error (7)
[c1-master:13894] Signal code: (2)
[c1-master:13894] Failing at address: 0x2b3190157000
[c1-master:13894] [ 0] /lib64/libpthread.so.0 [0x3a64e0e7c0]
[c1-master:13894] [ 1]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost12interprocess26uninitialized_copy_or_moveINS0_10offset_ptrIlEEPlEET0_T_S6_S5_PNS_10disable_ifINS0_11move_detail16is_move_iteratorIS6_EEvE4typeE+0x218)
[0x2b310dcf3fb8]
[c1-master:13894] [ 2]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost9container6vectorIlNS_12interprocess9allocatorIlNS2_15segment_managerIcNS2_15rbtree_best_fitINS2_12mutex_familyENS2_10offset_ptrIvEELm0EEENS2_10iset_indexEEEEEE15priv_assign_auxINS7_IlEEEEvT_SG_St20forward_iterator_tag+0xa75)
[0x2b310dd0a335]
[c1-master:13894] [ 3]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost9container17containers_detail25advanced_insert_aux_proxyINS0_6vectorIlNS_12interprocess9allocatorIlNS4_15segment_managerIcNS4_15rbtree_best_fitINS4_12mutex_familyENS4_10offset_ptrIvEELm0EEENS4_10iset_indexEEEEEEENS0_17constant_iteratorISF_lEEPSF_E25uninitialized_copy_all_toESI_+0x1d7)
[0x2b310dd0b817]
[c1-master:13894] [ 4]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost9container6vectorINS1_IlNS_12interprocess9allocatorIlNS2_15segment_managerIcNS2_15rbtree_best_fitINS2_12mutex_familyENS2_10offset_ptrIvEELm0EEENS2_10iset_indexEEEEEEENS3_ISD_SB_EEE17priv_range_insertENS7_ISD_EEmRNS0_17containers_detail23advanced_insert_aux_intISD_PSD_EE+0x771)
[0x2b310dd0d521]
[c1-master:13894] [ 5]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost12interprocess6detail8Ctor3ArgINS_9container6vectorINS4_IlNS0_9allocatorIlNS0_15segment_managerIcNS0_15rbtree_best_fitINS0_12mutex_familyENS0_10offset_ptrIvEELm0EEENS0_10iset_indexEEEEEEENS5_ISF_SD_EEEELb0EiSF_NS5_IvSD_EEE11construct_nEPvmRm+0x157)
[0x2b310dd0d9a7]
[c1-master:13894] [ 6]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost12interprocess15segment_managerIcNS0_15rbtree_best_fitINS0_12mutex_familyENS0_10offset_ptrIvEELm0EEENS0_10iset_indexEE28priv_generic_named_constructIcEEPvmPKT_mbbRNS0_6detail18in_place_interfaceERNS7_INSE_12index_configISB_S6_EEEENSE_5bool_ILb1EEE+0x6fd)
[0x2b310dd0c85d]
[c1-master:13894] [ 7]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN5boost12interprocess15segment_managerIcNS0_15rbtree_best_fitINS0_12mutex_familyENS0_10offset_ptrIvEELm0EEENS0_10iset_indexEE22priv_generic_constructEPKcmbbRNS0_6detail18in_place_interfaceE+0xf8)
[0x2b310dd0dd58]
[c1-master:13894] [ 8]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN7POP_LTL16ExportPopulation22InitializeSharedMemoryEPKc+0x1609)
[0x2b310dceea99]
[c1-master:13894] [ 9]
../LookupPopulationLib/Release/libLookupPopulation.so(_ZN7POP_LTL10InitializeEPKc+0x349)
[0x2b310dd0ebb9]
[c1-master:13894] [10]
MPI_Release/LookupPopulation.MpiLoader(main+0x372) [0x4205d2]
[c1-master:13894] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3a6461d994]
[c1-master:13894] [12]
MPI_Release/LookupPopulation.MpiLoader(__gxx_personality_v0+0x239)
[0x420009]
[c1-master:13894] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 13894 on node c1-master
exited on signal 7 (Bus error).
--------------------------------------------------------------------------

I'm really not sure where to go with this. Does anyone have any suggestions
of what to try?
Thank you!
Matt Cupp


