[interprocess] Performance problem with managed_shared_memory

Hi there,

I have a performance problem using managed_shared_memory and the interprocess vector. I attached a minimal, compilable example that demonstrates it. I create a vector that contains a simple class and write into this vector. If the vector is located in shared memory, this takes much longer than if it is located in process-local memory. The main difference is the allocator used, but I cannot explain it.

If I run the attached code (on Ubuntu 8.10, Boost 1.40, gcc 4.3.3), I get the following results:

with SHMEM_TESTING defined:   Mean: 0.024768 seconds.
with SHMEM_TESTING undefined: Mean: 0.015022 seconds.

I do not understand where the difference comes from. Does anybody have an explanation for that?

#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/thread/xtime.hpp>
#include <iostream>
#include <vector>

namespace ipc = boost::interprocess;

// Returns the current UTC time in seconds as a double.
double get_timestamp()
{
    boost::xtime timestamp;
    boost::xtime_get(&timestamp, boost::TIME_UTC);
    return timestamp.sec + ((double)timestamp.nsec / 1000000000.0);
}

class Point3f
{
public:
    double x;
    double y;
    double z;
};

#define VECTOR_ELEMENTS 500000

int main()
{
#define SHMEM_TESTING
#ifdef SHMEM_TESTING
    ipc::shared_memory_object::remove("shmem");
    ipc::managed_shared_memory managed_shm(
        ipc::create_only, "shmem",
        VECTOR_ELEMENTS * 3 * sizeof( double ) + 1024 );

    typedef ipc::managed_shared_memory::segment_manager segment_manager_t;
    typedef ipc::allocator<void, segment_manager_t>     void_allocator;
    typedef ipc::allocator<Point3f, segment_manager_t>  Point3fAllocator;
    typedef ipc::vector<Point3f, Point3fAllocator>      Point3fVector;

    void_allocator alloc( managed_shm.get_segment_manager() );
    Point3fVector * vec =
        managed_shm.construct<Point3fVector>( ipc::unique_instance )( alloc );
    if ( !vec )
        return -1;
#else
    ipc::vector<Point3f> * vec = new ipc::vector<Point3f>();
    // std::vector<Point3f> * vec = new std::vector<Point3f>();
#endif

    // Fill the vector, then time 20 passes that overwrite every element.
    for ( unsigned int i = 0; i < VECTOR_ELEMENTS; ++i )
    {
        vec->push_back( Point3f() );
    }

    double sum = 0;
    unsigned int count = 0;
    for ( ; count < 20; ++count )
    {
        double t1 = get_timestamp();
        for ( unsigned int i = 0; i < vec->size(); ++i )
        {
            ( *vec )[i].x = i;
            ( *vec )[i].y = i;
            ( *vec )[i].z = i;
        }
        sum += get_timestamp() - t1;
    }

    std::cerr << std::fixed << "Mean: " << sum / static_cast<double>(count)
              << " seconds." << std::endl;
    return 0;
}
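[Archive note: on a setup like the one described, the example would typically be built with something along these lines, assuming the source file is named shmem_test.cpp. The exact Boost.Thread library name (boost_thread vs. boost_thread-mt) depends on how Boost was installed; -lrt is needed on Linux for shm_open, and boost::xtime_get is defined in the Boost.Thread library:

g++ -O2 shmem_test.cpp -o shmem_test -lboost_thread -lrt -pthread ]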

Moritz wrote:
Hi there,
I have a performance problem using managed_shared_memory and the interprocess vector. I attached a minimal, compilable example that demonstrates it. I create a vector that contains a simple class and write into this vector. If the vector is located in shared memory, this takes much longer than if it is located in process-local memory. The main difference is the allocator used, but I cannot explain it.
Process-shared containers use relative pointers, so each dereference needs additional operations to obtain the address in each process. For example:

T &operator[](size_type idx)
{ return *(start_ + idx); }

Here start_ is a smart pointer, so the pointer arithmetic is not trivial. The compiler also can't apply as many optimizations as it can with raw pointers. So this is expected behaviour.

You can improve it a bit with:

for ( unsigned int i = 0; i < vec->size(); ++i )
{
    Point3f &f = ( *vec )[i];
    f.x = i;
    f.y = i;
    f.z = i;
}

Or even faster, obtaining a raw pointer to the first element:

Point3f *first = &( *vec )[0];
for ( unsigned int i = 0; i < vec->size(); ++i )
{
    first[i].x = i;
    first[i].y = i;
    first[i].z = i;
}

Best,

Ion
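[Archive note: to illustrate the point above, here is a simplified model of how a relative (offset-based) smart pointer dereferences. This is a minimal sketch for illustration only, not the actual boost::interprocess::offset_ptr source, which additionally handles null encoding and other details:

#include <cstddef>

// Sketch: stores the distance between the pointer object itself and the
// pointee, so the stored value stays valid even when the shared memory
// segment is mapped at different base addresses in different processes.
template <class T>
class offset_ptr_sketch
{
    std::ptrdiff_t offset_;   // this-to-pointee distance in bytes

public:
    T *get() const
    {
        // Every dereference recomputes the absolute address from "this":
        // at least one extra addition compared with a raw pointer, and
        // harder for the compiler to optimize across loop iterations.
        return (T *)((char *)this + offset_);
    }

    T &operator*() const  { return *get(); }
    T *operator->() const { return get(); }
};

This is why hoisting a raw pointer or reference out of the loop, as shown above, removes the per-access overhead. ]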

Hi Ion,

Thank you for your tremendous support on this mailing list.

Best regards,
Moritz

Ion Gaztañaga wrote:
Process-shared containers use relative pointers, so each dereference needs additional operations to obtain the address in each process. For example:
[...]
Best,
Ion
participants (2)
- Ion Gaztañaga
- Moritz