From: Calum Grant (calum_at_[hidden])
Date: 2005-10-02 09:45:57


> > The one killer limitation of shmem (that I'm pretty sure Ion is
> > working hard to remove) is that the shared memory region cannot be
> > grown once it has been created. This is where your memory-mapped
> > "persist" library has a leg up.
>
> The problem is quite hard to solve if you allow shared memory to be
> placed at different base addresses in different processes. And
> performance would suffer if, on every pointer access, I had to check
> whether the memory segment it points to is already mapped. To identify
> each segment, a pointer would have to hold the name of the segment and
> an offset. Each access would imply discovering the real address of
> that segment in the process currently accessing the pointer. Really a
> hard task to do, and performance would suffer a lot. I think previous
> efforts with growing shared memory ("A C++ Pooled, Shared Memory
> Allocator For The Standard Template Library",
> http://allocator.sourceforge.net/) use fixed memory mappings in
> different processes. But this is an issue I would like to solve after
> the first version of Shmem is presented for review (I plan to do this
> shortly, within two months).
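
To make the cost described above concrete, here is a minimal C++ sketch of a pointer that names its segment and carries an offset. The names `segment_table` and `seg_ptr` are hypothetical (this is not Shmem's API); a plain hash map stands in for whatever per-process registry a real implementation would need, and "map the segment on demand" is replaced by an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical per-process registry: segment name -> base address at which
// *this* process has the segment mapped. Each process has its own table.
class segment_table {
public:
    void register_segment(const std::string& name, char* base) { bases_[name] = base; }

    char* base_of(const std::string& name) const {
        auto it = bases_.find(name);
        if (it == bases_.end())
            // A real implementation would have to map the segment here.
            throw std::runtime_error("segment not mapped: " + name);
        return it->second;
    }

private:
    std::unordered_map<std::string, char*> bases_;
};

// Hypothetical "segment name + offset" pointer. It means the same thing in
// every process, but each dereference costs a lookup in the table.
template <class T>
struct seg_ptr {
    std::string segment;
    std::size_t offset;

    T* get(const segment_table& table) const {
        return reinterpret_cast<T*>(table.base_of(segment) + offset);
    }
};
```

The same `seg_ptr` resolves to different addresses through different tables, which is exactly why every access has to go through the lookup.
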
>
> Memory-mapped files are another thing. Disk blocks can be dispersed
> across the disk, but the OS gives you the illusion that all the data
> is contiguous. Currently in Shmem, when using memory-mapped files as
> the memory backend, if your memory-mapped file is full, you can grow
> the file and remap it, so you have more room to work with. An
> in-memory DB can easily be implemented using this technique: when an
> insertion into any object allocated in the memory-mapped file throws
> boost::shmem::bad_alloc, you just call:
>
> named_mfile_object->grow(1000000/*additional bytes*/);
>
> and the file grows and you can continue allocating objects.
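
The grow-and-retry pattern can be sketched without Shmem itself. In this hedged C++ sketch, `toy_arena` is a hypothetical stand-in for a memory-mapped segment (a `std::vector<char>` with a bump allocator and no alignment handling), `std::bad_alloc` stands in for `boost::shmem::bad_alloc`, and `grow()` plays the role of `named_mfile_object->grow()`. Allocations are returned as offsets because growing may move the buffer, just as remapping may change the base address.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for a memory-mapped segment: a byte buffer with a
// bump allocator (no alignment handling, no deallocation). grow() enlarges
// the buffer, which may move it, much as remapping a grown file may change
// its base address.
class toy_arena {
public:
    explicit toy_arena(std::size_t bytes) : buf_(bytes) {}

    // Returns an offset rather than a pointer so it stays valid after grow().
    std::size_t allocate(std::size_t n) {
        if (n > buf_.size() - used_) throw std::bad_alloc();
        const std::size_t off = used_;
        used_ += n;
        return off;
    }

    void grow(std::size_t extra) { buf_.resize(buf_.size() + extra); }
    char* base() { return buf_.data(); }
    std::size_t size() const { return buf_.size(); }

private:
    std::vector<char> buf_;
    std::size_t used_ = 0;
};

// The user-side pattern: on exhaustion, grow the segment and retry.
std::size_t allocate_or_grow(toy_arena& arena, std::size_t n, std::size_t grow_by) {
    for (;;) {
        try {
            return arena.allocate(n);
        } catch (const std::bad_alloc&) {
            // Growing by at least n guarantees the retry succeeds.
            arena.grow(grow_by < n ? n : grow_by);
        }
    }
}
```
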

Couldn't the allocator do this instead of asking the user to do it? It
would be better if the container did not need special code for different
allocators.

> Take care, because the OS might have changed the mapping address. In
> Shmem you can obtain offsets to objects to recover the new address of
> the remapped object. You can use the same technique with heap memory.
> The trick in Shmem is that, to achieve maximum performance, the memory
> space must be contiguous. For growing memory and persistent data,
> memory-mapped files are available in Shmem. Maybe that is not enough
> for a relational DB, but I would be happy to work with the RTL library
> on this.
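
A hedged sketch of the offset technique for finding an object again after a remap. The helpers `to_offset` and `from_offset` are hypothetical, and resizing a `std::vector<char>` stands in for growing and remapping a file: the bytes survive, but the base address may not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helpers: remember an object's position relative to the
// segment base, then rebuild a pointer from the (possibly new) base.
inline std::size_t to_offset(const char* base, const void* p) {
    return static_cast<std::size_t>(static_cast<const char*>(p) - base);
}

template <class T>
T* from_offset(char* base, std::size_t off) {
    return reinterpret_cast<T*>(base + off);
}

// Simulated "grow the file and remap it": raw pointers taken before the
// resize may dangle, but base + offset finds the object again.
int value_after_remap() {
    std::vector<char> segment(64);
    const int v = 1234;
    std::memcpy(segment.data() + 16, &v, sizeof v);
    const std::size_t off = to_offset(segment.data(), segment.data() + 16);

    segment.resize(1 << 20);  // typically reallocates, i.e. the base moves

    return *from_offset<int>(segment.data(), off);
}
```
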
>
> I've downloaded RML and I've seen that the "mt_tree" class uses raw
> pointers in the red-black tree algorithm. If you use memory-mapped
> files and you store raw pointers there, the file is unusable unless
> you map it again at exactly the same address where you created it.
> All data in the memory-mapped file must be base-address independent.
> That's why Shmem uses offset_ptr-s and containers that accept this
> kind of pointer. So if we want to achieve persistence with RTL, we
> must develop base-independent containers. This is not a hard task,
> but porting, for example, multiindex to offset_ptr-s is not a one-day
> job.
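
The idea behind base-address-independent pointers can be shown with a minimal self-relative pointer: it stores the distance from its own address to its target, so a block holding both can be relocated as a whole. This is a hedged sketch in the spirit of Shmem's offset_ptr, not its implementation; the name `rel_ptr` and its null encoding (offset 0) are assumptions. The struct is kept trivially copyable so the `memcpy` relocation below is well-defined; as a consequence, copying a single `rel_ptr` to a different address does not re-target it, which a real offset_ptr handles in its copy operations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Self-relative pointer: target address = own address + off.
// off == 0 encodes null in this sketch (so it cannot point at itself).
template <class T>
struct rel_ptr {
    std::ptrdiff_t off = 0;

    void set(T* p) {
        off = p ? static_cast<std::ptrdiff_t>(reinterpret_cast<std::uintptr_t>(p) -
                                              reinterpret_cast<std::uintptr_t>(this))
                : 0;
    }
    T* get() const {
        if (off == 0) return nullptr;
        return reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(this) +
                                    static_cast<std::uintptr_t>(off));
    }
};

struct node {
    int value;
    rel_ptr<node> next;
};

// A tiny "segment" holding a two-node list.
struct segment_image {
    node nodes[2];
};

// Build the list in one place, copy the raw bytes elsewhere (as if the
// segment were mapped at a different address) and check the link.
bool survives_relocation() {
    segment_image a{};
    a.nodes[0].value = 1;
    a.nodes[1].value = 2;
    a.nodes[0].next.set(&a.nodes[1]);

    segment_image b;
    std::memcpy(&b, &a, sizeof a);
    return b.nodes[0].next.get() == &b.nodes[1] && b.nodes[0].next.get()->value == 2 &&
           b.nodes[1].next.get() == nullptr;
}
```

The price is one addition per dereference, which is the overhead discussed in the reply that follows.
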

If you can assume that memory will not move, the implementation becomes
a lot simpler. There is a certain overhead in offsetting pointers on
each pointer dereference, and the red-black tree algorithms are quite
pointer-intensive. Mt_tree could certainly use the pointer type from
the allocator, and I'll put that into my next release of RML.
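
A hedged sketch of what "use the pointer type from the allocator" could look like for a tree. The node links are declared as the `pointer` type of a rebound allocator (via `std::allocator_traits`), so with `std::allocator` they are raw pointers, while an allocator whose `pointer` is an offset pointer would make the same layout base-address independent. `tree_node` and `tiny_tree` are hypothetical names, not RML's mt_tree, and red-black rebalancing is omitted; the insert is a plain binary-search-tree insert.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Node whose links use whatever pointer type the (rebound) allocator declares.
template <class T, class Alloc>
struct tree_node {
    using node_alloc = typename std::allocator_traits<Alloc>::template rebind_alloc<tree_node>;
    using pointer = typename std::allocator_traits<node_alloc>::pointer;

    T value;
    pointer left{}, right{}, parent{};
    bool red = true;  // colour bit for the (omitted) red-black rebalancing
};

// With std::allocator the links are ordinary raw pointers.
static_assert(std::is_same<tree_node<int, std::allocator<int>>::pointer,
                           tree_node<int, std::allocator<int>>*>::value,
              "std::allocator yields raw node pointers");

template <class T, class Alloc = std::allocator<T>>
class tiny_tree {
    using node = tree_node<T, Alloc>;
    using node_alloc = typename node::node_alloc;
    using traits = std::allocator_traits<node_alloc>;
    using pointer = typename node::pointer;

public:
    tiny_tree() = default;
    tiny_tree(const tiny_tree&) = delete;
    tiny_tree& operator=(const tiny_tree&) = delete;
    ~tiny_tree() { destroy(root_); }

    // Plain BST insert; rebalancing is left out to keep the sketch short.
    void insert(const T& v) {
        pointer n = traits::allocate(alloc_, 1);
        traits::construct(alloc_, std::addressof(*n), node{v});
        pointer* link = &root_;
        pointer parent{};
        while (*link) {
            parent = *link;
            link = v < (*link)->value ? &(*link)->left : &(*link)->right;
        }
        n->parent = parent;
        *link = n;
        ++size_;
    }

    bool contains(const T& v) const {
        pointer p = root_;
        while (p) {
            if (v < p->value) p = p->left;
            else if (p->value < v) p = p->right;
            else return true;
        }
        return false;
    }

    std::size_t size() const { return size_; }

private:
    void destroy(pointer p) {
        if (!p) return;
        destroy(p->left);
        destroy(p->right);
        traits::destroy(alloc_, std::addressof(*p));
        traits::deallocate(alloc_, p, 1);
    }

    node_alloc alloc_{};
    pointer root_{};
    std::size_t size_ = 0;
};
```
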

Persist's approach uses a pool of mapped memory blocks, thereby avoiding
the need to move memory. [To people unfamiliar with mmap(): a file does
not have to be mapped contiguously into the address space.] Allocating
more memory means mapping another block, and no memory needs to be
moved. Although I haven't seen it in practice, it is certainly a
theoretical possibility that the OS will refuse to map the file back to
the same memory addresses the next time the program is run. This is
the one reason I haven't been pushing the Persist library: I just
can't guarantee its safety. My feeling is that if the address space
were large enough (i.e. 64-bit) and the OS could guarantee mapping to a
specific address, the offset_ptr workaround would become unnecessary.
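
A hedged sketch of growth by adding blocks rather than enlarging one region. `chunked_pool` is a hypothetical name, and heap blocks from `std::make_unique<char[]>` stand in for separately mmap()ed regions; the point is only that earlier allocations never move when the pool grows. It does no deallocation and rejects requests larger than one block.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

class chunked_pool {
public:
    explicit chunked_pool(std::size_t block_size) : block_size_(block_size) {}

    void* allocate(std::size_t n) {
        // Round up so every allocation starts at a max_align_t boundary.
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) / a * a;
        if (n > block_size_) throw std::bad_alloc();  // bigger than one block
        if (blocks_.empty() || used_ + n > block_size_) {
            // "Map another block": existing blocks, and pointers into them, stay put.
            blocks_.push_back(std::make_unique<char[]>(block_size_));
            used_ = 0;
        }
        void* p = blocks_.back().get() + used_;
        used_ += n;
        return p;
    }

    std::size_t block_count() const { return blocks_.size(); }

private:
    std::size_t block_size_;
    std::size_t used_ = 0;
    std::vector<std::unique_ptr<char[]>> blocks_;
};
```

Raw pointers stay valid across growth here; what the sketch cannot show is the remaining risk named above, namely getting the same addresses back when the blocks are mapped again in a later run.
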

The other problem is that other threads won't expect objects to move.
This means that you can't have concurrent access to your memory-mapped
data while it might be remapped. Also, if the file is shared between
processes and you grow the file in one process, when does another
process detect the change?
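
One hedged way to address both questions, sketched in C++17 for threads within a single process: all access goes through a `std::shared_mutex`, growth takes it exclusively so no reader is holding a stale base address when the region moves, and a generation counter lets other parties notice that a remap happened. `guarded_segment` is hypothetical and a `std::vector<char>` stands in for the mapping; across processes the lock and counter would have to live in the shared file itself, which is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <shared_mutex>
#include <vector>

class guarded_segment {
public:
    explicit guarded_segment(std::size_t bytes) : data_(bytes) {}

    // Readers share the lock; addresses derived from data_.data() are only
    // used while the lock is held.
    int load(std::size_t off) const {
        std::shared_lock<std::shared_mutex> lock(m_);
        assert(off + sizeof(int) <= data_.size());
        int v;
        std::memcpy(&v, data_.data() + off, sizeof v);
        return v;
    }

    void store(std::size_t off, int v) {
        std::unique_lock<std::shared_mutex> lock(m_);
        assert(off + sizeof(int) <= data_.size());
        std::memcpy(data_.data() + off, &v, sizeof v);
    }

    // Growing may move the bytes (like remapping at a new address). The
    // exclusive lock waits until no reader is using the old base.
    void grow(std::size_t extra) {
        std::unique_lock<std::shared_mutex> lock(m_);
        data_.resize(data_.size() + extra);
        ++generation_;
    }

    // Compare with a previously seen value to detect that a remap happened.
    std::size_t generation() const {
        std::shared_lock<std::shared_mutex> lock(m_);
        return generation_;
    }

    std::size_t size() const {
        std::shared_lock<std::shared_mutex> lock(m_);
        return data_.size();
    }

private:
    mutable std::shared_mutex m_;
    std::vector<char> data_;
    std::size_t generation_ = 0;
};
```
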

My feeling is that safety is paramount, and that it is better to have a
safe but slower implementation using offset_ptrs than to use absolute
memory addresses and risk mmap() failure. Alternatively, the application
could be robust to mmap() failure, for example if the memory-mapped data
could be reconstructed from another data source.

You could perhaps provide two allocators in Shmem: one that uses
offset_ptrs and another that does not.
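
The two-allocator idea amounts to making the pointer type a policy. In this hedged sketch, `raw_pointers` and `offset_pointers` are hypothetical policies (not Shmem types) that a node type is parameterized on: the raw version dereferences with a plain load, while the offset version adds its own address on each access but survives relocation. Both are kept trivially copyable so the `memcpy` relocation check is well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Policy 1: ordinary pointers. Fast, but only valid at the original address.
struct raw_pointers {
    template <class T>
    struct ptr {
        T* p = nullptr;
        void set(T* q) { p = q; }
        T* get() const { return p; }
    };
};

// Policy 2: self-relative offsets. One extra addition per access, but the
// link stays valid if the whole block is moved.
struct offset_pointers {
    template <class T>
    struct ptr {
        std::ptrdiff_t off = 0;  // 0 encodes null
        void set(T* q) {
            off = q ? static_cast<std::ptrdiff_t>(reinterpret_cast<std::uintptr_t>(q) -
                                                  reinterpret_cast<std::uintptr_t>(this))
                    : 0;
        }
        T* get() const {
            return off ? reinterpret_cast<T*>(reinterpret_cast<std::uintptr_t>(this) +
                                              static_cast<std::uintptr_t>(off))
                       : nullptr;
        }
    };
};

// The same node definition serves both policies.
template <class Policy>
struct list_node {
    int value;
    typename Policy::template ptr<list_node> next;
};

// Link two nodes, move the raw bytes elsewhere, and report whether the
// copy's link points at the copy's second node.
template <class Policy>
bool link_survives_move() {
    struct block { list_node<Policy> n[2]; };
    block a{};
    a.n[0].next.set(&a.n[1]);
    block b;
    std::memcpy(&b, &a, sizeof a);
    return b.n[0].next.get() == &b.n[1];
}
```

Container code written against `Policy::ptr` compiles unchanged with either choice, so users could pick the raw variant for memory that never moves and the offset variant for mapped files.
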

Regards, Calum


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk