From: Paul Mclachlan (pdm_at_[hidden])
Date: 2001-07-26 14:58:35


At 17:46 07 Jul 2001 -0000, michel.andre_at_[hidden] wrote:

> Would there be any interest in a persistent vector with the exact
> same interface as std::vector that stores its data in a persistent
> file? Preferably implemented using memory mapped files on platforms
> where they are supported.
>
> Of course there are restrictions on the data contained in such a
> vector it may not contain any dynamically allocated data or pointers.

I have something very similar that I was considering writing up a description
of. Now seems as good a time as any.

A quick summary: Memory mapped persistable data structures, including
arrays, lists and pointers (via a pointer-wrapping class).

The work essentially considers an object's binary format as part of its
contract with the world. I specifically do not handle stuff you should use
an OODB for, particularly versioning, transactions, etc.

My design requirements (for what I have) do not require vtbl's, so I haven't
considered them, except to note that the C++ Users Journal has published
some quite simple techniques for refreshing an object's vtbl when the file
is loaded, mainly just invoking the placement new operator on the address
it's at. Again, I don't have a way of identifying where the starts of all
objects are in my implementation (maybe someone can think of one?), so this
might be quite difficult to do generally.

The major limitation is that there is no corresponding free() operation when
allocating into the persistable region, which can be slightly inconvenient
when generating the data structure (but not more inconvenient than
marshalling it to disk manually, a common alternative).
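
To make the allocation model concrete, here is a minimal sketch of a
grow-only region, assuming a caller-supplied buffer whose start is suitably
aligned. The names persistable_region, allocate and create are hypothetical
and are not taken from the code described in this post; the only property
borrowed from it is that allocation is a pointer increment and nothing is
ever freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical name: a fixed-size region that only ever grows.
class persistable_region
{
public:
    persistable_region(void* base, std::size_t size)
        : base_(static_cast<unsigned char*>(base)), size_(size), used_(0) {}

    // Reserve `bytes` bytes at an offset aligned to `align` (a power of
    // two). Returns a null pointer when the region is full.
    void* allocate(std::size_t bytes, std::size_t align)
    {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > size_ || bytes > size_ - start)
            return nullptr;
        used_ = start + bytes;          // the whole "allocator": bump the mark
        return base_ + start;
    }

    // Construct a T inside the region. There is deliberately no destroy().
    template <class T>
    T* create()
    {
        void* p = allocate(sizeof(T), alignof(T));
        return p ? new (p) T() : nullptr;
    }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};

// Same shape as the Metric example in this post, with public members.
struct Metric
{
    std::uint32_t number_calls;
    std::uint32_t total_timing;
};
```

Two successive create<Metric>() calls on such a region return addresses
eight bytes apart, which is what makes the resulting file layout
predictable.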

So, I'm left with an object like:

        class Metric
        {
                uint32_t number_calls;
                uint32_t total_timing;
        };

where the fact that the object -is- 64 bits is part of its definition.

endian-ness:
------------

I handle endian-ness issues (at present) with a wrapper type (either BE or
LE) that stores into the appropriate binary format and converts from it on
every read. So you would instead write:

        class Metric
        {
                BE< uint32_t > number_calls;
                BE< uint32_t > total_timing;
        };

This is fine if you're willing to set your binary format to be one or the
other (big or little), or leave it undefined (and not use the wrapper at
all). It's also quite easy to use copy constructors to convert from one
endian-ness to another (when you don't specifically code BE or LE but code
"endian_wrapper" or somesuch that you can instantiate one for each).

The wrapper is properly compiled away to direct access to the underlying
type on platforms that use that type natively. So, for my company, our main
audience is Windows & Linux x86, so we can use LE (little endian) for our
config files and either pay a performance penalty on a BE chip or write copy
constructor converters when we move the file between platforms.
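
A rough sketch of what such a wrapper could look like follows. It is an
illustration only, assuming unsigned integer types: it always stores bytes
most-significant first in a byte array and rebuilds the value on every
read, so unlike the wrapper described above it does not compile away to
direct access on a big-endian host. The raw() accessor is a hypothetical
helper added for inspection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of a big-endian wrapper for unsigned integer types: the bytes in
// memory are always most-significant first, whatever the host order is.
template <class T>
class BE
{
public:
    BE() : bytes_() {}                  // all-zero representation
    BE(T value) { store(value); }

    BE& operator=(T value) { store(value); return *this; }

    // Convert from the stored format on every read.
    operator T() const
    {
        T value = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i)
            value = static_cast<T>((value << 8) | bytes_[i]);
        return value;
    }

    // Hypothetical helper: expose the stored bytes for inspection.
    const unsigned char* raw() const { return bytes_; }

private:
    void store(T value)
    {
        for (std::size_t i = 0; i < sizeof(T); ++i)
            bytes_[i] = static_cast<unsigned char>(
                value >> (8 * (sizeof(T) - 1 - i)));
    }

    unsigned char bytes_[sizeof(T)];
};
```

With this sketch, BE< std::uint32_t > x(0x01020304u) holds the bytes
01 02 03 04 in that order on any host, and reading x back yields
0x01020304. An LE counterpart would only reverse the byte index.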

pointers:
---------

I handle pointers by storing, instead of a pointer, an offset from the
current location. Traditionally one might store a pointer in a persisted
file as an offset from the start of the file, but this is inconvenient for
the 'persistable_pointer' wrapper type, which takes 'this' and adds the
offset in order to generate a "real" pointer. Obviously the
persistable_pointer class must have a copy constructor and assignment
operator that intelligently update the offset based on the new object's
address. But because they're offsets, bulk memcpy's of the entire region (or
mapping it into an arbitrary location) will work, so long as you don't move
pieces of the region around. Internally, persistable_pointer can use the BE
or LE (big-endian or little-endian) wrapper to have a platform independent
offset.
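
The self-relative idea can be sketched as below. This is not the actual
class, just a minimal version under stated assumptions: the offset is a
32-bit byte count in host byte order, an offset of 0 stands for null (the
post does not say how null is encoded), and pointer and target live in the
same region. Node and survives_bulk_copy are hypothetical names used only
to exercise it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch of a self-relative pointer: it stores the byte distance from its
// own address to the target. Offset 0 means null (an assumption).
template <class T>
class persistable_pointer
{
public:
    persistable_pointer() : offset_(0) {}
    persistable_pointer(T* target) { set(target); }

    // Copying must re-derive the offset from the new object's address.
    persistable_pointer(const persistable_pointer& other) { set(other.get()); }
    persistable_pointer& operator=(const persistable_pointer& other)
    {
        set(other.get());
        return *this;
    }
    persistable_pointer& operator=(T* target) { set(target); return *this; }

    // 'this' plus the stored offset gives back a real pointer.
    T* get() const
    {
        if (offset_ == 0)
            return nullptr;
        const char* self = reinterpret_cast<const char*>(this);
        return reinterpret_cast<T*>(const_cast<char*>(self + offset_));
    }

    T& operator*() const { return *get(); }
    T* operator->() const { return get(); }
    std::int32_t offset() const { return offset_; }

private:
    void set(T* target)
    {
        if (!target) { offset_ = 0; return; }
        std::ptrdiff_t d = reinterpret_cast<const char*>(target)
                         - reinterpret_cast<const char*>(this);
        offset_ = static_cast<std::int32_t>(d);
    }

    std::int32_t offset_;   // host byte order in this sketch
};

// Hypothetical node type used to exercise the pointer.
struct Node
{
    std::uint32_t value;
    persistable_pointer<Node> next;
};

// Copies a two-node region byte for byte and checks that the link in the
// copy points into the copy, not back into the original.
inline bool survives_bulk_copy()
{
    Node a[2];
    a[0].value = 1;
    a[1].value = 2;
    a[0].next = &a[1];

    Node b[2];
    std::memcpy(static_cast<void*>(b), static_cast<const void*>(a), sizeof a);
    return b[0].next.get() == &b[1] && b[0].next->value == 2;
}
```

The design choice worth noting is the user-written copy constructor: a
member-wise copy would carry the old offset to a new address and silently
point somewhere else, whereas a bulk copy of the whole region keeps every
distance intact.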

So, for an example, you could have:

        class Metric;   // defined below

        class Configuration
        {
                persistable_pointer< Metric > first;
        };

        class Metric
        {
                BE< uint32_t > number_calls;
                BE< uint32_t > total_timing;
                persistable_pointer< Metric > next;
                persistable_pointer< Metric > prev;
        };

I have a simple slist class that can encapsulate that behaviour rather than
hand coding the links as above. But you can see how it would work. Your
file format might be:

1: Configuration = ( &Metric1 )
2: Metric1 = ( 1, 5, &Metric2, NULL )
3: Metric2 = ( 2, 2, NULL, &Metric1 )

        Where, counting offsets in 32-bit fields, on line 1 the & is stored
        as +1. On line 2, next& is stored as +2 (just enough to skip the
        two pointers). On line 3, prev& is stored as -7 (is my math right?)
        to point to the start of Metric1.
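
The arithmetic can be checked mechanically. The sketch below assumes every
field, including each stored offset, occupies one 32-bit slot with no
padding, and uses plain stand-in structs (ConfigurationLayout, MetricLayout,
FileLayout, all hypothetical names) in place of the wrapped types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plain stand-ins for the wrapped types: every field is one 32-bit slot.
struct ConfigurationLayout
{
    std::int32_t first;
};

struct MetricLayout
{
    std::uint32_t number_calls;
    std::uint32_t total_timing;
    std::int32_t next;
    std::int32_t prev;
};

// The file from the example: Configuration, then Metric1, then Metric2.
struct FileLayout
{
    ConfigurationLayout config;
    MetricLayout metric1;
    MetricLayout metric2;
};

// Distance from a field to a target, counted in 32-bit slots.
inline long slots(std::size_t field_at, std::size_t target_at)
{
    return (static_cast<long>(target_at) - static_cast<long>(field_at)) / 4;
}

// Line 1: Configuration.first -> Metric1
inline long first_offset()
{
    return slots(offsetof(FileLayout, config) + offsetof(ConfigurationLayout, first),
                 offsetof(FileLayout, metric1));
}

// Line 2: Metric1.next -> Metric2
inline long next_offset()
{
    return slots(offsetof(FileLayout, metric1) + offsetof(MetricLayout, next),
                 offsetof(FileLayout, metric2));
}

// Line 3: Metric2.prev -> Metric1
inline long prev_offset()
{
    return slots(offsetof(FileLayout, metric2) + offsetof(MetricLayout, prev),
                 offsetof(FileLayout, metric1));
}
```

Under those assumptions the three functions return +1, +2 and -7, matching
the figures above: prev sits in slot 8 of the file and Metric1 starts at
slot 1.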

I don't actually have a vector class at present, but do have an array class
where you tell it its length and give it a pointer to the first in an array
of objects.

A string implementation is similar - it is essentially a
persistable_pointer< char > with some helper routines for length, etc.

Reading:
--------

Reading the file back in is as simple as mmap() and casting the pointer to
the start to a pointer to an object of the appropriate type. This is a speed
advantage when compared to marshalling (serializing?) to disk.
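
For illustration, a POSIX-only sketch of the read path follows. The
mapped_file class and its root() member are hypothetical names, the mapping
is read-only, and the file is assumed to have been written on a host with
the same layout and byte order (no BE/LE wrappers are used here). A Windows
version would need the corresponding file-mapping calls instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct Metric
{
    std::uint32_t number_calls;
    std::uint32_t total_timing;
};

// A read-only mapping of a whole file; unmapped on destruction.
class mapped_file
{
public:
    explicit mapped_file(const char* path) : data_(MAP_FAILED), size_(0)
    {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0)
            return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0)
        {
            size_ = static_cast<std::size_t>(st.st_size);
            data_ = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        }
        ::close(fd);    // the mapping stays valid after the descriptor closes
    }

    ~mapped_file()
    {
        if (data_ != MAP_FAILED)
            ::munmap(data_, size_);
    }

    mapped_file(const mapped_file&) = delete;
    mapped_file& operator=(const mapped_file&) = delete;

    bool ok() const { return data_ != MAP_FAILED; }
    std::size_t size() const { return size_; }

    // View the start of the file as a T; no parsing or copying happens.
    template <class T>
    const T* root() const
    {
        if (!ok() || size_ < sizeof(T))
            return nullptr;
        return static_cast<const T*>(data_);
    }

private:
    void* data_;
    std::size_t size_;
};
```

Usage would be along the lines of constructing a mapped_file on the data
file and calling root< Metric >() (or root< Configuration >() for the linked
example); the returned pointer is valid for as long as the mapped_file
object lives.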

Improvement:
------------

In any case, there are lots of options for improvement:

 - Is it possible to rely on an STL implementation for a binary contract?
   Could we simply write an allocator and let it use the persistable region
   with appropriate templated pointers? How could we possibly handle
   deletes in this situation?

 - A little memory management system could be added to the persistable
   region concept that would allow you to free() and then re-allocate
   finding the appropriate sized block. At the moment allocation is a
   pointer increment though, which is kinda nice.
 
 - I only have data classes: 'string', 'array', 'slist' and 'flagset' (very
   close to a bitset in intent). A persistable hashtable would be very
   nice.

This little utility is nice for my project's configuration and data files
(which are write-once, read-many), but it's just a couple of ideas cobbled
together - maybe other people have done this or something similar (or even
more advanced?) and we can build a general mappable persistence framework?

Oh, I have it working with MSVC++ and with gcc on Linux and Solaris.

- Paul

-- 
Paul Mclachlan (paul.mclachlan_at_[hidden])
Software Engineer, Java Tools.  NuMega Lab, Compuware corp.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk