From: Tom Becker (voidampersand_at_[hidden])
Date: 2002-01-23 15:05:19

On Wed, 23 Jan 2002 10:24:51 +0100, Matthias Troyer
<troyer_at_[hidden]> wrote:
>I want to bring up these issues here:
>i) I often have to (de)serialize large arrays of numbers, for which
>an optimized function should exist that can (de)serialize a C-array
>in one function call. This also allows support for data formats such
>as HDF.

That's a good feature. It can make a huge difference in performance.
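
As a rough illustration of why a bulk call helps, here is a minimal sketch. The names BufferWriter, BufferReader, write_array and read_array are hypothetical and are not part of any proposed library. It assumes trivially copyable element types, the same byte order on both sides, and does no bounds checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical in-memory writer; an illustration only.
class BufferWriter {
public:
    // Element-wise path: one call per value.
    template <typename T>
    void write(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw byte copy needs a trivially copyable type");
        const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
        bytes_.insert(bytes_.end(), p, p + sizeof(T));
    }

    // Bulk path: element count, then the whole C-array in one append.
    template <typename T>
    void write_array(const T* data, std::size_t count) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw byte copy needs a trivially copyable type");
        write(count);
        const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
        bytes_.insert(bytes_.end(), p, p + count * sizeof(T));
    }

    const std::vector<unsigned char>& bytes() const { return bytes_; }

private:
    std::vector<unsigned char> bytes_;
};

// Hypothetical matching reader. It keeps a reference to the buffer,
// so the buffer must outlive the reader.
class BufferReader {
public:
    explicit BufferReader(const std::vector<unsigned char>& bytes)
        : bytes_(bytes), pos_(0) {}

    template <typename T>
    void read(T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "raw byte copy needs a trivially copyable type");
        std::memcpy(&value, bytes_.data() + pos_, sizeof(T));
        pos_ += sizeof(T);
    }

    // Reads the count, then copies all elements with a single memcpy.
    template <typename T>
    std::vector<T> read_array() {
        std::size_t count = 0;
        read(count);
        std::vector<T> out(count);
        if (count != 0) {
            std::memcpy(out.data(), bytes_.data() + pos_, count * sizeof(T));
        }
        pos_ += count * sizeof(T);
        return out;
    }

private:
    const std::vector<unsigned char>& bytes_;
    std::size_t pos_;
};
```

A format-specific writer (HDF, for example) could implement the same write_array entry point with its own native array call instead of a byte copy.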

>ii) (De)serialization of pointers

The simple case is to serialize the pointed-to data. The interesting
case is references to shared data, where it would be inefficient and
possibly harmful to deserialize the same object more than once. A
typical implementation has a data structure associating pointers and
object IDs. If a pointer doesn't have an ID, you assign it one, and
serialize both the ID and the pointed-to data. After that, you only
have to serialize the ID. One nice trick is that references to known
constant data can be assigned known IDs, so their data never has to
be serialized. Deserialization simply reverses the process. The
details of what the IDs look like, and how the IDs are stored
relative to the pointed-to data, will depend on the framework and
file format(s) the application needs to work with. Pointer
serialization should be a separate mechanism that is layered on top
of basic serialization.
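
The ID scheme described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the class names PointerWriter and PointerReader, the whitespace-separated text format, the use of ID 0 for a null pointer, and the restriction to int payloads. Known IDs must be registered on both sides before any writing or reading starts.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>

// Hypothetical layer on top of a basic text stream.
class PointerWriter {
public:
    explicit PointerWriter(std::ostream& out) : out_(out), next_id_(1) {}

    // Constant data with a well-known ID: its value is never written.
    void register_known(const int* p, std::uint32_t id) {
        ids_[p] = id;
        if (id >= next_id_) next_id_ = id + 1;
    }

    void write(const int* p) {
        if (p == nullptr) { out_ << 0 << ' '; return; }   // ID 0 means null
        std::map<const int*, std::uint32_t>::iterator it = ids_.find(p);
        if (it != ids_.end()) {                            // seen before: ID only
            out_ << it->second << ' ';
            return;
        }
        std::uint32_t id = next_id_++;                     // first time: ID + data
        ids_[p] = id;
        out_ << id << ' ' << *p << ' ';
    }

private:
    std::ostream& out_;
    std::map<const int*, std::uint32_t> ids_;
    std::uint32_t next_id_;
};

class PointerReader {
public:
    explicit PointerReader(std::istream& in) : in_(in) {}

    void register_known(std::uint32_t id, std::shared_ptr<int> obj) {
        objects_[id] = obj;
    }

    std::shared_ptr<int> read() {
        std::uint32_t id = 0;
        in_ >> id;
        if (id == 0) return std::shared_ptr<int>();
        std::map<std::uint32_t, std::shared_ptr<int> >::iterator it =
            objects_.find(id);
        if (it != objects_.end()) return it->second;       // already rebuilt
        int value = 0;                                     // unknown ID: data follows
        in_ >> value;
        std::shared_ptr<int> obj = std::make_shared<int>(value);
        objects_[id] = obj;
        return obj;
    }

private:
    std::istream& in_;
    std::map<std::uint32_t, std::shared_ptr<int> > objects_;
};
```

Writing the same pointer twice produces the data once and the ID twice, and reading it back yields two shared_ptr values that refer to one object, so shared structure survives the round trip.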

>iii) using runtime polymorphism with the persistence library. At the
>moment only compile time polymorphism is implemented, and the
>Reader/Writer needs to be chosen at compile time. This is a problem
>in my applications, where the (de)serialization is controlled from
>an application framework, which calls a virtual save/load function
>of a simulation object. For this to work the save and load functions
>for the basic data types need to be virtual functions too.

The persistence library needs to serialize some type information
along with the data. When deserializing, the caller is expecting a
pointer to a particular type. It's okay if it actually gets a pointer
to a derived type. The persistence library just has to read the type
information and allocate the actual type, whatever it is.
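
One way this could look, as a sketch only: a type name is written ahead of the data, and a factory table maps that name back to the concrete class on load. The Shape hierarchy, TypeRegistry, save_polymorphic and load_polymorphic are all hypothetical names, and the text format is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical base class with virtual save/load, as a framework might define.
struct Shape {
    virtual ~Shape() {}
    virtual std::string type_name() const = 0;
    virtual void save(std::ostream& out) const = 0;
    virtual void load(std::istream& in) = 0;
    virtual double area() const = 0;
};

struct Square : Shape {
    double side;
    Square() : side(0) {}
    std::string type_name() const { return "Square"; }
    void save(std::ostream& out) const { out << side << ' '; }
    void load(std::istream& in) { in >> side; }
    double area() const { return side * side; }
};

struct Rect : Shape {
    double w, h;
    Rect() : w(0), h(0) {}
    std::string type_name() const { return "Rect"; }
    void save(std::ostream& out) const { out << w << ' ' << h << ' '; }
    void load(std::istream& in) { in >> w >> h; }
    double area() const { return w * h; }
};

// Maps the stored type name to a function that allocates the actual type.
class TypeRegistry {
public:
    template <typename T>
    void add(const std::string& name) {
        factories_[name] = [] { return std::unique_ptr<Shape>(new T()); };
    }

    std::unique_ptr<Shape> create(const std::string& name) const {
        std::map<std::string, Factory>::const_iterator it = factories_.find(name);
        if (it == factories_.end()) return std::unique_ptr<Shape>();
        return it->second();
    }

private:
    typedef std::function<std::unique_ptr<Shape>()> Factory;
    std::map<std::string, Factory> factories_;
};

// Type information first, then the object's own data.
inline void save_polymorphic(std::ostream& out, const Shape& s) {
    out << s.type_name() << ' ';
    s.save(out);
}

// The caller asks for a Shape and may get any registered derived type.
// Returns a null pointer if the stored type name is not registered.
inline std::unique_ptr<Shape> load_polymorphic(std::istream& in,
                                               const TypeRegistry& reg) {
    std::string name;
    in >> name;
    std::unique_ptr<Shape> obj = reg.create(name);
    if (obj) obj->load(in);
    return obj;
}
```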

I like the approach where there is a reader function and a writer
function registered for each record format in the persistent data.
This way an object can support multiple record formats as necessary.
All the other approaches, such as using reader and writer objects, or
calling virtual save/load functions, can be used from the reader and
writer functions. It's by far the most flexible approach. The
downside is the functions have to be registered. I think it's easiest
to do that by hand, but there are ways it can be done automatically
and the choice can be left up to the framework or application.
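
To make the registration idea concrete, here is a small sketch with functions registered by hand. The Point type, the PointFormats table, and the record tags "point1d" and "point2d" are all made up for the example; a real framework would choose its own tags and stream types.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

struct Point { double x; double y; };

typedef std::function<void(std::istream&, Point&)> PointReaderFn;
typedef std::function<void(std::ostream&, const Point&)> PointWriterFn;

// Hypothetical table: one reader and one writer per record format tag.
class PointFormats {
public:
    void register_format(const std::string& tag,
                         PointReaderFn reader, PointWriterFn writer) {
        readers_[tag] = reader;
        writers_[tag] = writer;
    }

    // Writes the tag, then lets the registered writer emit the record.
    bool write(std::ostream& out, const std::string& tag, const Point& p) const {
        std::map<std::string, PointWriterFn>::const_iterator it = writers_.find(tag);
        if (it == writers_.end()) return false;
        out << tag << ' ';
        it->second(out, p);
        return true;
    }

    // Reads the tag and dispatches to whichever reader is registered for it.
    bool read(std::istream& in, Point& p) const {
        std::string tag;
        if (!(in >> tag)) return false;
        std::map<std::string, PointReaderFn>::const_iterator it = readers_.find(tag);
        if (it == readers_.end()) return false;
        it->second(in, p);
        return true;
    }

private:
    std::map<std::string, PointReaderFn> readers_;
    std::map<std::string, PointWriterFn> writers_;
};

// Older record format: x only; y is taken to be 0 when reading.
inline void read_point1d(std::istream& in, Point& p) { in >> p.x; p.y = 0; }
inline void write_point1d(std::ostream& out, const Point& p) { out << p.x << ' '; }

// Current record format: x and y.
inline void read_point2d(std::istream& in, Point& p) { in >> p.x >> p.y; }
inline void write_point2d(std::ostream& out, const Point& p) {
    out << p.x << ' ' << p.y << ' ';
}

// Registration done by hand, once, at startup.
inline void register_point_formats(PointFormats& formats) {
    formats.register_format("point1d", read_point1d, write_point1d);
    formats.register_format("point2d", read_point2d, write_point2d);
}
```

The reader and writer functions are free to delegate to reader/writer objects or to virtual save/load members; the table only decides which function handles which record format.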

>Any ideas/comments how to proceed with the persistence library,
>which seems to me a very important one?

I'd like to see a persistence library that can replace the
persistence code in all the application frameworks that are out
there. At the least, it should have a design that allows writing
adapters so it can be data-format compatible with other persistence
mechanisms.

A good place to start would be understanding the inputs and outputs
of the most commonly used existing persistence mechanisms. I'm fairly
familiar with most of the persistence approaches that are used or
have been used on the Mac. If there are others who are interested in
doing a general solution, let's talk.



Tom Becker                      "Within C++, there is a much smaller and
<voidampersand_at_[hidden]>        cleaner language struggling to get out."
                                                       -- Bjarne Stroustrup
