From: Robert Ramey (ramey_at_[hidden])
Date: 2002-11-17 01:22:07


>From: Matthias Troyer <troyer_at_[hidden]>

>1.) The first problem is the basic data types used in the archive:

>short, int and long have no defined bit size, and can thus never be used
>for portable serialization.

>Imagine I use a platform where long is
>64-bit, write it to the archive and then read it again on a platform
>where long is 32-bit. This will cause major problems.

Suppose you have a number on the first platform that exceeds 32
significant bits. What happens when the number is loaded onto
the second platform? Are the high-order bits truncated? How
do you address this problem now? If none of your longs
are larger than 32 significant bits then there is no problem.
If some are, the 32-bit machine can't represent them.

This can't cause any problems you don't have already.

>It also prevents the use of archive formats that rely on fixed bit sizes (such as XDR or
>any other platform independent binary format). My suggestion thus is to
>change the types in these functions to int8_t, int16_t, int32_t, as was
>already done for int64_t. That way portable implementations will be
>possible.

I believe that you could just typedef the above on both platforms, use a text archive,
and everything would work just fine. The text archive represents all numbers
as arbitrary-length integers, which would be converted correctly on
save as well as load.
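
To illustrate the text-archive point, here is a minimal sketch. It is not the library's code: save_as_text and load_from_text are hypothetical names, and the sketch assumes signed integer types of at most 64 bits. The value is written as decimal text regardless of its width in memory, and the loader reports a value that the target type cannot hold instead of silently truncating it.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <sstream>
#include <stdexcept>
#include <string>

// Save: write a signed integer as decimal text, independent of its width.
template <typename Int>
std::string save_as_text(Int value) {
    std::ostringstream out;
    out << static_cast<long long>(value);  // also keeps int8_t from printing as a char
    return out.str();
}

// Load: parse into a wide type first, then check that the target can hold it.
template <typename Int>
Int load_from_text(const std::string& text) {
    std::istringstream in(text);
    long long wide = 0;
    if (!(in >> wide)) throw std::runtime_error("not an integer");
    if (wide < std::numeric_limits<Int>::min() ||
        wide > std::numeric_limits<Int>::max())
        throw std::range_error("value does not fit the target type");
    return static_cast<Int>(wide);
}

// Usage: a value saved from a 64-bit type loads into a 32-bit type if it fits.
inline void text_roundtrip_example() {
    const std::string text = save_as_text<std::int64_t>(123456789LL);
    assert(load_from_text<std::int32_t>(text) == 123456789);
}
```

A value such as 5000000000 saved from a 64-bit type would make load_from_text<std::int32_t> throw, which is the "32-bit machine can't represent them" case made explicit.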

>2.) The second problem is speed when serializing large containers of
>basic data types, e.g. a vector<double> or ublas vectors and matrices.
>In my applications these can easily be hundreds of megabytes in size. In
>the current implementation, serializing a std::vector<double>(10000000)
>requires ten million virtual function calls. In order to prevent this,
>I propose to add extra virtual functions (like the operator<< above),
>which serialize C-arrays of basic data types, i.e. functions like

Serialization version 6 which was submitted for review includes
serialization of C-arrays. It is documented in the reference
under the title "Serialization Implementations included in the Library"
and a test case was added to test.cpp.
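
The quoted concern about one virtual call per element can be pictured with a small sketch. It is not the library's interface: output_archive, memory_archive, save_array and save_vector are names made up for illustration. The idea shown is a block-level virtual function whose default falls back to element-wise saving, so an archive that can copy a whole array at once is entered only once per container.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical archive interface (an assumption, not basic_oarchive).
class output_archive {
public:
    virtual ~output_archive() {}
    // One virtual call per value.
    virtual void save(double value) = 0;
    // One virtual call per contiguous block. The default forwards to save(),
    // so archives that do not override it keep working.
    virtual void save_array(const double* data, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) save(data[i]);
    }
};

// Archive that copies whole blocks and counts how often it is entered.
class memory_archive : public output_archive {
public:
    std::vector<double> buffer;
    std::size_t calls = 0;

    void save(double value) override {
        ++calls;
        buffer.push_back(value);
    }
    void save_array(const double* data, std::size_t count) override {
        assert(data != nullptr || count == 0);
        ++calls;
        buffer.insert(buffer.end(), data, data + count);
    }
};

// Container serialization routed through the block interface.
inline void save_vector(output_archive& ar, const std::vector<double>& v) {
    ar.save_array(v.data(), v.size());
}
```

With this arrangement, saving a vector of any length through memory_archive costs a single virtual call, while an archive that only implements save() still receives every element.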

>In conjunction with this, the serialization for std::vector and for
>ublas vectors, etc. has to be adapted to make use of these optimized
>serialization functions for basic data types.

The library permits overriding the included implementations.
Of course, this has to be up to the person who finds the
included implementation inconvenient in some way, as he is
the only one who knows what he wants changed.

>the serialization of very large numbers of small objects. The current
>library shows a way to optimize this (in reference.html#large), but it
>is rather cumbersome. As it is now, I have to reimplement the
>serialization of std::vector<T>, or std::list<T>, etc., for all such
>types T. In almost all of my codes I have a large number of small
>objects of various types for which I know that I will never serialize a
>pointer. I would thus propose the following:

>i) add a traits class to specify whether ever a pointer to an object
>will be serialized or if it should be treated as a small object for
>which serialization should be optimized

>ii) specialize the serialization of the standard library containers for
>these small objects, using the mechanism in the documentation.

>That way I just need to specify a trait for my object and it will be
>serialized efficiently

I would be loath to implement this idea. Basically, instead of overloading
the serializations that you want to speed up, you want to require
all of us to specify traits for every class we want to serialize. It would
make things harder to use. Also, the current implementation, like much
Boost code, stretches current compilers to the breaking point. It's
already much more complex to implement than I expected, and
I already have much difficulty accommodating all the differences
in C++ implementations.
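
The alternative mentioned here, overloading only the serializations that need to be fast, might look roughly like the sketch below. Everything in it is hypothetical (trace_archive, small_point and these save functions are not the library's API). It relies only on ordinary overload resolution: a non-template overload for one specific container is preferred over the generic template.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal archive that records what was written.
struct trace_archive {
    std::string log;
};

struct small_point {
    int x;
    int y;
};

// Element-wise serialization of one object.
inline void save(trace_archive& ar, const small_point& p) {
    ar.log += "p(" + std::to_string(p.x) + "," + std::to_string(p.y) + ")";
}

// Generic container implementation: one save per element.
template <typename T>
void save(trace_archive& ar, const std::vector<T>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) save(ar, v[i]);
}

// User-supplied overload for the one container that has to be fast.
// Being a non-template, it wins over the generic template above.
inline void save(trace_archive& ar, const std::vector<small_point>& v) {
    ar.log += "block[" + std::to_string(v.size()) + "]";
}

// Usage: the container of small_point takes the optimized path.
inline void overload_example() {
    trace_archive ar;
    save(ar, std::vector<small_point>(3));
    assert(ar.log == "block[3]");
}
```

Only the author of small_point has to write anything extra; every other class keeps using the generic implementation unchanged.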

>4.) I am confused about registering polymorphic types. If one program
>reads an archive written by another program, do both have to register
>all the types in exactly the same order, or is it OK if the program
>reading the archive registers only a subset of types and in another
>order? I need that when an evaluation program reads only the first part
>of a file (e.g. only the base class), without reading the rest of the
>serialized data of the derived class. Can I read the base class from an
>archive into which I serialized the derived class?
>This is important for programs which just act on the information in the
>base class.

This is not addressed well in the documentation and the discussion
on this thread has made it seem much more complex than it really is.

The problem only occurs when serializing polymorphic pointers.
When a pointer tag is read, the load has to know what kind of
object to create. It's more subtle than it would first appear, hence
the spirited discussion. However, I believe we have agreed on
a general approach that everyone is satisfied with. Of course,
once we get to specifics the howl will start again. Stay tuned.
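
The core of the problem, knowing what kind of object to create when a pointer tag is read, can be sketched as follows. This is an illustration only, not the approach agreed on in the thread: type_registry, make_shape and the shape classes are hypothetical, and keying the registry by a string tag is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct shape {
    virtual ~shape() {}
    virtual std::string name() const = 0;
};
struct circle : shape {
    std::string name() const override { return "circle"; }
};
struct square : shape {
    std::string name() const override { return "square"; }
};

// Factory used by the registry to build a default-constructed object.
template <typename T>
std::unique_ptr<shape> make_shape() {
    return std::unique_ptr<shape>(new T());
}

// Hypothetical registry: maps the tag stored in the archive to a factory.
class type_registry {
public:
    typedef std::unique_ptr<shape> (*factory)();

    void add(const std::string& tag, factory make) {
        assert(factories_.find(tag) == factories_.end());  // tags must be unique
        factories_[tag] = make;
    }

    // Called by the loader after it has read a pointer tag.
    std::unique_ptr<shape> create(const std::string& tag) const {
        std::map<std::string, factory>::const_iterator it = factories_.find(tag);
        if (it == factories_.end())
            throw std::runtime_error("unregistered type: " + tag);
        return it->second();
    }

private:
    std::map<std::string, factory> factories_;
};
```

A loader would call create() with the tag it just read and then load the object's members into the result. A tag that the reading program never registered is reported as an error, since the loader has no way to construct that object.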

>5.) This is a point for discussion and not a criticism of the library.
>Instead of polluting the global namespace with a serialization class, I
>would prefer to implement serialization with free functions save and
>load instead.

Wouldn't that just pollute the global namespace with a large
number of save/load functions instead?

>6.) Finally, if I am correctly informed, the Java language includes
>serialization and has a portable archive format. Could this library be
>made compatible with this Java language standard, i.e. might it be
>possible to create an archive format which can read such Java
>serialization files?

This is not likely. Java has runtime reflection, which is used to
obtain all the information required for serialization. Also,
Java has a much more limited usage of pointers, so certain
problems we are dealing with don't come up. I don't believe
that all the data structures can be unambiguously mapped
to Java.

>
> What is your evaluation of the implementation?

>I would like to see a platform-independent binary archive format (e.g.
>using XDR), but am also willing to contribute that myself once the
>interface has been finalized.

Thank you. Note that none of the comments made so far have any
impact on the interfaces defined by the base classes basic_[i|o]archive,
so there is no reason you can't get started now. As you can see
from the 3 derivations included in the package, making your own
XDRarchive is a pretty simple proposition if you have the xdr<->float
code. In this case, take copies of biarchive and boarchive,
reimplement the functions to read/write XDR instead of binary
data, and you're done.
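
As a rough idea of what the XDR read/write functions involve, here is a sketch of the encoding itself. It is independent of biarchive and boarchive: byte_buffer and the xdr_* functions are hypothetical helpers. XDR stores an integer as 4 bytes and a double as 8 bytes in IEEE 754 format, both most significant byte first. The sketch assumes the host's double is already IEEE 754 binary64 with the same byte order as its integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef std::vector<unsigned char> byte_buffer;

// XDR integer: 4 bytes, most significant byte first.
inline void xdr_put_int32(byte_buffer& out, std::int32_t value) {
    const std::uint32_t bits = static_cast<std::uint32_t>(value);
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((bits >> shift) & 0xFF));
}

// XDR double: 8 bytes, IEEE 754 double precision, most significant byte first.
// Assumption: the host double is IEEE 754 binary64.
inline void xdr_put_double(byte_buffer& out, double value) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // reinterpret without aliasing problems
    for (int shift = 56; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>((bits >> shift) & 0xFF));
}

// Read back a 4-byte XDR integer starting at the given offset.
inline std::int32_t xdr_get_int32(const byte_buffer& in, std::size_t offset) {
    assert(offset + 4 <= in.size());
    std::uint32_t bits = 0;
    for (std::size_t i = 0; i < 4; ++i) bits = (bits << 8) | in[offset + i];
    // Two's complement conversion back to signed (guaranteed from C++20 on).
    return static_cast<std::int32_t>(bits);
}
```

Because the bytes are produced by shifting rather than by copying memory, the output is the same on big-endian and little-endian hosts.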

>
> What is your evaluation of the documentation?

>As was already remarked by others, I would like to see documentation on
>exactly which functions a new archive type has to implement.

Wouldn't it be easier just to look at the basic_[i|o]archive code? Perhaps
we might want to break out text_archive and the native binary archive
into separate headers. That might make it more obvious that
these derivations aren't really part of the library but rather more
like popular examples.

>I tried to use the library but could not compile it under MacOS X 10.2
>with gcc 3.1
>Compiling the file "demo.cpp" gives me the error:

>./../boost/serialization/serialization_imp.hpp:382: sorry, not
>implemented: `
> tree_list' not supported by dump_expr

Hmm, in my copy that line corresponds to the statement:
        BOOST_STATIC_ASSERT(false);
You can just comment this out for now.

Robert Ramey

