From: Robert Ramey (ramey_at_[hidden])
Date: 2002-11-25 00:44:31

Next message: Rene Rivera: "RE: [boost] Boost License Issues"
Previous message: Jeremy Maitin-Shepard: "Re: [boost] Serialization Library Review"
Maybe in reply to: Matthias Troyer: "[boost] Serialization library review"
Next in thread: David Abrahams: "Re: [boost] Serialization library review"
Reply: David Abrahams: "Re: [boost] Serialization library review"
Reply: Matthias Troyer: "Re: [boost] Serialization library review"

Date: Sun, 24 Nov 2002 16:28:50 +0100
From: Matthias Troyer <troyer_at_[hidden]>

3. Does not work on all platforms - solvable problem
4. Interface design: there are some show-stoppers here for now
    a) primitive types: code is not portable at the moment
    b) performance: need improved methods for large data sets
    c) binary archives: design problems in current version
    d) small objects: optimization could easily be achieved by a traits class
       instead of reimplementing all the std-library serializing for all
       small object classes individually
    e) serialization template specialization versus free functions?
5. Overall design: I want a better factorization of the library into
    a basic library and separate support for pointers. Also, one needs to
    be able to override archive preamble, and functions called for
    i) new class encountered
    ii) start/end of an object serialization
    reading the version number, ...

The shared_ptr demo does not compile:

boost/shared_ptr.hpp:297: `boost::detail::shared_count
boost::shared_ptr<A>::pn' is private
demo_shared_ptr.cpp:183: within this context

Why do you want to access a private variable here?

Otherwise I have not encountered any problems

>code fragments such as:
>line 95-96 of archive.cpp seem unacceptable to me:

> // note breaking a rule here - is this a problem on some platform
> is.read(const_cast<char *>(s.data()), size);

Although it is non-standard, I believe that the above code will work on all known platforms.
It yields a significant improvement in efficiency (I believe) without
any known detrimental effects. Perhaps I should leave the provably portable
code as a comment.
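
For comparison, a minimal sketch of what a provably portable variant might look like. The helper name read_string_portably is hypothetical and not part of the library; it reads into a temporary buffer and then assigns to the string, at the cost of one extra copy.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: read `size` bytes into `s`
// without writing through the pointer returned by std::string::data().
inline bool read_string_portably(std::istream& is, std::string& s, std::size_t size)
{
    std::vector<char> buffer(size);
    // The size check avoids taking &buffer[0] on an empty vector.
    if (size != 0 && !is.read(&buffer[0], static_cast<std::streamsize>(size)))
        return false;                        // short read: leave s untouched
    s.assign(buffer.begin(), buffer.end());  // the extra copy the const_cast avoids
    assert(s.size() == size);
    return true;
}
```

The trade-off is exactly the one described above: the temporary buffer stays within the standard, while the const_cast version saves a copy.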

3. Does not work on all platforms
=================================

Running the test program on MacOS X 10.2 (using gcc 3.1) gives:

*** testing native binary archive
./../boost/serialization/archive.hpp:986: failed assertion `is.good()'
Abort

It thus seems there are problems with the binary streams. Note that under
Darwin (FreeBSD in general?), the binary specification is just ignored, as
can be read on the man page for fopen:

     The mode string can also include the letter ``b'' either as a third
     character or as a character between the characters in any of the
     two-character strings described above. This is strictly for
     compatibility with ISO/IEC 9899:1990 (``ISO C89'') and has no effect;
     the ``b'' is ignored.

These problems need to be sorted out, but I do not consider them urgent
since a redesign of the binary archives is urgently needed beforehand.

>We should discuss whether to use short, int, long ... as the primitive
>types or int8_t, int16_t, int32_t, int64_t. The latter makes it easier to
>write portable archives, the former seems more natural. I can accept both
>choices but we should not mix the two as is done now

The archive interface addresses all fundamental types: char, unsigned int,
int, etc., plus int64_t and uint64_t. Other types are aliases created by
typedefs in header files, so there should be no reason to include them.

> * A serialization of bool is missing - easy to fix

I don't understand what you mean. basic_[i|o]archive contain:

        // write out booleans as 1 and 0
        virtual basic_oarchive & operator<<(bool _Val)
        {
                *this << static_cast<char>(_Val ? 1 : 0);
                return *this;
        }
and
        // read in booleans as 1 or 0
        virtual basic_iarchive & operator>>(bool& _Val)
        {
                char i;
                *this >> i;
                _Val = i;
                return *this;
        }

>* The code will not compile on platforms where long is 64-bit:
>
> virtual basic_oarchive & operator<<(long _Val) = 0;
> virtual basic_oarchive & operator<<(int64_t _Val) = 0;

The current code contains the following:

        #ifndef BOOST_NO_INT64_T
        virtual basic_oarchive & operator<<(int64_t _Val) = 0;
        virtual basic_oarchive & operator<<(uint64_t _Val) = 0;
        #endif

I guess this should be changed to:
        #ifdef BOOST_HAS_MS_INT64
        virtual basic_iarchive & operator>>(int64_t & _Val) = 0;
        virtual basic_iarchive & operator>>(uint64_t & _Val) = 0;
        #endif
        #ifdef BOOST_HAS_LONG_LONG
        virtual basic_iarchive & operator>>(long long & _Val) = 0;
        #endif
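
To illustrate the collision being discussed, here is a small sketch that is not taken from the library. The names overload_probe and int64_overload are hypothetical. It assumes, as on mainstream platforms, that std::int64_t is a typedef for one of the built-in integer types; in that case overloading on the built-in types alone already covers int64_t, while declaring an int64_t overload next to them would redeclare an existing one.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical stand-in for an archive interface, overloaded on built-in types only.
struct overload_probe {
    std::string operator<<(int)       { return "int"; }
    std::string operator<<(long)      { return "long"; }
    std::string operator<<(long long) { return "long long"; }
    // Adding "std::string operator<<(std::int64_t);" here would repeat the
    // signature of whichever overload above int64_t is an alias for, and the
    // class would no longer compile.
};

// Reports which built-in overload a std::int64_t argument selects on this platform.
inline std::string int64_overload()
{
    overload_probe probe;
    return probe << std::int64_t(0);
}

// True where int64_t is an alias for long (typical when long is 64-bit).
const bool int64_is_long = std::is_same<std::int64_t, long>::value;
```

On a platform with 64-bit long the probe typically reports "long"; where long is 32-bit it typically reports "long long".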

4b. Interface design: performance: need improved methods for large data sets
============================================================================

>As mentioned in previous posts, additional functions e.g. load_array
>and save_array need to be added to allow efficient serialization of large data sets.
>The default version could just use operator<< or operator>> as in:

>virtual void save_array(const int* p, std::size_t n) {
> for (std::size_t i=0;i<n;++i)
> *this << p[i];
>}

>and this would thus not incur any extra coding work for people not interested.
>Serialization of containers such as std::vector, or ublas or mtl vectors
>and matrices can make use of this extra function transparent to the user so that
>the interface would also not become harder to understand for the
>library user.

Why can't this be handled using

basic_oarchive::write_binary(void *p, size_t count)

? This has been in the library from the beginning. How is this different from
what you want to do?
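
A rough sketch of how the two approaches could relate. All class names here are hypothetical miniatures, not the library's classes. The default save_array goes through operator<< element by element, as in the quoted proposal, and an archive that can do better overrides it with a single write_binary call.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical miniature archive; only the names save_array and write_binary
// come from the discussion above.
class elementwise_oarchive {
public:
    std::vector<char> bytes;
    std::size_t element_calls;

    elementwise_oarchive() : element_calls(0) {}
    virtual ~elementwise_oarchive() {}

    elementwise_oarchive& operator<<(int value)
    {
        ++element_calls;
        write_binary(&value, sizeof value);
        return *this;
    }

    void write_binary(const void* p, std::size_t count)
    {
        const char* first = static_cast<const char*>(p);
        bytes.insert(bytes.end(), first, first + count);
    }

    // Default as proposed: one operator<< call per element.
    virtual void save_array(const int* p, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i)
            *this << p[i];
    }
};

// An archive that knows its format is raw native bytes can write the block at once.
class bulk_oarchive : public elementwise_oarchive {
public:
    void save_array(const int* p, std::size_t n) override
    {
        write_binary(p, n * sizeof(int));
    }
};
```

One possible difference, offered as an assumption rather than as either author's position: save_array keeps the element type, so an archive that must convert each element can still do so, whereas write_binary only sees bytes.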

4c. Interface design: binary archives
=====================================

I see two problems that can easily be fixed:

4c.i) Problems with the current version
---------------------------------------

>Currently the library wants to protect me from problems with binary
>incompatibilities by checking if the sizes of the primitive types are
>the same on the platform on which I read or wrote. I believe this to be
>a misfeature. Consider two platforms

> A: 32-bit long, 64-bit long long, int64_t typedef-ed as long long
> B: 64-bit long, int64_t typedef-ed as long

>Currently when I try to read an archive from platform A on platform B,
>the library aborts because of incompatible primitive types. However, as a power
>user, knowing about portability problems I did not use long or long long in my codes, but
>always int64_t, and should thus have no problems (ignoring byte order issues). The
>library should thus allow me to read the file although the primitive types are
>different!

>Considering this problem, which we face daily (we transfer files between
>Linux PCs, Macintosh, SUN and Alpha workstations, Cray, HP, Hitachi and
>Fujitsu supercomputers) has made us choose int*_t as the primitive types for
>serialization in our library (which is in use since 1994). See 4.a.i).

The native binary archives included in the package have absolutely no pretensions
to portability. Never have, never will. If you believe you can implement
a portable binary format, feel free. The library permits and encourages this. I believe
that the next version will permit overriding the init, so that if you believe
that the native binary archive created will in fact be portable between the platforms you
use, you will be able to derive from the native binary archive and override
the consistency checking code (at your own risk, of course).
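
What such an override might look like, as a sketch only. The class names, the size_header layout, and the check_sizes hook are all assumptions made for illustration; the next version's actual interface is not described here.

```cpp
#include <stdexcept>

// Hypothetical record of primitive sizes stored in an archive preamble.
struct size_header {
    unsigned char sizeof_int;
    unsigned char sizeof_long;
};

class native_binary_iarchive_sketch {
public:
    virtual ~native_binary_iarchive_sketch() {}

    // Called explicitly after construction: a virtual call made from the
    // base-class constructor would not reach a derived override.
    void init(const size_header& stored) { check_sizes(stored); }

protected:
    // Strict default: every recorded size must match this platform.
    virtual void check_sizes(const size_header& stored) const
    {
        if (stored.sizeof_int != sizeof(int) || stored.sizeof_long != sizeof(long))
            throw std::runtime_error("archive written with different primitive sizes");
    }
};

// A user who never serializes long relaxes the check, at their own risk.
class relaxed_iarchive_sketch : public native_binary_iarchive_sketch {
protected:
    void check_sizes(const size_header& stored) const override
    {
        if (stored.sizeof_int != sizeof(int))
            throw std::runtime_error("archive written with a different int size");
    }
};
```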

4c.ii) Change of the binary archive class design
------------------------------------------------

>I propose a change to the design of the native binary archives. While
>implementing operator<< to just call write_binary is common to all native binary
>archive classes, the specific implementation of write_binary can be different. I thus
>propose to factor out the operator<< implementations into a new base class
>basic_obarchive, which still contains write_binary as pure virtual function. From this
>class we can derive:

>stream_obarchive: serializing into a stream as done now
>file_obarchive: serialization into a file using fwrite
>buffer_obarchive: serialization into a memory buffer using e.g. memcpy
>..
>and similar for the input archives.

Think of the native binary implementation as an example of the usage of the library.

Your more elaborate definition of a family of binary archives is totally in keeping
with the manner that the library is intended to be used. I would call these
definitions examples of how to use the library rather than part of the library
itself. So I would be disinclined to make native binary archives any more
elaborate than they are now.
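
For readers following the proposal, a compact sketch of the factoring being discussed. The _sketch class names are hypothetical and only mirror the names in the quoted text: the base class implements operator<< once in terms of a pure virtual write_binary, and each derived class decides where the bytes go.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// operator<< is written once; only write_binary differs between archives.
class basic_obarchive_sketch {
public:
    virtual ~basic_obarchive_sketch() {}
    basic_obarchive_sketch& operator<<(int v)    { write_binary(&v, sizeof v); return *this; }
    basic_obarchive_sketch& operator<<(double v) { write_binary(&v, sizeof v); return *this; }
protected:
    virtual void write_binary(const void* p, std::size_t count) = 0;
};

// Serializes into a stream.
class stream_obarchive_sketch : public basic_obarchive_sketch {
public:
    explicit stream_obarchive_sketch(std::ostream& os) : os_(os) {}
protected:
    void write_binary(const void* p, std::size_t count) override
    {
        os_.write(static_cast<const char*>(p), static_cast<std::streamsize>(count));
    }
private:
    std::ostream& os_;
};

// Serializes into a memory buffer.
class buffer_obarchive_sketch : public basic_obarchive_sketch {
public:
    const std::vector<char>& bytes() const { return bytes_; }
protected:
    void write_binary(const void* p, std::size_t count) override
    {
        const char* first = static_cast<const char*>(p);
        bytes_.insert(bytes_.end(), first, first + count);
    }
private:
    std::vector<char> bytes_;
};
```

Whether such classes belong in the library or in its examples is the design question raised above; the sketch takes no position on it.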

I do believe that the particular types of archives (iarchive, biarchive, etc.) will
be separated into their own header files, partly to make compilation faster
and partly to highlight the point made in the paragraph above.

>4d. Interface design: small objects
===================================

>I have mentioned this in a previous post. Instead of requiring the user
>to reimplement the serialization of standard containers for all small
>object types for which the versioning and pointer system should be bypassed, a
>traits class can be added and the optimized serialization of all containers of small
>objects implemented in the library. Note that the traits class needs to be
>specialized only for those objects for which the user wants to optimize
>serialization, while no effort is required at all if the standard serialization method is to
>be used.

I don't see why this would be necessary - I will have to investigate
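
As a starting point for that investigation, a minimal sketch of the traits idea. The trait name is_small_object, the point type, and counting_archive are hypothetical. The trait defaults to false, so only types the user opts in take the bulk path; the element path here merely stands in for the full per-object route with versioning and pointer tracking.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical trait: false unless the user specializes it.
template <class T> struct is_small_object : std::false_type {};

struct point { float x; float y; };
template <> struct is_small_object<point> : std::true_type {};

// Toy archive that records how it was used.
struct counting_archive {
    std::vector<char> bytes;
    std::size_t bulk_writes;
    std::size_t element_writes;
    counting_archive() : bulk_writes(0), element_writes(0) {}
    void write_binary(const void* p, std::size_t count)
    {
        const char* first = static_cast<const char*>(p);
        bytes.insert(bytes.end(), first, first + count);
    }
};

// Element path: stand-in for the standard per-object serialization.
template <class T>
void save_elements(counting_archive& ar, const std::vector<T>& v, std::false_type)
{
    for (std::size_t i = 0; i < v.size(); ++i) {
        ar.write_binary(&v[i], sizeof(T));
        ++ar.element_writes;
    }
}

// Bulk path: one write for the whole container of small objects.
template <class T>
void save_elements(counting_archive& ar, const std::vector<T>& v, std::true_type)
{
    static_assert(std::is_trivially_copyable<T>::value, "bulk path copies raw bytes");
    if (!v.empty())
        ar.write_binary(&v[0], v.size() * sizeof(T));
    ++ar.bulk_writes;
}

// Written once in the library; dispatches on the trait.
template <class T>
void save_vector(counting_archive& ar, const std::vector<T>& v)
{
    save_elements(ar, v, typename is_small_object<T>::type());
}
```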

>The current library is however not consistent since

>* serialization of normal classes goes via specialization of the
> serialization<T> class

>* serialization of template classes goes via overloading of the free
> function serialization_detail::save_template(), ...

>This is unacceptable and a consistent method should be found.

I believe that what you are referring to is an artifact of a workaround
for compilers that fail to support partial template specialization.
This will probably be addressed for conforming compilers, but
others will have to live with this or something like it.

>5.a) Factor out the pointer serialization features
>--------------------------------------------------

>It should be straightforward to factor out the serialization of pointers
>and to split the library into archives for the serialization of basic
>types, and an add-on for the serialization of pointers. That way users who do
>not need the elaborate pointer serialization mechanism can do so and not
>incur the performance penalty (in terms of memory and speed) of this
>feature, while in cases when it is needed we can add it on to the archive.

This feature incurs no performance penalty if it is not used. That is,
code for serializing pointers is neither generated nor included if there
are no pointers in the classes to be serialized.

There is no real reason for separating pointers from the library, just
as there is no real reason for separating other data types that require
some special handling (enums, C arrays).

I believe that the serialization will be split between input and output
parts, much as the standard I/O library is today. This will result
in more conveniently sized modules.

And no, I can't say for sure that it would be straightforward.

5.b) Allow overriding of preamble, etc.
---------------------------------------

>I would like to have more control over some aspects that are currently
>hardcoded into the library:

>* writing/reading the preamble

I believe that the preamble will be overridable.

>* obtaining the version number of a class
>* starting/finishing writing an object
>* a new type is encountered

Hmmm - a new type is encountered? I don't know what that means.

>The motivation is very simple: We have hundreds of gigabytes of data lying around
>in tens of thousands of files that could easily be read by the serialization archive
>if it were not for a few small differences:

>i) I wrote a different preamble
>ii) I only wrote one version number for all classes in the archive instead of
> separate version numbers for each class
>iii) no information was written when a new class was encountered

>Since otherwise the interface is nearly identical (many classes contain a load
>and a save function, albeit with a different name for the archive classes), changing
>all my codes over to a boost::serialization library would be easy if it
>weren't for the three issues above.

I believe you are wrong here. The interfaces might seem similar, but there
is no reason to believe that the file formats have very much in common.
I don't believe there is any way enough flexibility could be added to
deserialize a file serialized by another system.

Note: converting legacy files to a new serialization system is very easy:

load the file into memory using the old system

save the data into a new file using the new system.

forget about the old system.
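
The three steps above can be sketched as a small conversion routine. Everything here is a stand-in: measurement, load_legacy, and save_new are hypothetical, and both file formats are invented for the example.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct measurement { int id; double value; };

// "Old system" stand-in: id and value as whitespace-separated text.
inline bool load_legacy(std::istream& in, measurement& m)
{
    return static_cast<bool>(in >> m.id >> m.value);
}

// "New system" stand-in: one tagged line per object with a per-class version.
inline void save_new(std::ostream& out, const measurement& m)
{
    out << "measurement v1 " << m.id << ' ' << m.value << '\n';
}

// Load each object with the old system and save it with the new one.
// Returns the number of objects converted.
inline std::size_t convert(std::istream& old_file, std::ostream& new_file)
{
    std::size_t count = 0;
    measurement m;
    while (load_legacy(old_file, m)) {
        save_new(new_file, m);
        ++count;
    }
    return count;
}
```

Once every file has been run through such a routine, the old loading code is no longer needed.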

>Since these are major changes I would like to see a new review after
>they are implemented and thus vote NO for now. However I am willing to help
>Robert with implementing the changes, improving the library, and am willing to
>discuss further.

I very much appreciate your interest in making a portable implementation of
an XDR binary archive and understand you have made great progress on this
in a very short time. Please let me know if there is anything else you
need from me. Many users feel that this is necessary, and it
would demonstrate the ease of use of the library.

I know you have spent a lot of time studying and working with the library
and I much appreciate your efforts.

Robert Ramey
