Boost :

Date view	Thread view	Subject view	Author view

From: Robert Ramey (ramey_at_[hidden])
Date: 2002-11-25 20:39:08

Next message: Robert Ramey: "Re: [boost] Serialization library review"
Previous message: Robert Ramey: "Re: [boost] Serialization Library Review"
Next in thread: Jeremy Maitin-Shepard: "Re: [boost] Serialiization Review repost with consistent quoting"
Reply: Jeremy Maitin-Shepard: "Re: [boost] Serialiization Review repost with consistent quoting"
Reply: Dirk Gerrits: "[boost] int vs int32_t [was: Serialiization Review repost with consistent quoting]"

Date: Sun, 24 Nov 2002 16:28:50 +0100
From: Matthias Troyer <troyer_at_[hidden]>

>The shared_ptr demo does not compile:

>boost/shared_ptr.hpp:297: `boost::detail::shared_count
>boost::shared_ptr<A>::pn' is private
>demo_shared_ptr.cpp:183: within this context

>Why do you want to access a private variable here?

Serialization of a shared_ptr needs to alter this variable.
This wasn't contemplated when shared_ptr was implemented.

>code fragments such as:
>line 95-96 of archive.cpp seem unacceptable to me:

> // note breaking a rule here - is this a problem on some platform
> is.read(const_cast<char *>(s.data()), size);

Although it is nonstandard, I believe that the above code will work on all known platforms.
It yields a significant improvement in efficiency (I believe) without
any known detrimental effects. Perhaps I should leave the provably portable
code as a comment.
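
For comparison, a "provably portable" version of the read might look like the sketch below. It is not taken from archive.cpp: the helper name read_string_portable and its bool return value are assumptions made for illustration. It reads into a temporary buffer and then assigns to the string, so nothing is written through the pointer returned by data().

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not taken from archive.cpp.
// Reads exactly `size` bytes from `is` into `s` by way of a temporary
// buffer, so nothing is written through std::string::data().
inline bool read_string_portable(std::istream& is, std::string& s, std::size_t size)
{
    std::vector<char> buffer(size);
    if (size != 0) {
        is.read(&buffer[0], static_cast<std::streamsize>(size));
        if (static_cast<std::size_t>(is.gcount()) != size)
            return false; // short read: leave s untouched
    }
    s.assign(buffer.begin(), buffer.end());
    return true;
}

// Usage: pull the first five bytes of a stream into a string.
inline void example_read_string()
{
    std::istringstream in("hello world");
    std::string s;
    const bool ok = read_string_portable(in, s, 5);
    assert(ok && s == "hello");
    (void)ok; // silences unused-variable warnings when NDEBUG is defined
}
```

The price is one extra copy per string, which is the efficiency being traded away in the discussion above. Later standards narrowed the gap: C++11 requires std::string storage to be contiguous, and C++17 added a non-const data().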

>We should discuss whether to use short, int, long ... as the primitive
>types or int8_t, int16_t, int32_t, int64_t. The latter makes it easier to
>write portable archives, the former seems more natural. I can accept both
>choices but we should not mix the two as is done now

The archive interface addresses all fundamental types. These are
char, unsigned int, int, etc., plus int64_t and uint64_t. Other types
are pseudonyms generated by typedefs in header files, so there should
be no reason to include them.

> * A serialization of bool is missing - easy to fix

I don't understand what you mean. basic_[i|o]archive contain:

        // write out booleans as 1 and 0
        virtual basic_oarchive & operator<<(bool _Val)
        {
                *this << static_cast<char>(_Val ? 1 : 0);
                return *this;
        }
and
        // read in booleans as 1 or 0
        virtual basic_iarchive & operator>>(bool& _Val)
        {
                char i;
                *this >> i;
                _Val = i;
                return *this;
        }

>* The code will not compile on platforms where long is 64-bit:
>
> virtual basic_oarchive & operator<<(long _Val) = 0;
> virtual basic_oarchive & operator<<(int64_t _Val) = 0;
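
The clash described in the quoted point arises because int64_t is a typedef: where it names long, the two declarations above are the same member function declared twice, which is ill-formed. The sketch below checks which built-in type int64_t aliases and shows one way around the clash. It uses std::is_same from C++11, which postdates this thread, and counting_oarchive is a hypothetical class, not part of the library under review.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// True when std::int64_t is just another name for long. On such a
// platform operator<<(long) and operator<<(int64_t) would be one
// signature declared twice.
constexpr bool int64_is_long = std::is_same<long, std::int64_t>::value;
constexpr bool int64_is_long_long = std::is_same<long long, std::int64_t>::value;

// Hypothetical archive that overloads only on built-in integer types.
// A std::int64_t argument resolves to whichever overload it aliases.
struct counting_oarchive {
    int long_calls = 0;
    int long_long_calls = 0;
    counting_oarchive& operator<<(long) { ++long_calls; return *this; }
    counting_oarchive& operator<<(long long) { ++long_long_calls; return *this; }
};

// Usage: exactly one of the two overloads receives the value.
inline void example_int64_dispatch()
{
    counting_oarchive ar;
    ar << std::int64_t(1);
    assert(ar.long_calls + ar.long_long_calls == 1);
}
```

Overloading on the built-in types alone keeps the class well-formed everywhere, but it does not by itself make the archive format portable, since the width of long still differs between platforms.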

>4b. Interface design: performance: need improved methods for large data
>sets
>========================================================================

>As mentioned in previous posts, additional functions e.g. load_array
>and save_array need to be added to allow efficient serialization of large data sets.
>The default version could just use operator<< or operator>> as in:

>virtual void save_array(const int* p, std::size_t n) {
> for (std::size_t i=0;i<n;++i)
> *this << p[i];
>}

>and this would thus not incur any extra coding work for people not interested.
>Serialization of containers such as std::vector, or ublas or mtl vectors
>and matrices can make use of this extra function transparent to the user so that
>the interface would also not become harder to understand for the
>library user.

Why can't this be handled using

basic_oarchive::write_binary(void *p, size_t count)

which has been in the library from the beginning? How is this different from
what you want to do?
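
One way to see the difference between the two approaches is the sketch below. All class names are hypothetical and the code is not the library's: it only illustrates that a typed save_array lets a text archive fall back to element-wise output, while a binary archive can override it with a single bulk write built on something like write_binary.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical base class; names loosely follow the thread.
class sketch_oarchive {
public:
    virtual ~sketch_oarchive() {}
    virtual sketch_oarchive& operator<<(int value) = 0;
    // Default: element by element, so existing archives need no changes.
    virtual void save_array(const int* p, std::size_t n) {
        assert(p != 0 || n == 0);
        for (std::size_t i = 0; i < n; ++i)
            *this << p[i];
    }
};

// Text archive: inherits the default save_array.
class sketch_text_oarchive : public sketch_oarchive {
public:
    sketch_oarchive& operator<<(int value) override { os_ << value << ' '; return *this; }
    std::string str() const { return os_.str(); }
private:
    std::ostringstream os_;
};

// Native binary archive: overrides save_array with one bulk copy.
class sketch_binary_oarchive : public sketch_oarchive {
public:
    sketch_oarchive& operator<<(int value) override {
        write_binary(&value, sizeof value);
        return *this;
    }
    void save_array(const int* p, std::size_t n) override {
        ++bulk_writes;
        write_binary(p, n * sizeof(int));
    }
    const std::vector<char>& bytes() const { return bytes_; }
    int bulk_writes = 0; // counts calls that took the bulk path
private:
    void write_binary(const void* p, std::size_t count) {
        const char* first = static_cast<const char*>(p);
        bytes_.insert(bytes_.end(), first, first + count);
    }
    std::vector<char> bytes_;
};
```

Calling write_binary directly from user code would work for the binary archive only. Routing the call through a typed virtual keeps the same serialization code usable with a text archive.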

>4c.i) Problems with the current version
>---------------------------------------

>Currently the library wants to protect me from problems with binary
>incompatibilities by checking if the sizes of the primitive types are
>the same on the platform on which I read or wrote. I believe this to be
>a misfeature. Consider two platforms

> A: 32-bit long, 64-bit long long, int64_t typedef-ed as long long
> B: 64-bit long, int64_t typedef-ed as long

>Currently when I try to read an archive from platform A on platform B,
>the library aborts because of incompatible primitive types. However, as a power
>user, knowing about portability problems I did not use long or long long in my codes, but
>always int64_t, and should thus have no problems (ignoring byte order issues). The
>library should thus allow me to read the file although the primitive types are
>different!

>Considering this problem, which we face daily (we transfer files between
>Linux PCs, Macintosh, SUN and Alpha workstations, Cray, HP, Hitachi and
>Fujitsu supercomputers) has made us choose int*_t as the primitive types for
>serialization in our library (which is in use since 1994). See 4.a.i).

The native binary archives included in the package have absolutely no pretensions
to portability. Never have, never will. If you believe you can implement
a portable binary format, feel free. The library permits and encourages this. I believe
that the next version will permit overriding the init so that, if you believe
that the native binary archive created will in fact be portable between the platforms you
use, you will be able to derive from the native binary archive and override
the consistency checking code (at your own risk, of course).
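
As a rough illustration of what overriding the consistency check could mean, consider the sketch below. Everything in it is an assumption: the type_sizes preamble, the compatible() hook, and both class names are invented for the example and do not describe the library's actual init code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical preamble: sizes of primitive types on the writing platform.
struct type_sizes {
    std::size_t size_of_int;
    std::size_t size_of_long;
};

inline type_sizes native_type_sizes()
{
    type_sizes s = { sizeof(int), sizeof(long) };
    return s;
}

// Hypothetical input archive with a strict default check.
class sketch_biarchive {
public:
    virtual ~sketch_biarchive() {}
    // Refuse archives written with different primitive sizes.
    virtual bool compatible(const type_sizes& written) const {
        const type_sizes here = native_type_sizes();
        return written.size_of_int == here.size_of_int
            && written.size_of_long == here.size_of_long;
    }
};

// For a user whose data never contains long: ignore its size, but keep
// checking int. Byte order is still not checked here.
class no_long_biarchive : public sketch_biarchive {
public:
    bool compatible(const type_sizes& written) const override {
        return written.size_of_int == native_type_sizes().size_of_int;
    }
};

// Usage: an archive from a platform with a different long is rejected
// by the strict check and accepted by the relaxed one.
inline void example_relaxed_check()
{
    type_sizes other = native_type_sizes();
    other.size_of_long = (other.size_of_long == 8) ? 4 : 8;
    assert(!sketch_biarchive().compatible(other));
    assert(no_long_biarchive().compatible(other));
}
```

The relaxed check is only as safe as the user's discipline about which types are serialized, which is the "at your own risk" part.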

>4c.ii) Change of the binary archive class design
>------------------------------------------------

>I propose a change to the design of the native binary archives. While
>implementing operator<< to just call write_binary is common to all native binary
>archive classes, the specific implementation of write_binary can be different. I thus
>propose to factor out the operator<< implementations into a new base class
>basic_obarchive, which still contains write_binary as pure virtual function. From this
>class we can derive:

>stream_obarchive: serializing into a stream as done now
>file_obarchive: serialization into a file using fwrite
>buffer_obarchive: serialization into a memory buffer using e.g. memcpy
>..
>and similar for the input archives.

Think of the native binary implementation as an example of the usage of the library.

Your more elaborate definition of a family of binary archives is totally in keeping
with the manner that the library is intended to be used. I would call these
definitions examples of how to use the library rather than part of the library
itself. So I would be disinclined to make native binary archives any more
elaborate than they are now.
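
The family of archives proposed in the quoted text could be sketched roughly as below, whether it ends up inside the library or alongside it as an example. The _sketch suffixes mark the names as hypothetical, and only int and double are covered. The operator<< overloads live once in the base class, and each derived class supplies only write_binary.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Base class: operator<< is implemented once in terms of write_binary.
class basic_obarchive_sketch {
public:
    virtual ~basic_obarchive_sketch() {}
    basic_obarchive_sketch& operator<<(int v) { write_binary(&v, sizeof v); return *this; }
    basic_obarchive_sketch& operator<<(double v) { write_binary(&v, sizeof v); return *this; }
protected:
    virtual void write_binary(const void* p, std::size_t count) = 0;
};

// Back end 1: bytes go to a std::ostream.
class stream_obarchive_sketch : public basic_obarchive_sketch {
public:
    explicit stream_obarchive_sketch(std::ostream& os) : os_(os) {}
protected:
    void write_binary(const void* p, std::size_t count) override {
        os_.write(static_cast<const char*>(p), static_cast<std::streamsize>(count));
    }
private:
    std::ostream& os_;
};

// Back end 2: bytes are appended to an in-memory buffer.
class buffer_obarchive_sketch : public basic_obarchive_sketch {
public:
    const std::vector<char>& buffer() const { return buffer_; }
protected:
    void write_binary(const void* p, std::size_t count) override {
        const char* first = static_cast<const char*>(p);
        buffer_.insert(buffer_.end(), first, first + count);
    }
private:
    std::vector<char> buffer_;
};

// Usage: both back ends produce the same bytes for the same values.
inline bool backends_agree()
{
    std::ostringstream os;
    stream_obarchive_sketch to_stream(os);
    buffer_obarchive_sketch to_buffer;
    to_stream << 42 << 1.5;
    to_buffer << 42 << 1.5;
    const std::string from_buffer(to_buffer.buffer().begin(), to_buffer.buffer().end());
    return os.str() == from_buffer;
}
```

A file_obarchive using fwrite would follow the same pattern with a FILE* member.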

I do believe that the particular types of archives (iarchive, biarchive, etc.) will
be separated into their own header files, partly to make compilation faster
and partly to highlight the point made in the paragraph above.

>4d. Interface design: small objects
>===================================

>I have mentioned this in a previous post. Instead of requiring the user
>to reimplement the serialization of standard containers for all small
>object types for which the versioning and pointer system should be bypassed, a
>traits class can be added and the optimized serialization of all containers of small
>objects implemented in the library. Note that the traits class needs to be
>specialized only for those objects for which the user wants to optimize
>serialization, while no effort is required at all if the standard serialization method is to
>be used.

I don't see why this would be necessary - I will have to investigate
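
For reference, the traits idea in the quoted proposal could be sketched as follows. The names is_small_object, counting_archive and save_vector are hypothetical, and the two counters merely stand in for the full versioning and pointer bookkeeping path and for a raw fast path. The point is that the trait defaults to false, so only types that opt in are affected.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical trait: false by default, so ordinary classes are unaffected.
template <class T>
struct is_small_object { static const bool value = false; };

struct point { int x, y; };
struct widget { int id; };

// The user opts in for point only.
template <>
struct is_small_object<point> { static const bool value = true; };

// Stand-in archive that records which path each container took.
struct counting_archive {
    std::size_t tracked_objects = 0; // elements that went through full bookkeeping
    std::size_t raw_bytes = 0;       // bytes written through the fast path
};

template <bool B> struct bool_tag {};

// Standard path: every element is tracked individually.
template <class T>
void save_elements(counting_archive& ar, const std::vector<T>& v, bool_tag<false>)
{
    ar.tracked_objects += v.size();
}

// Optimized path: the elements are written as one raw block.
template <class T>
void save_elements(counting_archive& ar, const std::vector<T>& v, bool_tag<true>)
{
    ar.raw_bytes += v.size() * sizeof(T);
}

// Library-side container serialization: dispatches on the trait.
template <class T>
void save_vector(counting_archive& ar, const std::vector<T>& v)
{
    save_elements(ar, v, bool_tag<is_small_object<T>::value>());
}

// Usage: point takes the fast path, widget the standard one.
inline void example_small_objects()
{
    counting_archive ar;
    save_vector(ar, std::vector<point>(4));
    save_vector(ar, std::vector<widget>(3));
    assert(ar.raw_bytes == 4 * sizeof(point));
    assert(ar.tracked_objects == 3);
}
```

The container code is written once in the library, so the user specializes one trait instead of reimplementing serialization for each container of small objects.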

>The current library is however not consistent since

>* serialization of normal classes goes via specialization of the
> serialization<T> class

>* serialization of template classes goes via overloading of the free
> function serialization_detail::save_template(), ...

>This is unacceptable and a consistent method should be found.

I believe that what you are referring to is an artifact of a workaround
for compilers that fail to support partial template specialization.
This will probably be addressed for conforming compilers, but
others will have to live with this or something like it.

>5.a) Factor out the pointer serialization features
>--------------------------------------------------

>It should be straightforward to factor out the serialization of pointers
>and to split the library into archives for the serialization of basic
>types, and an add-on for the serialization of pointers. That way users who do
>not need the elaborate pointer serialization mechanism can do so and not
>incur the performance penalty (in terms of memory and speed) due to this
>feature, while in cases when it is needed we can add it on to the archive.

This feature incurs no performance penalty if it is not used. That is,
code for serializing pointers is neither generated nor included if there
are no pointers in the classes to be serialized.

There is no real reason for separating pointers from the library, just
as there is no real reason for separating other data types that require
some special handling (enums, C arrays).

I believe that the serialization will be split between input and output
parts much as the standard i/o library is today. This will result
in nicer size modules.

And no, I can't say for sure that it would be straightforward.

>5.b) Allow overriding of preamble, etc.
>---------------------------------------

>I would like to have more control over some aspects that are currently
>hardcoded into the library:

>* writing/reading the preamble

I believe that the preamble will be overridable.

>* obtaining the version number of a class
>* starting/finishing writing an object
>* a new type is encountered

Hmmm - a new type is encountered? I don't know what that means.

>The motivation is very simple: We have hundreds of gigabytes of data lying around
>in tens of thousands of files that could easily be read by the serialization archive
>if there were not too small differences:

>i) I wrote a different preamble
>ii) I only wrote one version number for all classes in the archive instead of separate
> version numbers for each class
>iii) no information was written when a new class was encountered

>Since otherwise the interface is nearly identical (many classes contain a load
>and a save function, albeit with a different name for the archive classes), changing
>all my codes over to a boost::serialization library would be easy if it
>weren't for the three issues above.

I believe you are wrong here. The interfaces might seem similar but there
is no reason to believe that the file formats have very much in common.
I don't believe there is any way enough flexibility could be added to
deserialize a file serialized by another system.

Note: converting legacy files to a new serialization system is very easy:

Load the file into memory using the old system.

Save the data into a new file using the new system.

Forget about the old system.
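
In code, the conversion above amounts to something like the sketch below. Both formats are invented stand-ins: load_legacy plays the role of the old system's reader and save_new that of the new system's writer, and neither reflects any real archive layout.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct record { int id = 0; double value = 0.0; };

// Stand-in for the old system's reader: "<id> <value>", no per-class version.
inline bool load_legacy(std::istream& is, record& r)
{
    return static_cast<bool>(is >> r.id >> r.value);
}

// Stand-in for the new system's writer: a class version precedes the data.
inline void save_new(std::ostream& os, const record& r)
{
    os << "record v1 " << r.id << ' ' << r.value << '\n';
}

// Load with the old code, save with the new code.
inline bool convert(std::istream& old_file, std::ostream& new_file)
{
    record r;
    if (!load_legacy(old_file, r))
        return false; // unreadable legacy data: write nothing
    save_new(new_file, r);
    return true;
}

// Usage: convert one legacy record held in a string.
inline std::string convert_text(const std::string& legacy)
{
    std::istringstream in(legacy);
    std::ostringstream out;
    return convert(in, out) ? out.str() : std::string();
}

inline void example_convert()
{
    assert(convert_text("7 2.5") == "record v1 7 2.5\n");
}
```

The old reader has to stay available until every file has been converted. With tens of thousands of files, the time that takes is the practical cost of this approach.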

>Since these are major changes I would like to see a new review after
>they are implemented and thus vote NO for now. However I am willing to help
>Robert with implementing the changes, improving the library, and am willing to
>discuss further.

I very much appreciate your interest in making a portable implementation of
an XDR binary archive and understand you have made great progress on this
in a very short time. Please let me know if there is anything
else you need from me. Many users feel that this is necessary and it
would demonstrate the ease of use of the library.

I know you have spent a lot of time studying and working with the library,
and I much appreciate your efforts.

Robert Ramey


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk