Boost Users :

From: Robert Ramey (ramey_at_[hidden])
Date: 2008-08-14 01:48:31


Daryle Walker wrote:
> I was thinking about adding serialization to some types I've been
> working on in the sandbox. First I tried to recall how Mr. Ramey
> said serialization can be tested. I couldn't find the specific post
> I was thinking about, but others that were found gave me the answer.
> Reading other posts in that search prompted me to ask more questions.
>
> I could reduce the classes I'm working with to:
>
> //=============================================
> class computer;
>
> class context
> {
> public:
> typedef boost::array<uint_least32_t, 4> value_type;
>
> context(); // use auto copy-ctr, copy-=, dtr
>
> void operator ()( bool ); // consumer
> bool operator ==( context const &c ) const; // equals
> bool operator !=( context const &c ) const; // not-equals
> value_type operator ()() const; // producer
>
> private:
> friend class computer;
>
> boost::uint_fast64_t length;
> boost::array<uint_fast32_t, 4> buffer;
> boost::array<bool, 512> queue;
>
> template < class Archive >
> void serialize( Archive &ar, const unsigned int version );
> };
>
> class computer
> : public convenience_methods_base<context>
> {
> // An object of type "context" is incorporated in this object
> // due to the base class. A mutable/const pair of non-static
> // member functions named "context()" gives access to the inner
> // context object.
>
> public:
> typedef context::value_type value_type;
>
> // Put various access member functions here that forward to the
> // internals of the "context" type, which work because of the
> // friend declaration.
>
> private:
> template < class Archive >
> void serialize( Archive &ar, const unsigned int version );
> };
> //=============================================
>
> I initially planned to have serialization functions for these two
> classes, the "convenience_methods_base" base class template, plus two
> other class templates (a base class and a support class) that
> "convenience_methods_base" uses. But the e-mail search I mentioned
> found a thread from May 2007 (on the main Boost list) that suggested
> that the serialization of a non-primitive should match the user's
> external representation of the type, and not the type's particular
> internal structure. So I decided to keep the serialization protocol
> just for the two public-facing classes, "context" and "computer."

I don't think I've ever said anything like this, at least not
intentionally.

The bedrock of the serialization library is the composition of
serialize function calls, which reflects the underlying composition
of the data items into classes or types.

In a couple of very unusual cases (shared_ptr is the canonical
example), this composition is not possible. But I would emphasize
that these cases are unusual to the point of being pathological.

So in your case, I would just make each type serializable
in terms of its components.

Note that doing this preserves the private nature of the
serialization functions. If they are private member functions,
they remain internal implementation details of the class, so you
don't complicate or pollute your design by exposing implementation
details in the public interface.
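A toy sketch of that composition, in plain standard C++ so it compiles without Boost. toy_oarchive and toy_access are hypothetical stand-ins for an archive and for boost::serialization::access (they are not library classes), and the output format is invented for illustration. Each class lists only its own components in a private serialize member, and the archive recurses into class-type members, so the nesting of the output mirrors the nesting of the data.

```cpp
#include <ostream>
#include <sstream>

// Hypothetical stand-in for boost::serialization::access: the one class
// that classes befriend so it can call their private serialize() members.
class toy_access {
public:
    template <class Archive, class T>
    static void serialize(Archive &ar, T &t) { t.serialize(ar, 0u); }
};

// Hypothetical output archive: primitives are written directly, while
// class types are wrapped in braces and handed to their own serialize().
class toy_oarchive {
public:
    explicit toy_oarchive(std::ostream &os) : os_(os) {}

    toy_oarchive &operator&(int v) { os_ << v << ' '; return *this; }

    template <class T>
    toy_oarchive &operator&(T &t) {
        os_ << "{ ";
        toy_access::serialize(*this, t);  // recurse into the components
        os_ << "} ";
        return *this;
    }

private:
    std::ostream &os_;
};

class point {
    int x_ = 1, y_ = 2;
    friend class toy_access;  // serialize stays private
    template <class Archive>
    void serialize(Archive &ar, unsigned /* version */) { ar & x_ & y_; }
};

class segment {
    point a_, b_;
    int weight_ = 7;
    friend class toy_access;
    template <class Archive>
    void serialize(Archive &ar, unsigned /* version */) { ar & a_ & b_ & weight_; }
};
```

Serializing a default-constructed segment through a toy_oarchive writes `{ { 1 2 } { 1 2 } 7 } `: segment never needs to know how point stores its data.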

>
> I figured that the "computer" object can be serialized like:
>
> //=============================================
> template < class Archive >
> inline void computer::serialize( Archive &ar, const unsigned int version )
> { ar & boost::serialization::make_nvp("context", this->context()); }

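This is not how I would do it. Instead, I would serialize the base class:

template < class Archive >
inline void computer::serialize( Archive &ar, const unsigned int /* version */ )
{
    // "class context" is needed because the inherited context() members
    // hide the class name inside computer's scope
    typedef convenience_methods_base<class context> base_type;
    // an explicit name is given since "convenience_methods_base<context>"
    // is not a valid XML tag name
    ar & boost::serialization::make_nvp( "convenience_methods_base",
        boost::serialization::base_object<base_type>( *this ) );
}

which would in turn eventually call the serialization implemented in
convenience_methods_base<context>.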

> //=============================================
>
> Which leaves how "context" objects are serialized. After thinking
> about it for hours, I decided to just whip out something quick &
> dirty and refine it later. So:
>
> //=============================================
> template < class Archive >
> inline void context::serialize( Archive &ar, const unsigned int version )
> {
> ar & BOOST_SERIALIZATION_NVP( length )
> & BOOST_SERIALIZATION_NVP( buffer )
> & BOOST_SERIALIZATION_NVP( queue );
> }

That looks fine to me.

> //=============================================
>
> would give a final serialization, in my test file, of:
>
> //=============================================
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="5">
> <test class_id="0" tracking_level="0" version="0">
> <context class_id="1" tracking_level="0" version="0">
> <length>1</length>
> <buffer class_id="2" tracking_level="0" version="0">
> <elems>
> <count>4</count>
> <item>1732584193</item>
> <item>4023233417</item>
> <item>2562383102</item>
> <item>271733878</item>
> </elems>
> </buffer>
> <queue class_id="3" tracking_level="0" version="0">
> <elems>
> <count>512</count>
> <item>1</item>
> <item>0</item>
> <!-- I'll spare you, and the mail server, 509 more "<item>0</item>" lines -->
> <item>0</item>
> </elems>
> </queue>
> </context>
> </test>
> </boost_serialization>
> //=============================================
>
> Now I started refining, keeping the principle of not leaking
> implementation details in mind. The problem here is the array-
> counts, which I don't need since they'll never change. The first one
> I can fix by writing each element separately:

This is an artifact of our implementation of array serialization in XML.
The count of elements is in fact redundant for a fixed-size array. If
you wanted to eliminate it, the most transparent way would be to
define your own serialization for array and use that instead of the
one included in the library.

>
> //=============================================
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="5">
> <test class_id="0" tracking_level="0" version="0">
> <context class_id="1" tracking_level="0" version="0">
> <length>1</length>
> <buffer-A>1732584193</buffer-A>
> <buffer-B>4023233417</buffer-B>
> <buffer-C>2562383102</buffer-C>
> <buffer-D>271733878</buffer-D>
> <message-tail class_id="2" tracking_level="0" version="0">
> <elems>
> <count>512</count>
> <item>1</item>
> <item>0</item>
> <!-- 509 more "<item>0</item>" lines -->
> <item>0</item>
> </elems>
> </message-tail>
> </context>
> </test>
> </boost_serialization>
> //=============================================
>
> I've always wanted to use something like a base-64 string encoding of
> the bit array, because it's cool and it'd save space. I added
> conversion functions to/from the bit array and a std::string, and
> then (de)serialized the string.

You might look into "binary_object", which serializes its argument
as base64 text in text and XML archives and as raw binary in binary archives.
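A minimal sketch of the bit-queue-to-text conversion described in the quoted paragraph, in standard C++ only. The bit order and alphabet are inferred from the archives in this message, not taken from change-set #48131: the earliest bit becomes the most significant bit of its sextet, a trailing partial sextet is zero-padded, and the URL-safe alphabet (ending in '-' and '_') is used without '=' padding. encode_tail is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// URL-safe base-64 alphabet (assumed from the "-_" in the 511-bit
// message-tail shown in the archives); no '=' padding is written.
const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Hypothetical helper: encode the first `count` bits of `queue`, six bits
// per character, earliest bit as the most significant bit of its sextet.
// A trailing partial sextet is padded with zero bits on the right.
std::string encode_tail(const std::array<bool, 512> &queue, std::size_t count)
{
    assert(count <= queue.size());
    std::string out;
    out.reserve((count + 5) / 6);
    unsigned sextet = 0;
    int filled = 0;
    for (std::size_t i = 0; i < count; ++i) {
        sextet = (sextet << 1) | (queue[i] ? 1u : 0u);
        if (++filled == 6) {          // a full sextet: emit one character
            out += kAlphabet[sextet];
            sextet = 0;
            filled = 0;
        }
    }
    if (filled > 0)                   // zero-pad the partial sextet
        out += kAlphabet[sextet << (6 - filled)];
    return out;
}
```

With only queue[0] set and a count of 1 this yields "g", and the bit patterns 101010 (6 bits) and 10101001000010 (14 bits) give "q" and "qQg", which agrees with the message-tail values in the archives above.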

> I also had to separate "serialize"
> into "save" and "load" since conversion is complementary, not
> identical. So now I have:
>
> //=============================================
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="5">
> <test class_id="0" tracking_level="0" version="0">
> <context class_id="1" tracking_level="0" version="0">
> <length>1</length>
> <buffer-A>1732584193</buffer-A>
> <buffer-B>4023233417</buffer-B>
> <buffer-C>2562383102</buffer-C>
> <buffer-D>271733878</buffer-D>
> <message-tail>g</message-tail>
> </context>
> </test>
> </boost_serialization>
> //=============================================
>
> Then I added tests for: exactly 6 bits (i.e. one base-64 letter); a
> sextet (actually two) and a partial sextet together; filling a queue
> to capacity (actually one short of that since a full queue
> automatically activates a turnover); and going past capacity
> resulting in a new hash buffer and an empty message-tail.
>
> //=============================================
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="5">
> <test class_id="0" tracking_level="0" version="0">
> <context class_id="1" tracking_level="0" version="0">
> <length>1</length>
> <buffer-A>1732584193</buffer-A>
> <buffer-B>4023233417</buffer-B>
> <buffer-C>2562383102</buffer-C>
> <buffer-D>271733878</buffer-D>
> <message-tail>g</message-tail>
> </context>
> </test>
> </boost_serialization>
>
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="5">
> <test class_id="0" tracking_level="0" version="0">
> <context class_id="1" tracking_level="0" version="0">
> <length>6</length>
> <buffer-A>1732584193</buffer-A>
> <buffer-B>4023233417</buffer-B>
> <buffer-C>2562383102</buffer-C>
> <buffer-D>271733878</buffer-D>
> <message-tail>q</message-tail>
> </context>
> </test>
> </boost_serialization>
>
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="5">
> <test class_id="0" tracking_level="0" version="0">
> <context class_id="1" tracking_level="0" version="0">
> <length>14</length>
> <buffer-A>1732584193</buffer-A>
> <buffer-B>4023233417</buffer-B>
> <buffer-C>2562383102</buffer-C>
> <buffer-D>271733878</buffer-D>
> <message-tail>qQg</message-tail>
> </context>
> </test>
> </boost_serialization>
>
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="5">
> <test class_id="0" tracking_level="0" version="0">
> <context class_id="1" tracking_level="0" version="0">
> <length>511</length>
> <buffer-A>1732584193</buffer-A>
> <buffer-B>4023233417</buffer-B>
> <buffer-C>2562383102</buffer-C>
> <buffer-D>271733878</buffer-D>
> <message-tail>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_AAAAAAAAAAH__________g</message-tail>
> </context>
> </test>
> </boost_serialization>
>
>
> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
> <!DOCTYPE boost_serialization>
> <boost_serialization signature="serialization::archive" version="5">
> <test class_id="0" tracking_level="0" version="0">
> <context class_id="1" tracking_level="0" version="0">
> <length>512</length>
> <buffer-A>2631642121</buffer-A>
> <buffer-B>80961853</buffer-B>
> <buffer-C>4033330630</buffer-C>
> <buffer-D>497373075</buffer-D>
> <message-tail></message-tail>
> </context>
> </test>
> </boost_serialization>
> //=============================================
>
> If you want to see the actual work, look at revision/change-set
> #48131 in Boost's Subversion set-up. Now to the actual questions:
>
> 1. If there's only one sub-object, base or member, that has any
> significant data, could someone call the "serialize" member function
> of that sub-object directly in the wrapping class's "serialize"?
> (This assumes that friendship is set up.) This would make the
> wrapping class look identical to the sub-object's class, right? Is
> this a good idea?

Serialize functions are not intended to be called directly. That's why
one is encouraged to make them private. Some machinery is
implemented in the archive class, and calling the serialize
function directly would bypass it.

> 2. Before actually trying to serialize a string, I was worried that
> the string's serialization would include a length count. This would
> be unnecessary because the object's "length" attribute already
> implies the length of the string (int( ceil( double( length % 512 ) /
> 6.0 ) )). Here, we see that the string's length isn't explicitly
> included in the XML archive, so I have no worries. But what about
> non-XML archives? Will the string's length be directly serialized,
> wasting space? If so, how can I fix that?

For std::string:
in XML - no length count
in text - a length count is included (but arrays don't get one)
in binary - a length count is also included
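The string length mentioned in question 2 can be computed without floating point. A small sketch, where tail_chars is a hypothetical name: adding 5 before the integer division by 6 rounds up, which is the same value as int( ceil( double( length % 512 ) / 6.0 ) ).

```cpp
#include <cstdint>

// Characters in the message-tail implied by the "length" member:
// ceil((length % 512) / 6), computed with integer arithmetic only.
constexpr unsigned tail_chars(std::uint_fast64_t length)
{
    return static_cast<unsigned>((length % 512 + 5) / 6);
}

// Checked against the lengths and message-tails in the archives above.
static_assert(tail_chars(1) == 1, "\"g\"");
static_assert(tail_chars(6) == 1, "\"q\"");
static_assert(tail_chars(14) == 3, "\"qQg\"");
static_assert(tail_chars(511) == 86, "64 + 11 + 10 + 1 characters");
static_assert(tail_chars(512) == 0, "empty message-tail");
```

Because the character count is fully determined by "length", a format that omits the string's own count loses no information.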

> 3. Having to add std::string to support serialization makes my class
> header heavier. My class uses fixed-sized arrays, so is there any way
> that I can avoid allocating a string?

Just use an array of characters; it is fixed length, with no allocation overhead.

> For writing out, could I set
> up a char-array with the encoding and write that out?

In the library, the encoding (UTF-8, locale, etc.) is determined by the
stream attached to the archive.

> For reading
> in, can I read the string in piecemeal to a char-array just in case
> someone added more characters than required.

There is no enforced requirement that saving and loading be the
"same". I would recommend not splitting serialize into save/load
unless it's necessary. When it is, I strive to make them symmetrical
so that their correctness can be easily verified.

> My converter currently
> ignores illegal characters and stops when enough legal characters
> have been read. If what I ask is possible, would the reading routine
> have to seek to the end of the entry so further serialization isn't
> messed up?

For this you would have to add your own special sauce to the
XML archive class, which you could do by derivation. It's straightforward
in principle but ends up being somewhat tricky in practice.
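A minimal sketch of the reading side, mirroring the converter behaviour described in the last question: characters outside the URL-safe base-64 alphabet are skipped, and decoding stops as soon as enough bits are filled. It works on a character buffer assumed to already hold the element's text; getting that text out of the archive without a std::string, or making the archive skip the rest of the entry, is the part that would need the archive-class extension described above. decode_tail is a hypothetical name, and the bit order (most significant bit first) is inferred from the archives in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical decoder for message-tail text. Returns how many input
// characters were consumed, so the caller can tell whether text was left
// over in the entry (or whether the input ran out before `count` bits).
std::size_t decode_tail(const char *text, std::size_t text_len,
                        std::size_t count, std::array<bool, 512> &queue)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    assert(count <= queue.size());
    queue.fill(false);
    std::size_t bit = 0, pos = 0;
    while (bit < count && pos < text_len) {
        const char c = text[pos++];
        // strchr would match the terminating NUL, so rule that out first
        const char *hit = (c != '\0') ? std::strchr(alphabet, c) : nullptr;
        if (hit == nullptr)
            continue;  // illegal character: ignore it
        const unsigned sextet = static_cast<unsigned>(hit - alphabet);
        // unpack most significant bit first; padding bits past `count` are dropped
        for (int b = 5; b >= 0 && bit < count; --b)
            queue[bit++] = ((sextet >> b) & 1u) != 0;
    }
    return pos;
}
```

For example, decoding "gXYZ" with a count of 1 consumes only the "g", sets queue[0], and returns 1, leaving the extra characters unread.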

Robert Ramey


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net