From: Daryle Walker (darylew_at_[hidden])
Date: 2008-08-13 17:28:56


I was thinking about adding serialization to some types I've been
working on in the sandbox. First I tried to recall how Mr. Ramey
said serialization can be tested. I couldn't find the specific post
I was thinking of, but other posts I found gave me the answer.
Reading more posts from that search prompted me to ask more questions.

I could reduce the classes I'm working with to:

//=============================================
class computer;

class context
{
public:
     typedef boost::array<boost::uint_least32_t, 4> value_type;

     context(); // use auto copy-ctr, copy-=, dtr

     void operator ()( bool ); // consumer
     bool operator ==( const context & ) const; // equals
     bool operator !=( const context & ) const; // not-equals
     value_type operator ()() const; // producer

private:
     friend class computer;

     boost::uint_fast64_t length;
     boost::array<boost::uint_fast32_t, 4> buffer;
     boost::array<bool, 512> queue;

     template < class Archive >
     void serialize( Archive &ar, const unsigned int version );
};

class computer
     : public convenience_methods_base<context>
{
     // An object of type "context" is incorporated in this object
     // due to the base class. A mutable/const pair of non-static
     // member functions named "context()" gives access to the inner
     // context object.

public:
     typedef context::value_type value_type;

     // Put various access member functions here that forward to the
     // internals of the "context" type, which work because of the
     // friend declaration.

private:
     template < class Archive >
     void serialize( Archive &ar, const unsigned int version );
};
//=============================================

I initially planned to have serialization functions for these two
classes, the "convenience_methods_base" base class template, plus two
other class templates (a base class and a support class) that
"convenience_methods_base" uses. But the e-mail search I mentioned
found a thread from May 2007 (on the main Boost list) suggesting
that the serialization of a non-primitive should match the user's
external representation of the type, and not the type's particular
internal structure. So I decided to keep the serialization protocol
just for the two public-facing classes, "context" and "computer."

I figured that the "computer" object can be serialized like:

//=============================================
template < class Archive >
inline void computer::serialize( Archive &ar, const unsigned int version )
{ ar & boost::serialization::make_nvp("context", this->context()); }
//=============================================

Which leaves how "context" objects are serialized. After thinking
about it for hours, I decided to just whip out something quick &
dirty and refine it later. So:

//=============================================
template < class Archive >
inline void context::serialize( Archive &ar, const unsigned int version )
{
     ar & BOOST_SERIALIZATION_NVP( length )
        & BOOST_SERIALIZATION_NVP( buffer )
        & BOOST_SERIALIZATION_NVP( queue );
}
//=============================================

would give a final serialization, in my test file, of:

//=============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="5">
<test class_id="0" tracking_level="0" version="0">
        <context class_id="1" tracking_level="0" version="0">
                <length>1</length>
                <buffer class_id="2" tracking_level="0" version="0">
                        <elems>
                                <count>4</count>
                                <item>1732584193</item>
                                <item>4023233417</item>
                                <item>2562383102</item>
                                <item>271733878</item>
                        </elems>
                </buffer>
                <queue class_id="3" tracking_level="0" version="0">
                        <elems>
                                <count>512</count>
                                <item>1</item>
                                <item>0</item>
<!-- I'll spare you, and the mail server, 509 more "<item>0</item>" lines -->
                                <item>0</item>
                        </elems>
                </queue>
        </context>
</test>
</boost_serialization>
//=============================================

Now I started refining, keeping in mind the principle of not leaking
implementation details. The problem here is the array counts, which
I don't need since they'll never change. The first one I can fix by
writing each element separately:

//=============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="5">
<test class_id="0" tracking_level="0" version="0">
        <context class_id="1" tracking_level="0" version="0">
                <length>1</length>
                <buffer-A>1732584193</buffer-A>
                <buffer-B>4023233417</buffer-B>
                <buffer-C>2562383102</buffer-C>
                <buffer-D>271733878</buffer-D>
                <message-tail class_id="2" tracking_level="0" version="0">
                        <elems>
                                <count>512</count>
                                <item>1</item>
                                <item>0</item>
<!-- 509 more "<item>0</item>" lines -->
                                <item>0</item>
                        </elems>
                </message-tail>
        </context>
</test>
</boost_serialization>
//=============================================
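
A minimal sketch of the per-element idea, kept free of Boost so it can be
compiled on its own. Everything named here is an assumption for
illustration: "toy_xml_oarchive" is a hypothetical stand-in for an XML
output archive, "named" plays the role that boost::serialization::make_nvp
plays in real code, and "context_sketch" is a cut-down "context" that uses
std::array in place of boost::array. The point is only that giving each
word its own name leaves no element count in the output.

```cpp
#include <array>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for an XML output archive (NOT a Boost class):
// each named value is printed as <name>value</name>.
struct toy_xml_oarchive
{
    std::ostringstream out;

    template < class T >
    toy_xml_oarchive &operator &( const std::pair<const char *, T *> &nv )
    {
        out << '<' << nv.first << '>' << *nv.second
            << "</" << nv.first << ">\n";
        return *this;
    }
};

// Hypothetical helper in the role of boost::serialization::make_nvp.
template < class T >
std::pair<const char *, T *> named( const char *name, T &value )
{ return std::pair<const char *, T *>( name, &value ); }

// Cut-down stand-in for "context": the length and the four buffer words.
struct context_sketch
{
    std::uint_fast64_t length;
    std::array<std::uint_fast32_t, 4> buffer;

    template < class Archive >
    void serialize( Archive &ar, const unsigned int /*version*/ )
    {
        // One named entry per word, so no array count is written.
        ar & named( "length", length )
           & named( "buffer-A", buffer[0] )
           & named( "buffer-B", buffer[1] )
           & named( "buffer-C", buffer[2] )
           & named( "buffer-D", buffer[3] );
    }
};

// Serializes one sample object and returns the text produced.
std::string demo_per_element()
{
    context_sketch c = { 1u,
        {{ 1732584193u, 4023233417u, 2562383102u, 271733878u }} };
    toy_xml_oarchive ar;
    c.serialize( ar, 0u );
    return ar.out.str();
}
```

With a real Boost archive the same "serialize" body would use
boost::serialization::make_nvp instead of the hypothetical "named".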

I've always wanted to use something like a base-64 string encoding of
the bit array, because it's cool and it'd save space. I added
conversion functions to/from the bit array and a std::string, and
then (de)serialized the string. I also had to separate "serialize"
into "save" and "load" since conversion is complementary, not
identical. So now I have:

//=============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="5">
<test class_id="0" tracking_level="0" version="0">
        <context class_id="1" tracking_level="0" version="0">
                <length>1</length>
                <buffer-A>1732584193</buffer-A>
                <buffer-B>4023233417</buffer-B>
                <buffer-C>2562383102</buffer-C>
                <buffer-D>271733878</buffer-D>
                <message-tail>g</message-tail>
        </context>
</test>
</boost_serialization>
//=============================================
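
The conversion between the bit array and the string could be sketched as
below. This is not the code from the sandbox. The alphabet (URL-safe
base-64, ending in "-_") and the bit order (first queued bit is the most
significant bit of a sextet, with a partial final sextet zero-padded) are
inferred from the sample archives in this message. The names
"sextet_alphabet", "encode_bits" and "decode_bits" are hypothetical, and
std::array stands in for boost::array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Assumed alphabet: URL-safe base-64, inferred from the sample output.
const char sextet_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_";

// Packs the first "count" bits of "queue" into one character per sextet.
std::string encode_bits( const std::array<bool, 512> &queue,
                         std::size_t count )
{
    assert( count <= queue.size() );
    std::string out;
    for ( std::size_t i = 0; i < count; i += 6 )
    {
        unsigned sextet = 0;
        for ( std::size_t j = 0; j < 6; ++j )
        {
            sextet <<= 1;
            if ( i + j < count && queue[i + j] )
                sextet |= 1u;  // bits past "count" stay zero (padding)
        }
        out += sextet_alphabet[sextet];
    }
    return out;
}

// Unpacks up to "count" bits from "text" into "queue", skipping characters
// outside the alphabet and stopping once enough bits have been read.
// Returns the number of bits actually stored.
std::size_t decode_bits( const std::string &text,
                         std::array<bool, 512> &queue, std::size_t count )
{
    assert( count <= queue.size() );
    const std::string legal( sextet_alphabet );
    std::size_t filled = 0;
    for ( std::size_t k = 0; k < text.size() && filled < count; ++k )
    {
        const std::string::size_type v = legal.find( text[k] );
        if ( v == std::string::npos )
            continue;  // ignore illegal characters
        for ( int j = 5; j >= 0 && filled < count; --j )
            queue[filled++] = ( ( v >> j ) & 1u ) != 0;
    }
    return filled;
}
```

Because encoding and decoding are separate routines, a "save" function
would call the encoder before writing the string and a "load" function
would call the decoder after reading it, which is why one shared
"serialize" body no longer fits.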

Then I added tests for: exactly 6 bits (i.e. one base-64 letter); a
sextet (actually two) and a partial sextet together; filling a queue
to capacity (actually one short of that since a full queue
automatically activates a turnover); and going past capacity
resulting in a new hash buffer and an empty message-tail.

//=============================================
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="5">
<test class_id="0" tracking_level="0" version="0">
        <context class_id="1" tracking_level="0" version="0">
                <length>1</length>
                <buffer-A>1732584193</buffer-A>
                <buffer-B>4023233417</buffer-B>
                <buffer-C>2562383102</buffer-C>
                <buffer-D>271733878</buffer-D>
                <message-tail>g</message-tail>
        </context>
</test>
</boost_serialization>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="5">
<test class_id="0" tracking_level="0" version="0">
        <context class_id="1" tracking_level="0" version="0">
                <length>6</length>
                <buffer-A>1732584193</buffer-A>
                <buffer-B>4023233417</buffer-B>
                <buffer-C>2562383102</buffer-C>
                <buffer-D>271733878</buffer-D>
                <message-tail>q</message-tail>
        </context>
</test>
</boost_serialization>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="5">
<test class_id="0" tracking_level="0" version="0">
        <context class_id="1" tracking_level="0" version="0">
                <length>14</length>
                <buffer-A>1732584193</buffer-A>
                <buffer-B>4023233417</buffer-B>
                <buffer-C>2562383102</buffer-C>
                <buffer-D>271733878</buffer-D>
                <message-tail>qQg</message-tail>
        </context>
</test>
</boost_serialization>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="5">
<test class_id="0" tracking_level="0" version="0">
        <context class_id="1" tracking_level="0" version="0">
                <length>511</length>
                <buffer-A>1732584193</buffer-A>
                <buffer-B>4023233417</buffer-B>
                <buffer-C>2562383102</buffer-C>
                <buffer-D>271733878</buffer-D>
                <message-tail>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_AAAAAAAAAAH__________g</message-tail>
        </context>
</test>
</boost_serialization>

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE boost_serialization>
<boost_serialization signature="serialization::archive" version="5">
<test class_id="0" tracking_level="0" version="0">
        <context class_id="1" tracking_level="0" version="0">
                <length>512</length>
                <buffer-A>2631642121</buffer-A>
                <buffer-B>80961853</buffer-B>
                <buffer-C>4033330630</buffer-C>
                <buffer-D>497373075</buffer-D>
                <message-tail></message-tail>
        </context>
</test>
</boost_serialization>
//=============================================

If you want to see the actual work, look at revision/change-set
#48131 in Boost's Subversion set-up. Now to the actual questions:

1. If there's only one sub-object, base or member, that has any
significant data, could someone call the "serialize" member function
of that sub-object directly in the wrapping class's "serialize"?
(This assumes that friendship is set up.) This would make the
wrapping class look identical to the sub-object's class, right? Is
this a good idea?

2. Before actually trying to serialize a string, I was worried that
the string's serialization would include a length count. This would
be unnecessary because the object's "length" attribute already
implies the length of the string:
int( ceil( double( length % 512 ) / 6.0 ) ). Here, we see that the
string's length isn't explicitly included in the XML archive, so I
have no worries. But what about non-XML archives? Will the string's
length be directly serialized, wasting space? If so, how can I fix
that?

3. Having to add std::string to support serialization makes my class
header heavier. My class uses fixed-sized arrays, so is there any way
that I can avoid allocating a string? For writing out, could I set
up a char-array with the encoding and write that out? For reading
in, can I read the string piecemeal into a char-array, just in case
someone added more characters than required? My converter currently
ignores illegal characters and stops when enough legal characters
have been read. If what I ask is possible, would the reading routine
have to seek to the end of the entry so further serialization isn't
messed up?
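
For reference, the length relation from question 2 can be written with
integer arithmetic only, which avoids the round trip through double. This
is a small sketch; the names "tail_characters" and "max_tail_characters"
are hypothetical. It agrees with the sample archives above (lengths 1, 6,
14, 511 and 512 give 1, 1, 3, 86 and 0 characters).

```cpp
#include <cstdint>

// Characters needed for the bits still queued:
// ceil( (length % 512) / 6 ), computed without floating point.
inline unsigned tail_characters( std::uint_fast64_t length )
{ return static_cast<unsigned>( ( length % 512u + 5u ) / 6u ); }

// Largest possible tail (511 queued bits). A fixed char buffer of this
// size, plus one for a terminator if needed, would always be enough.
const unsigned max_tail_characters = 86;
```

The constant also bears on question 3: since the tail never exceeds 86
characters, a fixed-size char array is at least large enough in principle.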

-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com
