
Boost/Serialization ‐ Portable Binary Archives (PBA)

Tutorial

NOTE: Naming differs from the source code! This tutorial reflects our hope of making the eos portable archives an official part of the Boost serialization library. Work is in progress :-)


Quick start
Format
Examples


Quick start

Are you impatient to start using the Boost Portable Binary Archives (PBA)? If so, this section is for you.

How to store some simple data in a portable binary output archive

The tutorial_pba_0.cpp sample program uses a boost::archive::portable_binary_oarchive object attached to a standard output file stream to store a couple of variables of primitive types (bool, char, integer and floating-point numbers) and even a std::string.

The tutorial_pba_0.cpp source code:
/** tutorial_pba_0.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This quick start example shows how to store some variables
 * of basic types (bool, integer, floating point numbers, STL string) 
 * using the portable binary archive format associated to a 
 * standard output file stream.
 *
 */

#include <string>
#include <fstream>

#include <boost/cstdint.hpp>
#include <boost/archive/portable_binary_oarchive.hpp>

int main (void)
{
  // The name for the example data file :  
  std::string filename = "pba_0.data"; 

  // Some variables of various primitive types :
  bool        b              = true;
  char        c              = 'B';
  uint32_t    answer         = 42;
  float       computing_time = 7.5e6;
  double      e              = 2.71828182845905;
  std::string slogan         = "DON'T PANIC";
  
  // Open an output file stream in binary mode :
  std::ofstream fout (filename.c_str (), std::ios_base::binary);
  
  {
    // Create an output portable binary archive attached to the output file :
    boost::archive::portable_binary_oarchive opba (fout);
    
    // Store (serializing) variables :
    opba & b & c & answer & computing_time & e & slogan;
  }

  return 0;   
}

// end of tutorial_pba_0.cpp


The compiled executable creates the pba_0.data file which contains the following bytes:

127   1   9   1  84   1  66   1  42   4 192 225 228  74   8 116
 87  20 139  10 191   5  64   1  11  68  79  78  39  84  32  80
 65  78  73  67
This format is explained in detail below.

Note:


How to load some simple data from a portable binary input archive

The tutorial_pba_1.cpp sample program uses a boost::archive::portable_binary_iarchive object attached to a standard input file stream in order to load the data previously stored by the tutorial_pba_0.cpp program in the pba_0.data file.

The tutorial_pba_1.cpp source code:
/** tutorial_pba_1.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization package. 
 *
 * This quick start example shows how to load some variables
 * of basic types (bool, integer, floating point numbers, STL string) 
 * using the portable binary archive format associated to a 
 * standard input file stream. 
 *
 */

#include <iostream>
#include <string>
#include <fstream>

#include <boost/cstdint.hpp>
#include <boost/archive/portable_binary_iarchive.hpp>

int main (void)
{
  using namespace std;

  // The name for the example data file :  
  string filename = "pba_0.data"; 

  // Some variables of various types :
  bool     b;
  char     c;
  uint32_t answer;
  float    computing_time;
  double   e;
  string   slogan;
   
  // Open an input file stream in binary mode :
  ifstream fin (filename.c_str (), ios_base::binary);
  
  {
    // Create an input portable binary archive attached to the input file :
    boost::archive::portable_binary_iarchive ipba (fin);
    
    // Loading (de-serializing) variables using the same
    // order as for serialization (see tutorial_pba_0.cpp) :
    ipba & b & c & answer & computing_time & e & slogan;
  }

  cout.precision (15);
  cout << "Variable 'b' is              : " << b << " " << "(bool)" << endl;
  cout << "Variable 'c' is              : '" << c << "' " << " " << "(char)" << endl;
  cout << "Variable 'answer' is         : " << answer  << " " << "(unsigned 32-bit integer)" << endl;
  cout << "Variable 'computing_time' is : " << computing_time  << " " << "(single precision 32-bit float)" << endl;
  cout << "Variable 'e' is              : " << e  << " " << "(double precision 64-bit float)" << endl;
  cout << "Variable 'slogan' is         : \"" << slogan  << "\" " << "(std::string)" << endl;

  return 0;   
}

// end of tutorial_pba_1.cpp


The executable reads the pba_0.data file and deserializes its contents in the same order in which they were stored. It then prints the restored values of the variables:

Variable 'b' is              : 1 (bool)
Variable 'c' is              : 'B'  (char)
Variable 'answer' is         : 42 (unsigned 32-bit integer)
Variable 'computing_time' is : 7500000 (single precision 32-bit float)
Variable 'e' is              : 2.71828182845905 (double precision 64-bit float)
Variable 'slogan' is         : "DON'T PANIC" (std::string)


Format

This section aims to give some details about the binary format of portable binary archives (PBA). We will analyse the byte contents of the sample binary archive pba_0.data file created by the tutorial_pba_0.cpp program (see the previous section).

Like any other archive format within Boost/Serialization, a PBA starts with a header (this is the default behaviour, but the header can be disabled with a special flag at construction; see this example). This header consists of two pieces of information:

Now that we are done with the header, let's have a look at the serialized data!

Now the contents of the pba_0.data file can be fully understood :

127   1   9   1  84   1  66   1  42   4 192 225 228  74   8 116
 87  20 139  10 191   5  64   1  11  68  79  78  39  84  32  80
 65  78  73  67
More details about the format (non-finite floating-point values, negative integers) are given in the sample programs below.
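The byte sequence of pba_0.data can be checked with the following stand-alone sketch, which decodes it without Boost. Its rules are inferred from the dumps shown in this tutorial, not taken from the PBA source, so treat them as assumptions: the header is a signature byte 127 followed by an integer (9 here); every integer is a signed length byte followed by that many little-endian bytes, with a lone 0 for zero; true is written as 1 84 ('T') and false as 0; a float or double is written as the integer holding its IEEE 754 bit pattern; a string is its length followed by its characters. PbaReader and its member functions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader for the byte layout inferred from the dumps in this
// tutorial; it is not the PBA implementation.
class PbaReader {
  static_assert(sizeof(float) == 4 && sizeof(double) == 8, "IEEE 754 sizes assumed");

public:
  explicit PbaReader(std::vector<unsigned char> bytes) : bytes_(std::move(bytes)) {}

  // Header: signature byte 127, then an integer (9 in the dumps shown here).
  std::int64_t read_header() {
    if (next() != 127) throw std::runtime_error("missing signature byte 127");
    return read_int();
  }

  // Signed length byte n, then |n| little-endian bytes. The omitted high
  // bytes are 0x00 when n > 0 and 0xFF when n < 0; n == 0 encodes zero.
  std::int64_t read_int() {
    const int n = static_cast<signed char>(next());
    const int count = n < 0 ? -n : n;
    if (count > 8) throw std::runtime_error("integer wider than 64 bits");
    std::uint64_t u = n < 0 ? ~std::uint64_t(0) : 0;  // pre-fill the high bytes
    for (int i = 0; i < count; ++i) {
      const int shift = 8 * i;
      u &= ~(std::uint64_t(0xFF) << shift);  // clear byte i
      u |= std::uint64_t(next()) << shift;   // then store it
    }
    return static_cast<std::int64_t>(u);
  }

  // false is a single 0 byte; true is the pair 1, 'T' (84).
  bool read_bool() {
    if (next() == 0) return false;
    if (next() != 'T') throw std::runtime_error("malformed bool");
    return true;
  }

  // Floating-point values are stored as the integer holding their bit pattern.
  float read_float() {
    const std::uint32_t bits = static_cast<std::uint32_t>(read_int());
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
  }

  double read_double() {
    const std::uint64_t bits = static_cast<std::uint64_t>(read_int());
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
  }

  // A string is its length (as an integer) followed by the raw characters.
  std::string read_string() {
    const std::int64_t length = read_int();
    std::string s;
    for (std::int64_t i = 0; i < length; ++i) s += static_cast<char>(next());
    return s;
  }

  bool at_end() const { return pos_ == bytes_.size(); }

private:
  unsigned char next() {
    if (pos_ >= bytes_.size()) throw std::runtime_error("unexpected end of data");
    return bytes_[pos_++];
  }

  std::vector<unsigned char> bytes_;
  std::size_t pos_ = 0;
};
```

Reading the fields in the order used by tutorial_pba_0.cpp (read_header, read_bool, read_int for the char and for the uint32_t, read_float, read_double, read_string) yields true, 'B', 42, 7.5e6, 2.71828182845905 and "DON'T PANIC", and consumes exactly the 36 bytes of the file.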


Examples

Handling special floating point values

The PBA is designed to handle single and double precision floating point numbers, including non-finite and special values:

The tutorial_pba_2.cpp sample program illustrates these special cases when serializing single precision floating point numbers:

The tutorial_pba_2.cpp source code:
/** tutorial_pba_2.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This sample program shows how to use a portable binary archive 
 * to store/load floating point numbers including non-finite and 
 * special (denormalized) values.
 *
 */

#include <string>
#include <fstream>
#include <iostream>
#include <limits>
#include <cmath>

#include <boost/archive/portable_binary_oarchive.hpp>
#include <boost/archive/portable_binary_iarchive.hpp>

int main (void)
{
  using namespace std;

  // The name for the example data file :  
  string filename = "pba_2.data"; 

  {
    // A normal single precision floating point number :
    float pi = 3.14159265; 

    // Single precision zeroed floating point number :
    float zero = 0.0;

    // A denormalized single precision floating point number :
    float tiny = 1.e-40;
    
    // A single precision floating point number with `+Infinity' value :
    float plus_infinity = numeric_limits<float>::infinity ();
    
    // A single precision floating point number with `-Infinity' value :
    float minus_infinity = -numeric_limits<float>::infinity ();
    
    // A single precision `Not-a-Number' (NaN):
    float nan = numeric_limits<float>::quiet_NaN ();
    
    // Open an output file stream in binary mode :
    ofstream fout (filename.c_str (), ios_base::binary);
    
    {
      // Create an output portable binary archive attached to the output file :
      boost::archive::portable_binary_oarchive opba (fout);
      
      // Store (serialize) variables :
      opba & pi & zero & tiny & plus_infinity & minus_infinity & nan;
    }
  }

  { 
    // Single precision floating point numbers to be loaded :
    float x[6];

    // Open an input file stream in binary mode :
    ifstream fin (filename.c_str (), ios_base::binary);
  
    {
      // Create an input portable binary archive attached to the input file :
      boost::archive::portable_binary_iarchive ipba (fin);
      
      // Load (de-serialize) variables using the same
      // order as for serialization :
      for (int i = 0; i < 6; ++i)
	{
	  ipba & x[i];
	}
    }

    // Print :
    for (int i = 0; i < 6; ++i)
      {
	cout.precision (8);
	cout << "Loaded x[" << i << "] = " << x[i];
	switch (std::fpclassify (x[i]))
	  {
	  case FP_NAN: cout << " (NaN)"; break;
	  case FP_INFINITE: cout << " (infinite)"; break;
	  case FP_ZERO: cout << " (zero)"; break;
	  case FP_SUBNORMAL: cout << " (denormalized)"; break;
	  case FP_NORMAL:  cout << " (normalized)"; break;
	  }
	cout << endl;
      }
  }

  return 0;   
}

// end of tutorial_pba_2.cpp


The pba_2.data output data file thus contains the following bytes:

127   1   9   4 219  15  73  64   0   3 194  22   1   4   0   0
128 127   4   0   0 128 255   4 255 255 255 127
where:
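The float entries of this dump can be reproduced with the sketch below. It assumes, from the bytes above, that a float is stored as its 32-bit IEEE 754 bit pattern treated as an unsigned integer: a length byte, then the significant little-endian bytes, with leading zero bytes dropped. This explains why 0.0 takes a single 0 byte and the denormalized 1.e-40 only three payload bytes (194 22 1). encode_pba_float is a hypothetical name. NaN is deliberately left out: the dump stores it as 255 255 255 127 (0x7FFFFFFF), whereas quiet_NaN() usually has the pattern 0x7FC00000 on common platforms, so the archive apparently writes a NaN pattern of its own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>  // numeric_limits, for building the special values to encode
#include <vector>

// Hypothetical encoder (inferred from pba_2.data, not the PBA source):
// length byte, then the 32-bit pattern's significant little-endian bytes.
std::vector<unsigned char> encode_pba_float(float value) {
  static_assert(sizeof(float) == 4, "IEEE 754 single precision assumed");
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof bits);  // well-defined type punning

  std::vector<unsigned char> out(1, 0);     // out[0] will hold the length
  for (std::uint32_t rest = bits; rest != 0; rest >>= 8)
    out.push_back(static_cast<unsigned char>(rest & 0xFF));  // low byte first
  out[0] = static_cast<unsigned char>(out.size() - 1);       // 0 for +0.0f
  return out;
}
```

For example, encode_pba_float(std::numeric_limits<float>::infinity()) gives 4 0 0 128 127, the fourth group of the dump.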


Forbidding the serialization of non finite float values

One can ask a PBA to reject non-finite values by passing the boost::archive::no_infnan flag to the constructor of the output archive. In this case, denormalized values are still accepted, but infinities and NaNs are not.
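The rule above (denormalized values pass, infinities and NaNs are rejected) is exactly what std::isfinite reports. A hypothetical pre-check along these lines could validate values before they reach an archive that was created without no_infnan; check_finite is not part of the PBA, and the exception type is an arbitrary choice for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical pre-check (not part of the PBA): lets zero, normal and
// subnormal (denormalized) values through, throws for +/-infinity and NaN.
template <typename Float>
Float check_finite(Float x) {
  if (!std::isfinite(x))
    throw std::domain_error("non-finite floating point value: " + std::to_string(x));
  return x;
}
```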

The tutorial_pba_3.cpp sample program illustrates this special case:

The tutorial_pba_3.cpp source code:
/** tutorial_pba_3.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This sample program shows how to use a portable binary archive 
 * and prevent the serialization of non-finite floating numbers.
 *
 */

#include <string>
#include <fstream>
#include <iostream>
#include <limits>
#include <exception>

#include <boost/archive/portable_binary_oarchive.hpp>

int main (void)
{
  using namespace std;

  // The name for the example data file :  
  string filename = "pba_3.data"; 

  try 
    {
      // An array of single precision floating numbers:
      float x[5]; 
      x[0] = 3.14159;  // Pi
      x[1] = 6.022e22; // Avogadro constant
      x[2] = 1.6e-19;  // Electron charge magnitude
      x[3] = 1.e-40;   // A tiny (denormalized) value
      x[4] = numeric_limits<float>::infinity (); // This will fail while serializing...

      // Open an output file stream in binary mode :
      ofstream fout (filename.c_str (), ios_base::binary);
    
      {
	// Create an output portable binary archive attached to the output file,
	// using the special 'boost::archive::no_infnan' flag :
	boost::archive::portable_binary_oarchive opba (fout, boost::archive::no_infnan);
	
	// Store (serialize) variables :
	for (int i = 0; i < 5; ++i)
	  {
	    clog << "Serializing value : " << x[i] << " ... ";
	    opba & x[i];
	    clog << "Ok !" << endl;
	  }
      }
    }
  catch (exception & x)
    {
      cerr << "ERROR: " << x.what () << endl;
      return 1;
    }

  return 0;   
}

// end of tutorial_pba_3.cpp


We can check that the PBA now throws an exception as soon as it encounters a non-finite floating point value during serialization:

Serializing value : 3.14159 ... Ok !
Serializing value : 6.022e+22 ... Ok !
Serializing value : 1.6e-19 ... Ok !
Serializing value : 9.99995e-41 ... Ok !
Serializing value : inf ... ERROR: serialization of illegal floating point value: inf


Serializing integer numbers

The PBA of course handles integer numbers. Unfortunately, C/C++ does not guarantee the size of its primitive integer types (short, int, long... and their unsigned versions); it depends on the architecture (32-bit/64-bit) and the compiler.

The Boost library addresses this issue with a collection of typedefs for integer types of fixed sizes (int8_t, uint16_t, int32_t, ...). They allow integer variables to be manipulated in a portable way, typically with text or XML archives, so users are generally encouraged to include the boost/cstdint.hpp header and use the typedefs it defines.

Thanks to its integer encoding scheme, the PBA does not strictly need this technique to (de)serialize integer numbers correctly: because integers are stored in little endian order, only the significant bytes need to be written, preceded by their count. It is thus possible to serialize a value using one integer type (short int) and deserialize it using another (long long).
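The scheme can be sketched as follows. This is a reconstruction inferred from the byte dumps in this tutorial (for instance 255 253 for -3 and 2 0 8 for 2048 in pba_4.data), not the library's actual code, and encode_pba_int is a hypothetical name. A value is written as a signed length byte followed by that many little-endian bytes; the leading 0x00 bytes of a positive value, or the leading 0xFF bytes of a negative one, are omitted, and the sign travels in the length byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical encoder reproducing the integer layout seen in the dumps;
// it is not taken from the PBA source.
std::vector<unsigned char> encode_pba_int(std::int64_t value) {
  if (value == 0) return {0};  // zero optimization: a single 0 byte

  // Count the significant bytes: stop as soon as the remaining high part is
  // all 0x00 (positive value) or all 0xFF (negative value). This relies on
  // >> being an arithmetic shift for negative numbers, as on common compilers.
  int size = 0;
  std::int64_t rest = value;
  do {
    rest >>= 8;
    ++size;
  } while (rest != 0 && rest != -1);

  // The length byte carries the sign: 255 means "1 byte, negative value".
  std::vector<unsigned char> out;
  out.push_back(static_cast<unsigned char>(value > 0 ? size : -size));

  // Then the low-order bytes, least significant first (little endian).
  const std::uint64_t bits = static_cast<std::uint64_t>(value);
  for (int i = 0; i < size; ++i)
    out.push_back(static_cast<unsigned char>(bits >> (8 * i)));
  return out;
}
```

Decoding reverses the process: the reader takes |n| bytes and fills the missing high bytes with 0x00 or 0xFF according to the sign of n, whatever the width of the destination type.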

However, for strictly safe and portable behaviour, we recommend that users systematically use these typedefs for all serializable integer values. This applies particularly to data members of structs and classes, and it allows switching transparently to another kind of archive (text, XML) thanks to the templated serialize method.
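To illustrate the recommendation, here is a sketch of a class whose data members use fixed-width typedefs and a single templated serialize method. To stay self-contained and runnable without Boost, it takes the typedefs from <cstdint> (boost/cstdint.hpp provides the same names in namespace boost) and drives serialize with two toy archives, SizeCounter and TextWriter, which are hypothetical stand-ins and not Boost classes. With Boost, archives such as portable_binary_oarchive or text_oarchive would call the same serialize method.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// A record with fixed-width integer members: same size on every platform.
struct Record {
  std::int16_t  id = 0;
  std::uint32_t count = 0;
  std::int64_t  offset = 0;

  // One templated method serves every archive type.
  template <class Archive>
  void serialize(Archive& ar, const unsigned int /*version*/) {
    ar & id & count & offset;
  }
};

// Toy archive 1 (hypothetical): adds up the in-memory size of each member.
struct SizeCounter {
  std::size_t total = 0;
  template <class T>
  SizeCounter& operator&(const T& value) {
    total += sizeof(value);
    return *this;
  }
};

// Toy archive 2 (hypothetical): writes each member as text, space-separated.
struct TextWriter {
  std::ostringstream out;
  template <class T>
  TextWriter& operator&(const T& value) {
    out << +value << ' ';  // unary + prints 8-bit types as numbers
    return *this;
  }
};
```

Because the member types are fixed-width, the SizeCounter total for Record is 2 + 4 + 8 = 14 bytes on any platform.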

The tutorial_pba_4.cpp sample program illustrates the serialization/deserialization of 8-bit, 16-bit, 32-bit and 64-bit integer numbers:

The tutorial_pba_4.cpp source code:
/** tutorial_pba_4.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This sample program shows how to use a portable binary archive 
 * to store/load integer numbers of various sizes using the Boost 
 * portable integer typedefs.
 *
 */

#include <string>
#include <fstream>
#include <iostream>

#include <boost/cstdint.hpp>
#include <boost/archive/portable_binary_oarchive.hpp>
#include <boost/archive/portable_binary_iarchive.hpp>

int main (void)
{
  using namespace std;

  // The name for the example data file :  
  string filename = "pba_4.data"; 

  {
    // Some integer numbers :
    bool t = true;
    char c = 'c';
    unsigned char u = 'u';
    int8_t   b = -3; // typically signed char
    uint8_t  B = +6; // unsigned char 
    int16_t  s = -16;
    uint16_t S = +32;
    int32_t  l = -128;
    uint32_t L = +127;
    int64_t  ll = -1024;
    uint64_t LL = +2048;

    // Open an output file stream in binary mode :
    ofstream fout (filename.c_str (), ios_base::binary);
    
    {
      // Create an output portable binary archive attached to the output file :
      boost::archive::portable_binary_oarchive opba (fout);
      
      // Store (serialize) variables :
      opba & t & c & u & b & B & s & S & l & L & ll & LL;
    }
  }

  { 
    // Integer numbers to be loaded :
    bool t;
    char c;
    unsigned char u;
    int8_t   b;
    uint8_t  B;
    int16_t  s;
    uint16_t S;
    int32_t  l;
    uint32_t L;
    int64_t  ll;
    uint64_t LL;

    // Open an input file stream in binary mode :
    ifstream fin (filename.c_str (), ios_base::binary);
  
    {
      // Create an input portable binary archive attached to the input file :
      boost::archive::portable_binary_iarchive ipba (fin);
      
      // Load (de-serialize) variables using the same
      // order as for serialization :
      ipba & t & c & u & b & B & s & S & l & L & ll & LL;
    }

    clog << "t  = " << t << " (bool)" << endl;
    clog << "c  = '" << c << "' (char)" << endl;
    clog << "u  = '" << u << "' (unsigned char)" << endl;
    clog << "b  = " << (int) b << " (int8_t)" << endl;
    clog << "B  = " << (int) B << " (uint8_t)" << endl;
    clog << "s  = " << s << " (int16_t)" << endl;
    clog << "S  = " << S << " (uint16_t)" << endl;
    clog << "l  = " << l << " (int32_t)" << endl;
    clog << "L  = " << L << " (uint32_t)" << endl;
    clog << "ll = " << ll << " (int64_t)" << endl;
    clog << "LL = " << LL << " (uint64_t)" << endl;
  }

  return 0;   
}

// end of tutorial_pba_4.cpp


The resulting PBA file is:

127   1   9   1  84   1  99   1 117 255 253   1   6 255 240   1
 32 255 128   1 127 254   0 252   2   0   8
where:

Note that this coding scheme minimizes the number of streamed bytes. In particular, it discards the leading zero bytes (the most significant ones) of any positive integer value in order to save storage; for a negative value, the leading 0xFF bytes are dropped instead and the sign is carried by the length byte, read as a signed char (255 means one byte of a negative value, 254 two bytes). Recall also that the exact value 0 (zero, or false for a boolean) is always encoded as a single 0 byte (zero optimization). The same approach is used for floating point numbers.
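To put a number on these savings, the following sketch computes the encoded size of an integer under the layout read off the dump above: one length byte plus the significant bytes, or a single 0 byte for zero. pba_encoded_size is a hypothetical helper, not a PBA function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of bytes an integer takes in the layout seen
// in the dumps (length byte + significant little-endian bytes).
std::size_t pba_encoded_size(std::int64_t value) {
  if (value == 0) return 1;  // zero optimization
  std::size_t size = 0;
  do {
    value >>= 8;             // arithmetic shift assumed for negative values
    ++size;
  } while (value != 0 && value != -1);
  return 1 + size;           // plus the length byte
}
```

For the eight fixed-width integers of tutorial_pba_4.cpp (-3, 6, -16, 32, -128, 127, -1024, 2048) it gives 2+2+2+2+2+2+3+3 = 18 bytes, matching the dump, while their in-memory sizes add up to 1+1+2+2+4+4+8+8 = 30 bytes. Single-byte types such as bool and char, on the other hand, take two bytes each.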


Using PBA serialization with a memory buffer

In some cases, we do not want to serialize data to a file (std::ofstream) but simply to stream it into a memory buffer.

The tutorial_pba_5.cpp sample program uses a memory buffer implemented as an STL vector of characters. The PBA is attached to this buffer through the stream and device adapters of the Boost/Iostreams library (a back_insert_device for output, an array_source for input). With this technique, serializable data can be streamed into a memory buffer instead of a file:

The tutorial_pba_5.cpp source code:
/** tutorial_pba_5.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This sample program shows how to use a portable binary archive 
 * to store/load data in a memory buffer.
 *
 */

#include <string>
#include <vector>
#include <iostream>
#include <limits>

#include <boost/iostreams/stream.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/device/array.hpp>

#include <boost/cstdint.hpp>
#include <boost/archive/portable_binary_oarchive.hpp>
#include <boost/archive/portable_binary_iarchive.hpp>

int main (void)
{
  using namespace std;

  // The memory buffer is implemented using a STL vector :
  typedef std::vector<char> buffer_type;
  buffer_type buffer; 

  {
    // Some data to be stored :
    bool    t = true;
    char    c = 'c';
    int16_t s = +16;
    int32_t l = -128;
    int64_t ll = +10000000000;
    float   pi = 3.14159;
    double  nan = numeric_limits<double>::quiet_NaN ();
    string  hello = "World !";

    buffer.reserve (1024); // pre-allocate some memory

    // The output stream interface to the buffer :
    boost::iostreams::stream<boost::iostreams::back_insert_device<buffer_type> > output_stream (buffer);

    {    
      // Create an output portable binary archive attached to the memory buffer :
      boost::archive::portable_binary_oarchive opba (output_stream);
      
      // Store (serialize) variables :
      opba & t & c & s & l & ll & pi & nan & hello;
    }

  }

  clog << "Buffer content is " << buffer.size () << " bytes : " << endl << "  ";
  for (std::size_t i = 0; i < buffer.size (); ++i)
    {
      clog << (int) ((unsigned char) buffer[i]) << ' '; 
      if ((i + 1) % 20 == 0) clog << endl << "  ";
    }
  clog << endl;

  { 
    // Some data to be loaded :
    bool    t;
    char    c;
    int16_t s;
    int32_t l;
    int64_t ll;
    float   pi;
    double  nan;
    string  hello;

    // The input stream interface to the buffer :
    boost::iostreams::stream<boost::iostreams::array_source> input_stream (&buffer[0], 
									   buffer.size ());

    {
      // Create an input portable binary archive attached to the memory buffer :
      boost::archive::portable_binary_iarchive ipba (input_stream);
      
      // Load (de-serialize) variables :
      ipba & t & c & s & l & ll & pi & nan & hello;
    }

    clog << "Loaded values from the buffer are: " << endl;
    clog << "  t  = " << t << " (bool)" << endl;
    clog << "  c  = '" << c << "' (char)" << endl;
    clog << "  s  = " << s << " (int16_t)" << endl;
    clog << "  l  = " << l << " (int32_t)" << endl;
    clog << "  ll = " << ll << " (int64_t)" << endl;
    clog << "  pi = " << pi << " (float)" << endl;
    clog << "  nan = " << nan << " (double)" << endl;
    clog << "  hello = \"" << hello << "\" (std::string)" << endl;
  }

  return 0;   
}

// end of tutorial_pba_5.cpp


Once the data have been stored in the archive, the contents of the character buffer are printed:

Buffer content is 40 bytes : 
  127 1 9 1 84 1 99 1 16 255 128 5 0 228 11 84 2 4 208 15 
  73 64 8 255 255 255 255 255 255 255 127 1 7 87 111 114 108 100 32 33 
  
Loaded values from the buffer are: 
  t  = 1 (bool)
  c  = 'c' (char)
  s  = 16 (int16_t)
  l  = -128 (int32_t)
  ll = 10000000000 (int64_t)
  pi = 3.14159 (float)
  nan = nan (double)
  hello = "World !" (std::string)

Again, the PBA encoding scheme can easily be interpreted; this is left as an exercise.

Extra:

Have a look at the tutorial_pba_6.cpp program, which shows a possible (and provocative) combined use of the Boost/Serialization concepts, the Boost/Iostreams facilities and the PBA: it makes it possible to copy an object of a non-copyable class.

The tutorial_pba_6.cpp source code:
/** tutorial_pba_6.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of
 * the Boost/Serialization library.
 *
 * This sample program shows how to use a portable binary archive
 * associated to a memory buffer to copy a non-copyable object.
 *
 */

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <limits>

#include <boost/utility.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/device/array.hpp>

#include <boost/cstdint.hpp>
#include <boost/archive/portable_binary_oarchive.hpp>
#include <boost/archive/portable_binary_iarchive.hpp>

using namespace std;

/* A foo noncopyable class */
struct foo : boost::noncopyable
{
  uint32_t status;
  double   value;
  double   special;

  string to_string () const
  {
    ostringstream sout;
    sout << "foo={status=" << status << "; value=" << value  << "; special=" << special<< "}";
    return sout.str();
  }

  template<class Archive>
  void serialize (Archive & ar, const unsigned int version)
  {
    ar & status;
    ar & value;
    ar & special;
    return;
  }

};

// A templatized copy function for Boost/Serialization equipped classes.
// Here we use PBAs associated to a memory buffer :
template <class Serializable>
void copy (const Serializable & source, Serializable & target)
{
  namespace io = boost::iostreams;
  namespace ba = boost::archive;
  if (&source == &target) return; // self-copy guard
  typedef std::vector<char> buffer_type;
  buffer_type buffer;
  buffer.reserve (1024);
  {
    io::stream<io::back_insert_device<buffer_type> > output_stream (buffer);
    ba::portable_binary_oarchive opba (output_stream);
    opba & source;
  }
  {
    io::stream<io::array_source> input_stream (&buffer[0], buffer.size ());
    ba::portable_binary_iarchive ipba (input_stream);
    ipba & target;
  }
  return;
}

int main (void)
{
  // Some instance of the 'foo' class :
  foo dummy;
  dummy.status = 1;
  dummy.value = 3.14159;
  dummy.special = numeric_limits<double>::quiet_NaN ();
  clog << "dummy is : " << dummy.to_string () << endl;

  // Another instance of the 'foo' class :
  foo clone;

  /* The following instruction is forbidden because foo 
     inherits 'boost::noncopyable' :
   
   clone = dummy; // this ends in a compilation error.
   
   */

  // Anyway, we can use this workaround :
  copy (dummy, clone);
  clog << "clone is : " << clone.to_string () << endl;

  return 0;
}

// end of tutorial_pba_6.cpp


Remark: if a class was made non-copyable by design, there is most likely a good reason for it, so working around this trait with such a trick is not recommended unless you know exactly what you are doing and understand all the consequences!


An alternative to PBA using text or XML archives made portable

In some circumstances, it may be useful to use the Boost text and XML archives in a somewhat portable way. For example, we may want to benefit from the human-friendly format of the XML archive for debugging purposes before switching to the PBA for production runs. However, the text and XML archives provided by the Boost serialization library are not strictly portable, in particular because they do not support the serialization of non-finite floating point numbers. This is because the serialization of floating point numbers relies on the formatting features of standard I/O streams. See the tutorial_pba_7.cpp sample program below:

The tutorial_pba_7.cpp source code:
/** tutorial_pba_7.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This example shows how the default behaviour of standard 
 * I/O streams does not support the read/write operations of 
 * non-finite floating point values in a portable way.
 *
 */

#include <string>
#include <iostream>
#include <sstream>
#include <limits>

using namespace std;

int main (void)
{
  {
    float x = numeric_limits<float>::infinity ();
    double y = numeric_limits<double>::quiet_NaN ();
    cout.precision (8);
    cout << "x = " << x << endl;
    cout.precision (16);
    cout << "y = " << y << endl;
  }

  {
    string input ("inf nan");
    istringstream iss (input);
    float x; 
    double y;
    iss >> x >> y;
    if (! iss)
      {
	cerr << "Cannot read 'x' or 'y' : non finite values are not supported !" << endl;
      }
  }
  return 0;
}

// end of tutorial_pba_7.cpp


Depending on the system, infinity and NaN values may be printed with various representations:

Such non-finite values can usually be written to an output stream (using a non-portable representation), but parsing them back from an input stream fails!
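For comparison only, here is an alternative that does not use Boost: the C library function std::strtod follows the C99 rules and accepts "inf", "infinity" and "nan" (case-insensitive), unlike the stream extraction described above. The sketch covers reading only; the spelling produced by operator<< still varies between platforms, which the Boost locale facets address. parse_double is a hypothetical helper, not part of Boost/Serialization.

```cpp
#include <cassert>
#include <cmath>  // std::isinf / std::isnan, for checking the results
#include <cstdlib>
#include <stdexcept>
#include <string>

// Alternative sketch (not the Boost facet approach): parse a whole token
// with std::strtod, which understands inf/nan spellings.
double parse_double(const std::string& token) {
  const char* begin = token.c_str();
  char* end = nullptr;
  const double value = std::strtod(begin, &end);
  if (end == begin || *end != '\0')  // nothing parsed, or trailing garbage
    throw std::invalid_argument("not a floating point number: " + token);
  return value;
}
```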

Fortunately, this issue can be solved by configuring the I/O streams with special locale facets provided by Boost (see this link).

The tutorial_pba_8.cpp program shows how this can be achieved using the facilities of the boost/archive/codecvt_null.hpp and boost/math/special_functions/nonfinite_num_facets.hpp headers:

The tutorial_pba_8.cpp source code:
/** tutorial_pba_8.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This example shows how to store some variables
 * of basic types (bool, integer, floating point numbers, STL string) 
 * using the text or XML archive format associated to a 
 * standard output file stream supporting portable non-finite
 * floating point values.
 *
 */

#include <string>
#include <fstream>
#include <limits>
#include <locale>

#include <boost/cstdint.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/archive/codecvt_null.hpp>
#include <boost/math/special_functions/nonfinite_num_facets.hpp>

using namespace std;

void do_text_out (void)
{
  // The name for the example data text file :  
  string filename = "pba_8.txt"; 

  // Some variables of various primitive types :
  bool        b         = true;
  char        c         = 'B';
  uint32_t    answer    = 42;
  float       value     = numeric_limits<float>::infinity ();
  double      precision = numeric_limits<double>::quiet_NaN ();
  string      question  = "What makes you think she's a witch?";
  
  // Open an output file stream :
  ofstream fout (filename.c_str ());

  // Prepare the output file stream for inf/NaN support :
  locale default_locale (locale::classic (),
			 new boost::archive::codecvt_null<char>);
  locale infnan_locale (default_locale,
			new boost::math::nonfinite_num_put<char>);
  fout.imbue (infnan_locale);
  
  {
    // Create an output text archive attached to the output file :
    boost::archive::text_oarchive ota (fout, boost::archive::no_codecvt);
    
    // Store (serializing) variables :
    ota & b & c & answer & value & precision & question;
  }

  return;   
}

void do_xml_out (void)
{
  // The name for the example data XML file :  
  string filename = "pba_8.xml"; 

  // Some variables of various primitive types :
  bool        b         = true;
  char        c         = 'B';
  uint32_t    answer    = 42;
  float       value     = numeric_limits<float>::infinity ();
  double      precision = numeric_limits<double>::quiet_NaN ();
  string      question  = "What makes you think she's a witch?";
  
  // Open an output file stream :
  ofstream fout (filename.c_str ());

  // Prepare the output file stream for inf/NaN support :
  locale default_locale (locale::classic (),
			 new boost::archive::codecvt_null<char>);
  locale infnan_locale (default_locale,
			new boost::math::nonfinite_num_put<char>);
  fout.imbue (infnan_locale);
   
  {
    // Create an output XML archive attached to the output file :
    boost::archive::xml_oarchive oxa (fout, boost::archive::no_codecvt);
    
    // Store (serializing) variables :
    oxa & BOOST_SERIALIZATION_NVP(b)
      & BOOST_SERIALIZATION_NVP(c)
      & BOOST_SERIALIZATION_NVP(answer)
      & BOOST_SERIALIZATION_NVP(value)
      & BOOST_SERIALIZATION_NVP(precision) 
      & BOOST_SERIALIZATION_NVP(question);
  }

  return;   
}

int main (void)
{
  do_text_out ();
  do_xml_out ();
  return 0;
}

// end of tutorial_pba_8.cpp


The program creates two output files :

The tutorial_pba_9.cpp program deserializes the data from the text and XML archive files (respectively pba_8.txt and pba_8.xml) and prints the restored variables :

Loaded values from text archive are: 
  b         = 1
  c         = 'B'
  answer    = 42
  value     = inf
  precision = nan
  question  = "What makes you think she's a witch?"
Loaded values from XML archive are: 
  b         = 1
  c         = 'B'
  answer    = 42
  value     = inf
  precision = nan
  question  = "What makes you think she's a witch?"
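In these two programs, the nonfinite_num_put and nonfinite_num_get facets give infinities and NaNs one textual spelling on every platform, so that a text or XML archive written with one compiler and standard library can be read with another. The following standard C++ sketch only illustrates that idea. to_portable_text and from_portable_text are hypothetical helpers, not part of Boost.Math, and the exact spellings produced by the Boost facets may differ in corner cases (for instance the sign of a NaN).

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper (not a Boost API): formats a double with one fixed
// spelling for non-finite values, whatever C library is in use.
std::string to_portable_text (double x)
{
  if (std::isnan (x)) return "nan";                // sign of NaN is dropped
  if (std::isinf (x)) return x > 0 ? "inf" : "-inf";
  std::ostringstream oss;
  oss.imbue (std::locale::classic ());             // no locale-specific punctuation
  oss.precision (std::numeric_limits<double>::max_digits10); // enough digits to round-trip
  oss << x;
  return oss.str ();
}

// Hypothetical helper: inverse of to_portable_text.
// Returns false if the whole string cannot be parsed.
bool from_portable_text (const std::string & s, double & x)
{
  if (s == "nan")  { x = std::numeric_limits<double>::quiet_NaN (); return true; }
  if (s == "inf")  { x = std::numeric_limits<double>::infinity ();  return true; }
  if (s == "-inf") { x = -std::numeric_limits<double>::infinity (); return true; }
  std::istringstream iss (s);
  iss.imbue (std::locale::classic ());
  iss >> x;
  return !iss.fail () && iss.eof ();               // reject trailing garbage
}
```

A string such as 1.#INF, as printed by some older Microsoft C runtimes, is rejected by from_portable_text; this kind of platform-dependent spelling is precisely what the Boost facets are meant to avoid.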

The tutorial_pba_9.cpp source code:
/** tutorial_pba_9.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This example shows how to load some variables of basic 
 * types (bool, char, integer, floating point numbers, STL string) 
 * using the text or XML archive format associated to a 
 * standard file input stream supporting portable non-finite
 * floating point values.
 *
 */

#include <string>
#include <fstream>
#include <limits>
#include <locale>
#include <iostream>

#include <boost/cstdint.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/scoped_ptr.hpp>
#include <boost/archive/codecvt_null.hpp>
#include <boost/math/special_functions/nonfinite_num_facets.hpp>

using namespace std;

void do_text_in (void)
{
  // The name for the example data text file :  
  string filename = "pba_8.txt"; 
  // Some variables of various primitive types :
  bool        b;
  char        c;
  uint32_t    answer;
  float       value;
  double      precision;
  string      question;
  
  // Open an input file stream :
  ifstream fin (filename.c_str ());

  // Prepare the input file stream for inf/NaN support :
  locale default_locale (locale::classic (),
			 new boost::archive::codecvt_null<char>);
  locale infnan_locale (default_locale,
			new boost::math::nonfinite_num_get<char>);
  fin.imbue (infnan_locale);
 
  {
    // Create an input text archive attached to the input file :
    boost::archive::text_iarchive ita (fin, boost::archive::no_codecvt);
    
    // Load (deserializing) variables :
    ita & b & c & answer & value & precision & question;
  }

  clog << "Loaded values from text archive are: " << endl;
  clog << "  b         = " << b << endl;
  clog << "  c         = '" << c << "'" <<  endl;
  clog << "  answer    = " << answer << endl;
  clog << "  value     = " << value << endl;
  clog << "  precision = " << precision << endl;
  clog << "  question  = \"" << question << "\"" << endl;

  return;   
}

void do_xml_in (void)
{
  // The name for the example data XML file :  
  string filename = "pba_8.xml"; 

  // Some variables of various primitive types :
  bool        b;
  char        c;
  uint32_t    answer;
  float       value;
  double      precision;
  string      question;
  
  // Open an input file stream :
  ifstream fin (filename.c_str ());

  // Prepare the input file stream for inf/NaN support :
  locale default_locale (locale::classic (),
			 new boost::archive::codecvt_null<char>);
  locale infnan_locale (default_locale,
			new boost::math::nonfinite_num_get<char>);
  fin.imbue (infnan_locale);

  {
    // Create an input XML archive attached to the input file :
    boost::archive::xml_iarchive ixa (fin, boost::archive::no_codecvt);
    
    // Load (deserializing) variables :
    ixa & BOOST_SERIALIZATION_NVP(b)
      & BOOST_SERIALIZATION_NVP(c)
      & BOOST_SERIALIZATION_NVP(answer)
      & BOOST_SERIALIZATION_NVP(value)
      & BOOST_SERIALIZATION_NVP(precision) 
      & BOOST_SERIALIZATION_NVP(question);
  }

  clog << "Loaded values from XML archive are: " << endl;
  clog << "  b         = " << b << endl;
  clog << "  c         = '" << c << "'" <<  endl;
  clog << "  answer    = " << answer << endl;
  clog << "  value     = " << value << endl;
  clog << "  precision = " << precision << endl;
  clog << "  question  = \"" << question << "\"" << endl;

  return;   
}

int main (void)
{
  do_text_in ();
  do_xml_in ();
  return 0;
}

// end of tutorial_pba_9.cpp



Using PBA serialization associated with on-the-fly (de)compressed file streams

The tutorial_pba_10.cpp program illustrates how to serialize an object into a PBA attached to a GZIP-compressed file stream, then deserialize it back, using the filtering streams provided by the Boost/Iostreams library. The class contains an STL vector of 1,000 double precision floating point numbers with arbitrary values, including a few non-finite ones:

The tutorial_pba_10.cpp source code:
/** tutorial_pba_10.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This example shows how to use PBAs combined with on-the-fly 
 * compressed I/O streams.
 *
 */

#include <string>
#include <fstream>
#include <limits>
#include <vector>
#include <iostream>

#include <boost/cstdint.hpp>
#include <boost/archive/portable_binary_oarchive.hpp>
#include <boost/archive/portable_binary_iarchive.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/vector.hpp>

using namespace std;

class data_type
{
private:
  friend class boost::serialization::access;
  template<class Archive>
  void serialize (Archive & ar, const unsigned int version);
public:
  void print (ostream & out, const string & title) const;
public:
  vector<double> values;
  data_type ();
};

data_type::data_type () : values ()
{
  return;
}

void data_type::print (ostream & out, const string & title) const
{
  out << endl;
  out << title << " :" << endl;
  for (size_t i = 0; i < this->values.size (); ++i)
    {
      out.precision (16);
      out.width (18);
      out << this->values [i] << ' ' ;
      // Four values per line, on the requested output stream :
      if ((i%4) == 3) out << endl;
    }
  out << endl;
  return;
}
  
template<class Archive>
void data_type::serialize (Archive & ar, const unsigned int version)
{
  ar & values;
  return;
}

void do_gzipped_out (void)
{
  // The name for the output data file :  
  string filename = "pba_10.data.gz"; 

  // A data structure to be stored :
  data_type my_data;

  // Fill the vector with arbitrary (possibly non-finite) values :
  size_t dim = 1000;
  my_data.values.reserve (dim);
  for (int i = 0; i < dim; ++i)
    {      
      double val = (i + 1) * (1.0 + 3 * numeric_limits<double>::epsilon ());
      if (i == 4) val = numeric_limits<double>::quiet_NaN ();
      if (i == 23) val = numeric_limits<double>::infinity ();
      if (i == 73) val = -numeric_limits<double>::infinity ();
      if (i == 90) val = 0.0;
      my_data.values.push_back (val);
    }

  // Print:
  my_data.print (clog, "Stored data");

  // Create an output filtering stream :
  boost::iostreams::filtering_ostream zout;
  zout.push (boost::iostreams::gzip_compressor ());
  
  // Open an output file stream in binary mode :
  ofstream fout (filename.c_str (), ios_base::binary);
  zout.push (fout);

  // Save to PBA :
  {
    // Create an output portable binary archive attached to the output file :
    boost::archive::portable_binary_oarchive opba (zout);
    
    // Store (serializing) the data :
    opba & my_data;
  }

  // Clean termination of the streams :
  zout.flush ();
  zout.reset ();

  return;   
}

void do_gzipped_in (void)
{
  // The name for the input data file :  
  string filename = "pba_10.data.gz"; 

  // A data structure to be loaded :
  data_type my_data;

  // Create an input filtering stream :
  boost::iostreams::filtering_istream zin;
  zin.push (boost::iostreams::gzip_decompressor ());
  
  // Open an input file stream in binary mode :
  ifstream fin (filename.c_str (), ios_base::binary);
  zin.push (fin);

  // Load from PBA :
  {
    // Create an input portable binary archive attached to the input file :
    boost::archive::portable_binary_iarchive ipba (zin);
    
    // Load (deserializing) the data :
    ipba & my_data;
  }

  // Print:
  my_data.print (clog, "Loaded data");

  return;   
}

int main (void)
{
  do_gzipped_out (); 
  do_gzipped_in ();
  return 0;
}

// end of tutorial_pba_10.cpp


The resulting compressed file pba_10.data.gz is 1,574 bytes long. For comparison, the same data stored in a plain (uncompressed) portable binary archive takes 9,001 bytes, which begin as follows (byte values in decimal):

127   1   9   0   0   2 232   3   0   8   3   0   0   0   0   0
240  63   8   3   0   0   0   0   0   0  64   8   4   0   0   0
  0   0   8  64   8   3   0   0   0   0   0  16  64   8 255 255
255 255 255 255 255 127   8   4   0   0   0   0   0  24  64   8
...
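These bytes can be read as follows. 127 is the archive's magic byte and 1 9 the archive version (one significant byte holding the value 9). The two 0 bytes that follow hold the class information (tracking flag and version) of data_type. 2 232 3 is the vector size written on two little-endian bytes (232 + 3 × 256 = 1000), followed by a 0 item version. Each double is then written as a size byte 8 followed by the 8 bytes of its IEEE 754 bit pattern in little-endian order: 3 0 0 0 0 0 240 63 is 1.0000000000000007 (bit pattern 0x3FF0000000000003), and the NaN at index 4 appears as 255 255 255 255 255 255 255 127. The 0.0 at index 90 takes a single 0 byte, hence 9 + 999 × 9 + 1 = 9,001 bytes in total. Compression thus shrinks the archive by a factor of about 5.7 (9,001 / 1,574), to roughly 17.5 % of its uncompressed size.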
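The per-value layout visible in the dump (one size byte giving the number of significant bytes, then those bytes in little-endian order, zero being reduced to a single 0 byte) can be reproduced in a few lines of standard C++. The sketch below is our own illustration under that assumption, not the code of the portable archive: encode_unsigned, encode_double, decode_unsigned and payload_size_pba_10 are hypothetical names, and negative integers as well as any special treatment of NaN by the library are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper: one size byte (number of significant bytes), then
// those bytes in little-endian order. Zero becomes a single 0 byte.
// Only non-negative integers are handled in this sketch.
std::vector<unsigned char> encode_unsigned (std::uint64_t v)
{
  std::vector<unsigned char> out (1, 0);    // out[0] is reserved for the size byte
  while (v != 0)
    {
      out.push_back (static_cast<unsigned char> (v & 0xFFu)); // lowest byte first
      v >>= 8;
    }
  out[0] = static_cast<unsigned char> (out.size () - 1);
  return out;
}

// A double is encoded through its IEEE 754 bit pattern (assumption).
std::vector<unsigned char> encode_double (double d)
{
  static_assert (sizeof (double) == sizeof (std::uint64_t), "64-bit double expected");
  std::uint64_t bits;
  std::memcpy (&bits, &d, sizeof bits);
  return encode_unsigned (bits);
}

// Inverse of encode_unsigned; 'pos' is advanced past the consumed bytes.
std::uint64_t decode_unsigned (const std::vector<unsigned char> & in, std::size_t & pos)
{
  assert (pos < in.size ());
  const std::size_t n = in[pos++];
  assert (n <= 8 && pos + n <= in.size ());
  std::uint64_t v = 0;
  for (std::size_t k = 0; k < n; ++k)
    v |= static_cast<std::uint64_t> (in[pos + k]) << (8 * k);
  pos += n;
  return v;
}

// Bytes taken by the 1,000 values generated in tutorial_pba_10.cpp
// (archive header and container bookkeeping excluded).
std::size_t payload_size_pba_10 ()
{
  std::size_t total = 0;
  for (int i = 0; i < 1000; ++i)
    {
      double val = (i + 1) * (1.0 + 3 * std::numeric_limits<double>::epsilon ());
      if (i == 4) val = std::numeric_limits<double>::quiet_NaN ();
      if (i == 23) val = std::numeric_limits<double>::infinity ();
      if (i == 73) val = -std::numeric_limits<double>::infinity ();
      if (i == 90) val = 0.0;
      total += encode_double (val).size ();
    }
  return total;
}
```

payload_size_pba_10 () returns 8,992 bytes; the remaining 9 bytes of the 9,001-byte archive correspond to the 127 1 9 0 0 2 232 3 0 prefix of the dump. Because the bytes are extracted by shifting the integer value rather than by copying memory, the output does not depend on the byte order of the host, which is the essence of portability here.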

BZIP2 compression can be used in the same way, by including the boost/iostreams/filter/bzip2.hpp header instead of boost/iostreams/filter/gzip.hpp and pushing the bzip2_compressor and bzip2_decompressor filters instead of gzip_compressor and gzip_decompressor.


A simple PBA versus text archive benchmark test

The tutorial_pba_11.cpp program runs a simple benchmark comparing the speed of PBA and text archives for both write and read operations. It stores, then loads, a vector of 10^7 random double values and prints the (de)serialization time measured for each kind of archive:

The tutorial_pba_11.cpp source code:
/** tutorial_pba_11.cpp
 *
 * (C) Copyright 2011 François Mauger, Christian Pfligersdorffer
 *
 * Use, modification and distribution is subject to the Boost Software
 * License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 * http://www.boost.org/LICENSE_1_0.txt)
 *
 */

/**
 * The intent of this program is to serve as a tutorial for
 * users of the portable binary archive in the framework of 
 * the Boost/Serialization library. 
 *
 * This example program compares the times needed to serialize
 * and deserialize a large amount of data using PBA and 
 * text archives.
 *
 */

#include <string>
#include <fstream>
#include <vector>
#include <iostream>

#include <boost/archive/portable_binary_oarchive.hpp>
#include <boost/archive/portable_binary_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_real_distribution.hpp>
#include <boost/timer.hpp>

using namespace std;

class data_type
{
private:
  friend class boost::serialization::access;
  template<class Archive>
  void serialize (Archive & ar, const unsigned int version);
public:
  void print (ostream & out, const string & title) const;
public:
  vector<double> values;
  data_type ();
};

data_type::data_type () : values ()
{
  return;
}

void data_type::print (ostream & out, const string & title) const
{
  out << endl;
  out << title << " :" << endl;
  bool skip = false;
  for (int i = 0; i < this->values.size (); ++i)
    {
      if ((i >= 12) && (i < (int) this->values.size () - 8)) 
	{
	  if (! skip) out << " ..." << endl;
	  skip = true;
	  continue;
	}
      out.precision (16);
      out.width (18);
      out << this->values [i] << ' ' ;
      if ((i%4) == 3) out << endl;
    }
  out << endl;
  return;
}
  
template<class Archive>
void data_type::serialize (Archive & ar, const unsigned int version)
{
  ar & values;
  return;
}

double do_pba_out (const data_type & a_data)
{
  string filename = "pba_11.data"; 
  ofstream fout (filename.c_str (), ios_base::binary);
  boost::timer io_timer;    
  {
    boost::archive::portable_binary_oarchive opba (fout);
    opba & a_data;
  }
  return io_timer.elapsed ();
}

double do_pba_in (data_type & a_data)
{
  string filename = "pba_11.data"; 
  ifstream fin (filename.c_str (), ios_base::binary);
  boost::timer io_timer;    
  {
    boost::archive::portable_binary_iarchive ipba (fin);
    ipba & a_data;
  }
  return io_timer.elapsed ();
}

double do_text_out (const data_type & a_data)
{
  string filename = "pba_11.txt"; 
  ofstream fout (filename.c_str ());
  boost::timer io_timer;    
  {
    boost::archive::text_oarchive ota (fout);
    ota & a_data;
  }
  return io_timer.elapsed ();
}

double do_text_in (data_type & a_data)
{
  string filename = "pba_11.txt"; 
  ifstream fin (filename.c_str ());
  boost::timer io_timer;    
  {
    boost::archive::text_iarchive ita (fin);
    ita & a_data;
  }
  return io_timer.elapsed ();
}

int main (void)
{
  double elapsed_time_pba_out; 
  double elapsed_time_text_out; 
  double elapsed_time_pba_in; 
  double elapsed_time_text_in; 
  data_type my_data; // A data structure to be stored then loaded.

  {
    // Fill the vector with random values :
    size_t dim = 10000000;
    my_data.values.reserve (dim);
    boost::random::mt19937 rng;
    boost::random::uniform_real_distribution<> flat (0.0, 100.0);
    for (int i = 0; i < dim; ++i)
      {      
	double val = flat (rng);
	my_data.values.push_back (val);
      }
    my_data.print (clog, "Stored data in PBA and text archive");
  }

  {
    // Store in PBA :
    elapsed_time_pba_out = do_pba_out (my_data);
  }

  {
    // Store in text archive :
    elapsed_time_text_out = do_text_out (my_data);
  }     

  {
    my_data.values.clear ();
    // Load from PBA :
    elapsed_time_pba_in = do_pba_in (my_data);
    my_data.print (clog, "Loaded data from PBA");
  }

  {
    my_data.values.clear ();
    // Load from text archive :
    elapsed_time_text_in = do_text_in (my_data);
    my_data.print (clog, "Loaded data from text archive");
  }
  
  clog << "PBA  store I/O elapsed time : " << elapsed_time_pba_out  << " (second)" << endl;
  clog << "Text store I/O elapsed time : " << elapsed_time_text_out << " (second)" << endl;  
  clog << "PBA  load  I/O elapsed time : " << elapsed_time_pba_in   << " (second)" << endl;
  clog << "Text load  I/O elapsed time : " << elapsed_time_text_in  << " (second)" << endl;

  return 0;
}

// end of tutorial_pba_11.cpp


On a 1.60 GHz processor, with gcc 4.5.2 under Linux 2.6.38, the program prints the following timings:

PBA  store I/O elapsed time : 1.86 (second)
Text store I/O elapsed time : 22.66 (second)
PBA  load  I/O elapsed time : 1.53 (second)
Text load  I/O elapsed time : 19.71 (second)
In this simple case, portable binary archives are roughly 12 to 13 times faster than the traditional Boost text archives, both for storing (1.86 s versus 22.66 s) and for loading (1.53 s versus 19.71 s). Such savings are highly desirable in scientific computing, where large amounts of data are typically accessed through files, which makes the PBA a valuable candidate for these applications.
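tutorial_pba_11.cpp measures durations with the legacy boost::timer class from boost/timer.hpp, which the Boost.Timer documentation marks as deprecated and which relies on std::clock (), i.e. on processor time rather than wall-clock time. Anyone reproducing the benchmark with a more recent toolchain could use something like the following sketch; wall_seconds and cpu_seconds are hypothetical helper names, not part of Boost.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>

// Hypothetical helper: wall-clock seconds spent in f (), measured with the
// monotonic std::chrono::steady_clock.
template <class F>
double wall_seconds (F f)
{
  const auto start = std::chrono::steady_clock::now ();
  f ();
  const std::chrono::duration<double> d = std::chrono::steady_clock::now () - start;
  return d.count ();
}

// Hypothetical helper: processor time spent in f (), measured with
// std::clock () in the same way as the legacy boost::timer.
template <class F>
double cpu_seconds (F f)
{
  const std::clock_t start = std::clock ();
  f ();
  return double (std::clock () - start) / CLOCKS_PER_SEC;
}
```

For instance, wall_seconds ([&] { do_pba_out (my_data); }) would time the PBA store step including any time spent waiting for the disk, whereas cpu_seconds gives figures comparable to those of boost::timer.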

One can also compare the sizes of the resulting archive files, pba_11.data and pba_11.txt.

The PBA typically needs about half the storage space of the text archive, which is another strong argument for using PBAs.
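The factor of about 2 can be estimated with a quick calculation. Judging from the dump of pba_10.data, a nonzero double costs 9 bytes in a PBA, so the 10^7 values of tutorial_pba_11.cpp need about 90 MB. In the text archive each value is written as a decimal string; assuming 17 significant digits (digits10 + 2) plus a one-character separator, that is roughly 19 bytes per value. The sketch below checks this estimate. The text formatting is an assumption about the text archive, the sketch uses std::mt19937 rather than boost::random::mt19937 (so it samples different numbers than the tutorial), and text_bytes, pba_bytes and size_ratio are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <sstream>

// Assumption: the text archive writes each double with digits10 + 2 = 17
// significant digits, followed by a one-character separator.
std::size_t text_bytes (double x)
{
  std::ostringstream oss;
  oss.precision (std::numeric_limits<double>::digits10 + 2);
  oss << x << ' ';
  return oss.str ().size ();
}

// From the pba_10.data dump: 1 size byte + 8 value bytes, or 1 byte for zero.
std::size_t pba_bytes (double x)
{
  return x == 0.0 ? 1 : 9;
}

// Text-to-PBA size ratio over n uniform values in [0, 100), as in tutorial_pba_11.cpp.
double size_ratio (std::size_t n)
{
  std::mt19937 rng;                                   // default seed
  std::uniform_real_distribution<double> flat (0.0, 100.0);
  std::size_t text = 0, pba = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      const double v = flat (rng);
      text += text_bytes (v);
      pba  += pba_bytes (v);
    }
  return double (text) / double (pba);
}
```

With these assumptions, size_ratio (100000) comes out slightly above 2, consistent with the factor observed between the two archive files.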


Revised 2011-11-07

© Copyright François Mauger, Christian Pfligersdorffer 2011.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)