From: Ralf W. Grosse-Kunstleve (rwgk_at_[hidden])
Date: 2004-04-20 11:59:11


--- Robert Ramey <ramey_at_[hidden]> wrote:
> > Ralf W. Grosse-Kunstleve wrote:
>
> > Interesting. If I serialize and deserialize std::vector<double> as a text
> > archive but on the same machine, will I always get back exactly the same
> > bit patterns for the double values?
>
> The text archive uses a stream manipulator to set the precision of the
> output to capture all the precision in the double. It uses the
> std::numeric_limits to determine this. So I can't say it will be exactly the
> same bit stream but it will be close to the original number. If you want to
> guarantee the exact representation you can either use the included native
> binary archive or serialize the data element (double) as a (non-portable)
> binary object.

FWIW: For the serialization of C++ arrays wrapped as Python objects (via
Python's pickle) I implemented a small "library" (it really is just one header
file) for converting integers and floating-point numbers to a pseudo text
format. In principle it works just like the conversion to base-10 numbers, but
uses base-256 instead. I.e. the result looks like a binary format, but it is as
machine-independent as a text format. The serialized strings are smaller than
regular text format, but larger than raw binary format.

Integers are serialized like this:

    NXX...X

The first character N is the length of the encoded number to follow, i.e. the
number of X above. X encodes the number in base-256 format. Floating point
numbers are stored as two integers, one for the mantissa and one for the
exponent. This can be done portably and without loss of precision because
<cmath> provides std::frexp() and std::ldexp().
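To make the scheme above concrete, here is a minimal sketch. It is not the code from base_256.h. The function names are hypothetical, and two details are assumptions of the sketch rather than something stated above: the digits are stored least significant first, and bit 7 of the length byte N flags a negative value. It also assumes a radix-2 double with at most 63 mantissa bits, and it does not handle infinities, NaNs, or the sign of negative zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Encode a signed integer as N X...X: one length byte N followed by N
// base-256 digits, least significant first.
// Assumption of this sketch: bit 7 of N is set for negative values.
std::string encode_int(std::int64_t value)
{
    // Magnitude computed in unsigned arithmetic so the most negative
    // value does not overflow.
    std::uint64_t magnitude = value < 0
        ? ~static_cast<std::uint64_t>(value) + 1u
        : static_cast<std::uint64_t>(value);
    std::string digits;
    while (magnitude != 0) {
        digits.push_back(static_cast<char>(magnitude % 256u));
        magnitude /= 256u;
    }
    unsigned char n = static_cast<unsigned char>(digits.size());
    if (value < 0) n |= 0x80u;
    return std::string(1, static_cast<char>(n)) + digits;
}

// Decode one integer starting at s[pos]; advances pos past it.
std::int64_t decode_int(const std::string& s, std::size_t& pos)
{
    assert(pos < s.size());
    unsigned char n = static_cast<unsigned char>(s[pos++]);
    bool negative = (n & 0x80u) != 0;
    std::size_t len = n & 0x7Fu;
    assert(pos + len <= s.size());
    std::uint64_t magnitude = 0;
    for (std::size_t i = len; i > 0; --i) {
        magnitude = magnitude * 256u
                  + static_cast<unsigned char>(s[pos + i - 1]);
    }
    pos += len;
    // Two's complement negation in unsigned arithmetic, then convert back.
    return negative ? static_cast<std::int64_t>(~magnitude + 1u)
                    : static_cast<std::int64_t>(magnitude);
}

// Store a double as two integers: mantissa and exponent.
std::string encode_double(double x)
{
    const int digits = std::numeric_limits<double>::digits; // 53 for IEEE 754
    int exponent = 0;
    // x == fraction * 2^exponent with 0.5 <= |fraction| < 1 (or fraction == 0)
    double fraction = std::frexp(x, &exponent);
    // Scaling by a power of two is exact, so the result is a whole number.
    std::int64_t mantissa =
        static_cast<std::int64_t>(std::ldexp(fraction, digits));
    return encode_int(mantissa) + encode_int(exponent - digits);
}

double decode_double(const std::string& s, std::size_t& pos)
{
    std::int64_t mantissa = decode_int(s, pos);
    std::int64_t exponent = decode_int(s, pos);
    return std::ldexp(static_cast<double>(mantissa),
                      static_cast<int>(exponent));
}
```

With these definitions, encode_int(258) produces the three bytes 2, 2, 1 (the length, then the digits 2 and 1), and decode_double(encode_double(x), pos) with pos starting at 0 gives back x bit for bit for finite nonzero x, because frexp() and ldexp() only move the binary exponent.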

I chose the base-256 conversion because it is the most efficient in terms of
memory required for storing the serialized objects. However, the same approach
could be used for portable base-128 or base-64 conversions. The conversion
would just be a little bit slower and the resulting string a little bit larger.
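As a rough illustration of the size trade-off, the sketch below parameterizes the base. The helper names are hypothetical, and the digits are kept as raw values 0..base-1; an actual base-64 or base-128 format would additionally map each digit to a character that the transport allows, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: one length byte followed by the digits of `value`
// in the given base (2 <= base <= 256), least significant digit first.
std::string encode_unsigned(std::uint64_t value, unsigned base)
{
    assert(base >= 2 && base <= 256);
    std::string digits;
    while (value != 0) {
        digits.push_back(static_cast<char>(value % base));
        value /= base;
    }
    return std::string(1, static_cast<char>(digits.size())) + digits;
}

// Inverse of encode_unsigned for a string holding exactly one number.
std::uint64_t decode_unsigned(const std::string& s, unsigned base)
{
    assert(!s.empty());
    std::size_t len = static_cast<unsigned char>(s[0]);
    assert(s.size() >= 1 + len);
    std::uint64_t value = 0;
    for (std::size_t i = len; i > 0; --i) {
        value = value * base + static_cast<unsigned char>(s[i]);
    }
    return value;
}
```

For the largest 64-bit value this gives 8 digits in base-256, 10 in base-128 and 11 in base-64 (plus the length byte in each case), which is the "little bit larger" mentioned above.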

My current implementation can be found here:

http://cvs.sourceforge.net/viewcvs.py/cctbx/scitbx/include/scitbx/serialization/base_256.h?view=markup

I have not had to change the code in 16 months, from which I conclude that the
approach is robust and mature. It is known to work on a large number of
platforms (http://cci.lbl.gov/cctbx_build/). The base_256.h file can be copied,
modified and redistributed without any restrictions. I'll comment some more if
there is interest, but the approach is fundamentally so simple that the 300
lines (not counting the copyright notice) should be self-explanatory
;-)

Ralf

        
                

