Boost Users :

From: François Mauger (mauger_at_[hidden])
Date: 2008-08-25 16:19:54


Hi,

> binary archive" but left it as a demo or example.
>
> I'm sure many will think this is a totally stupid suggestion, but I
> can't resist making a fool of myself. ;-)

> Since floats are the problem with portable binary archives, why not
> punt on this issue and render floating point types (only) in ascii.
>
> For many uses floating point is not the critical path

I don't agree. Scientific applications use huge amounts (>>TB) of data
in floating-point format (particularly doubles), and the data files must
be shared across computing clusters without knowing anything about the
architecture of the systems used by end users.
Exactness (no loss of the initial precision) and I/O speed are both very
important. It is also crucial to minimize storage space, with the usual
additional compression capabilities (gzip and bz2 filters are fine for
that).

> and binary
> archives solve many problems other than floating point: endian-ness,
> native integer size differences, etc.

Yes, this part is a must.

> And those issues can be quite
> difficult to deal with otherwise.
>
> This can be viewed as a special case of a standard technique: for
> highly non-standard data using an independent format that translates
> easily into each proprietary format.
> In this case the independent
> format is simply ascii. In fact, since we are only rendering the
> characters "[-+e.0-9]" we could use a modified BCD or other compressed
> format to provide the compression that is typically what people assume
> in binary formats.

OK, this is only a set of 14 glyphs, so each could be hosted in a short
int (with 2 bits unused). Consider a typical float (relative precision
~1e-7). If one needs to store pi as +0.3141592e+01 (ASCII), it takes 14
characters (only 11 if one drops the leading '+' and the exponent's '+0'
for a positive mantissa and exponent).
That could be serialized using 14 or 11 shorts, i.e. 28 or 22 bytes.
This has to be compared with 4 bytes for a float! It means a typical
increase in storage by a factor of ~6, plus the additional CPU cost of the
underlying internal format conversion (a la sprintf). For me this is not
acceptable.
A similar argument holds for doubles.
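
The byte counts above can be checked with a minimal sketch. The function names are hypothetical, and a short is assumed to be 16 bits wide. The second function is not the accounting used in this post: it shows what a 4-bit-per-glyph (BCD-style) packing, as hinted at in the quoted text, would cost instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed when every glyph occupies one 16-bit short
// (the accounting used in the post; 16-bit shorts are an assumption).
std::size_t bytes_one_short_per_glyph(const std::string& text) {
    return text.size() * sizeof(std::uint16_t);
}

// Bytes needed when glyphs are packed two per byte (4 bits each).
// Only the 14 glyphs "-+e.0123456789" fit into such a 4-bit code.
std::size_t bytes_nibble_packed(const std::string& text) {
    assert(text.find_first_not_of("-+e.0123456789") == std::string::npos);
    return (text.size() + 1) / 2;  // round up to a whole byte
}
```

For "+0.3141592e+01" this gives 28 bytes with one short per glyph and 7 bytes with nibble packing; the shortened "0.3141592e1" gives 22 and 6 bytes. Either way the result is larger than the 4 bytes of a binary float, and the text conversion cost comes on top.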

Moreover, as soon as you use an ASCII format to store a float/double, you
must make a decision about the rounding of the last significant digit.
In most applications this is not a real problem, because people don't
care about ultimate numeric precision... but in some circumstances,
reproducible and portable I/O of the full numeric precision is
absolutely necessary.
Imagine X = 3.141592654... being stored as 3.14160 in some archive
and then used to compute sqrt(pi-X) on some other system.
I'm not sure this behaviour is acceptable to scientists (at least not to
me ;-) ).
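
The rounding problem can be illustrated with a small sketch. The helper names are hypothetical, and it assumes the C library performs correctly rounded decimal conversions and that the default "C" locale is in effect. A double survives a text round trip only if enough significant digits are written (std::numeric_limits<double>::max_digits10, i.e. 17 for IEEE doubles); with fewer digits the value read back is generally a different number.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Render x in scientific notation with `sig_digits` significant digits.
std::string to_text(double x, int sig_digits) {
    assert(sig_digits >= 1);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*e", sig_digits - 1, x);
    return buf;
}

// Parse the text form back into a double.
double from_text(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}

// True if writing x with `sig_digits` digits and reading it back
// reproduces exactly the same double.
bool round_trips(double x, int sig_digits) {
    return from_text(to_text(x, sig_digits)) == x;
}
```

With x = 3.141592653589793, round_trips(x, 6) is false while round_trips(x, std::numeric_limits<double>::max_digits10) is true, so an exact text archive needs 17 significant digits per double.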

So for me the 'IEEE' format is the best approach one could use.
Boost.archive is so simple and easy to use (with a little care)
that I see no reason why we should not get an efficient portable
binary archive with floats.
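
As a rough sketch of what "IEEE format" storage could look like (this is not the Boost archive API; the function names and the choice of little-endian byte order are assumptions made for illustration), a double can be written as its 64-bit pattern in a fixed byte order. It assumes the platform uses IEEE 754 binary64 doubles and that floating-point and integer values share the same endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE 754 binary64 doubles");

// Write the 64-bit pattern of x into out[0..7], least significant byte first,
// independently of the host byte order.
void encode_double(double x, unsigned char* out) {
    assert(out != nullptr);
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // read the bit pattern without aliasing issues
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFF);
}

// Rebuild the double from 8 bytes written by encode_double.
double decode_double(const unsigned char* in) {
    assert(in != nullptr);
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    double x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

This costs exactly 8 bytes per double, needs no decimal conversion, and reproduces every bit of the value on any platform that satisfies the stated assumptions.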

Of course, for some very specific applications that need
long doubles or other not-so-portable stuff, one could use
the HDF5 library, but it is unfortunately not as simple as Boost.

regards

frc

-- 
Francois Mauger
Laboratoire de Physique Corpusculaire de Caen et Universite de Caen
ENSICAEN - 6, Boulevard du Marechal Juin, 14050 CAEN Cedex, FRANCE
e-mail: mauger_at_[hidden]
tel.: (0/+33) 2 31 45 25 12
fax: (0/+33) 2 31 45 25 49
