From: rameysb (ramey_at_[hidden])
Date: 2002-03-01 17:14:06
--- In boost_at_y..., Beman Dawes <bdawes_at_a...> wrote:
> At 06:05 PM 2/28/2002, rameysb wrote:
> >
> >> > b) implementation of archives in terms of istream/ostream. no
> >> > registered objections.
> >>
> >> I'm actually not really happy with this, but in the absence of
> >binary
> >> streams I am unable to offer a better alternative.
> >>
> >> > c) binary format archive
> >> > issues - here is a summary of the binary/text archive issue
> >> > i) binary data is fundamentally non-portable
> >>
> >> This is fundamentally untrue. Ones and zeros are as portable as it
> >> gets, and can be used across all locales and character sets.
> >>
> >> > ii) converting to/from text alters floating/double numbers
> >> > iii) binary storage is considered more efficient
> >>
> >> Not to mention less error-prone.
> >>
> >
> >The issue that has concerned me is what happens when an archive
> >is created on a machine with 80 bit IEEE floating point doubles and
> >this archive is read by a program using the same serialization
> >library on a machine which has 64 bit doubles. If the second program
> >reads 80 bits, is it going to have to include code to convert from
> >every known binary double format to every other one? The same
> >issues occur with the sizes of other binary types: 16 vs 32 vs 64.
> >The same issue arises when considering that some machines store
> >integers in little endian order (intel) while others store them
> >in big endian order (sparc). I doubt anyone will want to
> >consider all these issues.
>
> Well, there are plenty of existence proofs of portable binary databases,
> including huge commercial successes.
>
> What about XDR for example?
>
> Or closer to home, I've got one b-tree file format that has been used for
> 18 years, on everything from in-vehicle embedded systems up to the largest
> mainframes. The same data files can be read by all.
>
> Here is how it is done: No floating point. Integer and unsigned in lengths
> of 8, 16, 24, and 32 bits. Big-endian. Unaligned. 8 bit unsigned chars.
> Programs are expected to convert to their native formats before performing
> any computations. In practice, a set of big-endian integer/unsigned
> POD-structs for the appropriate lengths (with no alignment or padding
> problems on any known platform) does the conversion work pretty
> transparently. Darin Adler and others have argued that the endian classes
> should provide full operations, and I suppose that would be nice.
>
> Sure the approach is limited, but it provides enough functionality to
> support applications.
>
> Beyond that, there are a lot of applications that do require external
> storage, but don't require portability. Binary formats are the only thing
> practical for some of these.
>
> Sorry if I've misunderstood the context of you saying "binary data is
> fundamentally non-portable".
>
> --Beman
XDR illustrates my point exactly.
Each XDR writer converts data from native form (e.g. intel little
endian) to a canonical form (XDR), and each reader converts data
from the canonical form (XDR) back to native form (e.g. sparc big
endian). Any differences between XDR and native forms are reconciled
by the readers and writers. Issues such as differences in the size of
doubles get addressed in those readers/writers, perhaps at the cost
of making some compromises, e.g. truncating precision in some cases.
So far so good.
How is that different from writing the data out to another canonical
form - ascii text with the locale set to classic - and reading it
back? What's the difference:
a) As far as the portability of the data goes, there is no
difference.
b) Using text is ZERO effort - using XDR or some other scheme
requires significant programmer effort.
c) I doubt that any such scheme is more robust than using the
functions from the standard library.
d) Execution time. I would expect a gain storing/loading to/from
a native binary format. But if you're going to translate between
native and canonical form, it's going to depend upon how much effort
was expended in the implementation. One can't say a priori that one
particular canonical form is necessarily faster than any other.
e) Archive size. XDR fills out 32 bit integers to 4 bytes. In text
mode any integer less than 100 takes only 3 bytes. I don't know if
XDR handles 64 bit integers, but if it does, presumably it fills out
8 bytes per integer. I would expect that text files are larger in
general, but by how much would depend on the particular data.
So here is my summary. Rankings from most to least desirable are:
portability - text, portable_binary, native
programming effort - text, native, portable_binary
archive size - native, portable_binary, text (application dependent)
execution speed - native, text/portable_binary (depends on
implementation)
As no option is ranked first in every category, and different
applications have different priorities, there is no way that there
can be universal agreement on this point.
I am currently making changes that will implement iarchive and
oarchive as base classes, with the specific types handled
through virtual functions.
This will permit any application to use its preferred format with no
conflict.
I hope this will address the issue.
Robert Ramey
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk