Subject: Re: [boost] [boost::endian] Request for comments/interest (floating point endian handling)
From: Cliff Green (cliffg_at_[hidden])
Date: 2010-06-01 11:51:28
>> I did the same thing many years ago (when writing an endian
>> swapping utility), and noticed that when returning by value,
>> the fp normalization would occur, causing silent "changing
>> of the internal bits". Floating point values could be
>> "in-place" swapped, but this was the only safe usage (I.e.
>> after the "in-place" swap, the values would have to be sent
>> across the network or written to disk, etc). The brittleness
>> with swapping fp values is a subtle and non-obvious problem.
>
> I hadn't heard of this problem before, but I'm glad to know about it now.
> Do you know whether zeroing the remaining bits solves the problem? If so,
> specializing for FP versus integer types would allow handling the
> difference.
For integral types, all bit patterns are valid numbers, so swapping bytes
always creates a valid number. Performing a swap twice always yields the
original number (i.e. for every value x of an integral type T,
swap(swap(x)) == x, regardless of whether the value is read / accessed
between the swaps).
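
To make the integral case concrete, here is a minimal sketch. The names
swap_u32 and round_trips are hypothetical helpers written for this
illustration, not taken from any proposed library. It shows that the
intermediate (swapped) value can be read and copied freely without
affecting the round trip:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reverse the four bytes of a 32-bit unsigned integer.
constexpr std::uint32_t swap_u32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Swap, read the intermediate value as an ordinary integer, swap back.
// Every bit pattern is a valid integer, so nothing can change in between.
bool round_trips(std::uint32_t v) {
    const std::uint32_t swapped = swap_u32(v);
    const std::uint32_t copy = swapped;  // harmless read / copy by value
    return swap_u32(copy) == v;
}

void demo_integer_round_trip() {
    assert(swap_u32(0x11223344u) == 0x44332211u);
    assert(round_trips(0u));
    assert(round_trips(0xFFFFFFFFu));
}
```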
With floating point types, various bit patterns mean different things,
including positive and negative zero, NaN, infinity, etc. IEEE 754 also has
"sticky" status flags to capture "inexact", "overflow", "underflow",
"div by 0" and "invalid" states. A FP processor instruction will look at the
bit pattern and do things with the value, including silently changing bits.
For example, from http://en.wikibooks.org/wiki/Floating_Point/Normalization:
"We say that the floating point number is normalized if the fraction is at
least 1/b, where b is the base. In other words, the mantissa would be too
large to fit if it were multiplied by the base. Non-normalized numbers are
sometimes called denormal; they contain less precision than the
representation normally can hold.
If the number is not normalized, then you can subtract 1 from the exponent
while multiplying the mantissa by the base, and get another floating point
number with the same value. Normalization consists of doing this repeatedly
until the number is normalized. Two distinct normalized floating point
numbers cannot be equal in value."
Once an fp number is byte swapped, the only safe way to treat it is as a char
array (byte buffer). Anything else (e.g. returning it by value from a
function, or just reading / accessing it as an fp value) may cause
normalization or other fp operations to kick in on certain values. It's a
pernicious problem, since bits are silently changed for only certain values,
and swap(swap(x)) == x no longer holds true for all values x.
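
A minimal sketch of the "byte buffer only" approach follows. The names
FloatBytes, to_swapped_bytes and from_swapped_bytes are hypothetical, made
up for this illustration. The swapped representation exists only as an
array of unsigned char and is never held in a float object, so no fp
instruction ever sees the swapped bit pattern:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstring>

// The swapped form lives only in a byte array, never in a float.
using FloatBytes = std::array<unsigned char, sizeof(float)>;

// Copy the object representation out, then reverse the bytes.
FloatBytes to_swapped_bytes(float value) {
    FloatBytes bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    std::reverse(bytes.begin(), bytes.end());
    return bytes;
}

// Reverse the bytes back to native order first, and only then
// reinterpret them as a float.
float from_swapped_bytes(FloatBytes bytes) {
    std::reverse(bytes.begin(), bytes.end());
    float value;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}

void demo_byte_buffer_round_trip() {
    const float original = 1.5f;
    const FloatBytes wire = to_swapped_bytes(original);  // send / write this
    assert(from_swapped_bytes(wire) == original);
}
```

The bytes in "wire" are what would be sent across the network or written
to disk; the receiving side converts back to native order before treating
the value as a float.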
I'm not familiar enough with "safe fp byte swapping" techniques to compare
or recommend them (obviously, converting to / from a text representation
will work, with the usual rounding and accuracy constraints). Since the
whole point of endian / byte swapping utilities is to allow binary values
to be serialized / IO'ed (network, disk, etc.) without having to convert to
/ from text, there might be ways to grab the various portions (exponent,
mantissa, etc.) and treat them as integral values (including byte swapping).
This would be format specific (e.g. IEEE 754), and would entail querying the
fp format (the C++ standard is agnostic with respect to fp formats).
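
One possible shape for this, as a rough sketch only and not a recommended
or vetted technique: query the format at compile time with
std::numeric_limits<float>::is_iec559, copy the bits into a same-sized
unsigned integer, and do all swapping and field extraction on the integer.
The names float_to_swapped_bits, float_from_swapped_bits, Ieee754Fields and
split_fields are hypothetical, and the field layout assumes a 32-bit IEEE
754 single precision float:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Query the fp format; refuse to compile if the assumptions do not hold.
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 (IEC 559) float");
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

constexpr std::uint32_t reverse_bytes(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// The swapped value is carried as an integer, never as a float.
std::uint32_t float_to_swapped_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return reverse_bytes(bits);
}

float float_from_swapped_bits(std::uint32_t swapped) {
    const std::uint32_t bits = reverse_bytes(swapped);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// The individual portions, each held as a plain integral value.
struct Ieee754Fields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;  // 23 bits, implicit leading 1 not stored
};

Ieee754Fields split_fields(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return Ieee754Fields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x007FFFFFu};
}

void demo_integer_carrier() {
    assert(float_to_swapped_bits(1.0f) == 0x0000803Fu);  // 1.0f is 0x3F800000
    assert(float_from_swapped_bits(float_to_swapped_bits(0.1f)) == 0.1f);
    assert(split_fields(1.0f).exponent == 127u);
}
```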
In code I've seen where fp values are byte swapped, it's always "in-place"
swapping, and it's just luck that there's no code "in between the swaps"
that might cause normalization (or other bit changing) to occur. For
Boost-quality libraries, I would always vote against code that silently fails
with what appears to be typical, normal usage. That's why I brought up the point
about disallowing or explicitly supporting fp types in endian / byte
swapping libraries.
Cliff