Re: [Boost-bugs] [Boost C++ Libraries] #12527: cpp_bin_float: Anal fixation. Part 3. Double rounding when result of convert_to<double>() is a subnormal

Subject: Re: [Boost-bugs] [Boost C++ Libraries] #12527: cpp_bin_float: Anal fixation. Part 3. Double rounding when result of convert_to<double>() is a subnormal
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2016-11-13 00:00:59


#12527: cpp_bin_float: Anal fixation. Part 3. Double rounding when result of
convert_to<double>() is a subnormal
-------------------------------+----------------------------
  Reporter: Michael Shatz | Owner: johnmaddock
      Type: Bugs | Status: reopened
 Milestone: To Be Determined | Component: multiprecision
   Version: Boost 1.62.0 | Severity: Problem
Resolution: | Keywords:
-------------------------------+----------------------------

Comment (by Michael Shatz):

 Try this:


 {{{
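 #include <cfloat>   // DBL_MIN
 #include <cmath>    // HUGE_VAL, std::ldexp, std::nan
 #include <cstdint>  // std::int64_t
 #include <boost/multiprecision/cpp_bin_float.hpp>

 // Convert a binary (digit_base_2) cpp_bin_float value to the nearest double,
 // ties to even, with a single rounding even when the result is subnormal.
 // isfinite, isnan, abs, ldexp and trunc on T are found by ADL.
 template <typename T>
 double my_convert_to_double(const T& x)
 {
   double ret = 0;
   if (isfinite(x)) {
     // exponent() is the power of two of the leading bit: 2^e <= |x| < 2^(e+1).
     // Below 2^-1075 (half the smallest subnormal) everything rounds to zero.
     if (x.backend().exponent() >= -1023 - 52 && x != 0) {
       if (x.backend().exponent() <= 1023) {
         int e = x.backend().exponent();
         // Scale |x| into [2^55, 2^56): 53 significand bits plus 3 extra bits.
         T y = ldexp(abs(x), 55 - e);
         T t = trunc(y);
         std::int64_t ti = t.template convert_to<std::int64_t>();
         // Sticky bit: if truncation discarded anything, force the lowest bit
         // to 1 so that the int64 -> double conversion below rounds correctly.
         if ((ti & 1) == 0) {
           if (t < y)
             ti |= 1;
         }
         if (e >= -1023 + 1) {
           // Normal result: double(ti) is the only rounding; the scaling by
           // ldexp is exact (or overflows to infinity).
           ret = std::ldexp(double(ti), e - 55);
         } else {
           // Subnormal result: add DBL_MIN so that rounding happens on the fixed
           // 2^-1074 grid of [DBL_MIN, 2*DBL_MIN), round once with a sticky bit,
           // then subtract DBL_MIN again (exact).
           typedef boost::multiprecision::number<
               boost::multiprecision::cpp_bin_float<
                   128, boost::multiprecision::backends::digit_base_2> >
               cpp_bin_float128_t;
           cpp_bin_float128_t sx = ldexp(cpp_bin_float128_t(ti), e - 55);
           sx += DBL_MIN;  // exact: the sum fits well within 128 bits
           e = -1023 + 1;
           cpp_bin_float128_t sy = ldexp(sx, 55 - e);
           cpp_bin_float128_t st = trunc(sy);
           ti = st.convert_to<std::int64_t>();
           if ((ti & 1) == 0) {
             if (st < sy)
               ti |= 1;
           }
           ret = std::ldexp(double(ti), e - 55) - DBL_MIN;
         }
       } else {
         // overflow
         ret = HUGE_VAL;
       }
     }
   } else {
     if (isnan(x))
       return std::nan("");
     // inf
     ret = HUGE_VAL;
   }
   return x.backend().sign() ? -ret : ret;
 }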

 }}}
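 The `ti |= 1` lines implement a sticky bit, also called round-to-odd: after truncating, the lowest kept bit records whether anything nonzero was discarded, so a later rounding to at least two fewer bits gives the same result as one direct rounding. A minimal standalone sketch of that property on plain unsigned integers (not cpp_bin_float); the helper names round_nearest_even and round_to_odd are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Drop the k low bits of v, rounding to nearest with ties to even.
// Precondition: 1 <= k <= 63.
std::uint64_t round_nearest_even(std::uint64_t v, unsigned k) {
  std::uint64_t q = v >> k;                                   // truncated result
  std::uint64_t r = v & ((std::uint64_t{1} << k) - 1);        // discarded bits
  std::uint64_t half = std::uint64_t{1} << (k - 1);
  if (r > half || (r == half && (q & 1) != 0))
    ++q;
  return q;
}

// Drop the k low bits of v by truncation, but set the lowest kept bit if any
// discarded bit was nonzero (the same idea as `if (t < y) ti |= 1;`).
// Precondition: 1 <= k <= 63.
std::uint64_t round_to_odd(std::uint64_t v, unsigned k) {
  std::uint64_t q = v >> k;
  if ((v & ((std::uint64_t{1} << k) - 1)) != 0)
    q |= 1;
  return q;
}
```

 For example, dropping 4 bits from 23 (binary 10111) directly gives 1; rounding to nearest twice (2 bits, then 2 more) gives 2, which is the double-rounding error; round-to-odd followed by round-to-nearest gives 1 again.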

 For a 128-bit cpp_bin_float on a 64-bit x64 platform it is approximately
 4 times faster than your last variant. It also looks much simpler to me.

 Unfortunately, I am not very good at complex template programming, so I
 don't want to write the actual overload myself:

   template <class Float, unsigned Digits, digit_base_type DigitBase,
             class Allocator, class Exponent, Exponent MinE, Exponent MaxE>
   inline typename boost::enable_if_c<boost::is_float<Float>::value>::type
   eval_convert_to(Float *res, const cpp_bin_float<Digits, DigitBase,
                   Allocator, Exponent, MinE, MaxE> &original_arg)
   {
   ...
   }

 Too many things in that code look like black magic to me.

 But hopefully the ideas are clear from the example above.
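 The other idea, adding DBL_MIN before rounding and subtracting it afterwards, can be tried at a smaller scale with plain float and double: offsetting a value below FLT_MIN by FLT_MIN moves it into [FLT_MIN, 2*FLT_MIN), where the float spacing is 2^-149, exactly the subnormal spacing, so an ordinary normal-range rounding yields the correctly rounded subnormal. A minimal sketch, assuming IEEE-754 float/double, round-to-nearest-even, no flush-to-zero, and a hypothetical helper to_float_via_offset. As with the 128-bit type in the code above, the wider type (here double) must hold the offset sum exactly:

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: round a double d with 0 <= d < FLT_MIN to float by
// shifting it onto the fixed grid of [FLT_MIN, 2*FLT_MIN).
// Assumes IEEE-754 binary32/binary64, round-to-nearest-even, no flush-to-zero,
// and that d is a multiple of 2^-178 so that d + FLT_MIN is exact in double.
float to_float_via_offset(double d) {
  assert(d >= 0.0 && d < FLT_MIN);
  double shifted = d + FLT_MIN;                 // exact under the assumption above
  float rounded = static_cast<float>(shifted);  // the only rounding; step is 2^-149
  return rounded - FLT_MIN;                     // exact: a multiple of 2^-149 in [0, FLT_MIN]
}
```

 Under those assumptions the result matches `static_cast<float>(d)`; for instance 1.5 * 2^-149 is a tie and rounds to the even neighbour 2 * 2^-149.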

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/12527#comment:14>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.

This archive was generated by hypermail 2.1.7 : 2017-02-16 18:50:20 UTC