Boost Users :
Subject: Re: [Boost-users] lexical_cast between double and string slow in Visual Studio 2013
From: shada (laf163_at_[hidden])
Date: 2014-03-28 00:54:41
How to convert a hex string to int?
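
(For illustration, a minimal sketch of one common approach, using std::stoi
with base 16; a std::stringstream with std::hex, or strtol, would work just
as well:)

#include <iostream>
#include <string>

int main (int, char **)
{
    // std::stoi with base 16 accepts an optional "0x"/"0X" prefix.
    std::string hex = "0x1A3F";
    int value = std::stoi(hex, nullptr, 16);
    std::cout << value << std::endl; // prints 6719
    return 0;
}
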
2014-03-28 1:44 GMT+08:00 Paul A. Bristow <pbristow_at_[hidden]>:
>
>
> > -----Original Message-----
> > From: Boost-users [mailto:boost-users-bounces_at_[hidden]] On Behalf Of David Roberts
> > Sent: 27 March 2014 15:58
> > To: boost-users_at_[hidden]
> > Subject: Re: [Boost-users] lexical_cast between double and string slow in Visual Studio 2013
> >
> > > That issue is unknown. I'd really appreciate the investigation.
> >
> > I have done some more investigation, and there are two factors that only
> > cause the slowness when they both occur together.
> >
> > > Try excluding the lexical_cast from the test, I have a feeling that
> > > this is only an MSVC-related issue:
> > >
> > > #include <sstream>
> > > #include <string>
> > >
> > > int main (int, char **)
> > > {
> > >     for (double count = 0.0; count < 1000000.0; count += 1.41)
> > >     {
> > >         std::stringstream ss;
> > >         ss << count;
> > >         std::string result = std::move(ss.str());
> > >         ss.str(std::string());
> > >
> > >         ss << result;
> > >         ss >> count;
> > >     }
> > >
> > >     return 0;
> > > }
> > >
> >
> > Running your test program does not exhibit the problem. It runs in around
> > 3 seconds on my machine when built with either Visual Studio 2010 or
> > Visual Studio 2013.
> >
> > However, changing it very slightly to match more closely what
> > lexical_cast does internally does recreate the problem:
> >
> > #include <sstream>
> > #include <string>
> >
> > int main (int, char **)
> > {
> >     for (double count = 0.0; count < 1000000.0; count += 1.41)
> >     {
> >         std::stringstream ss;
> >         ss.unsetf(std::ios::skipws);
> >         ss.precision(17);
> >
> >         ss << count;
> >         std::string result = std::move(ss.str());
> >         ss.str(std::string());
> >
> >         ss << result;
> >         ss >> count;
> >     }
> >     return 0;
> > }
> >
> > The effect of setting the precision to 17 is that lots of 9s appear in
> > the string representations. (The number 17 is what
> > boost::detail::lcast_get_precision(double*) chooses.) Without the
> > precision call the contents of the string called result start off like
> > this:
> >
> > 0
> > 1.41
> > 2.82
> > 4.23
> > 5.64
> > 7.05
> > 8.46
> > 9.87
> > 11.28
> > 12.69
> >
> > With precision set to 17 they start off like this:
> >
> > 0
> > 1.4099999999999999
> > 2.8199999999999998
> > 4.2299999999999995
> > 5.6399999999999997
> > 7.0499999999999998
> > 8.4599999999999991
> > 9.8699999999999992
> > 11.279999999999999
> > 12.69
> >
> > This happens for both Visual Studio 2010 and Visual Studio 2013.
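> >
> > (A standalone snippet, for illustration, showing the same effect on a
> > single value:)
> >
> > #include <iostream>
> > #include <sstream>
> >
> > int main (int, char **)
> > {
> >     std::stringstream ss;
> >     ss << 1.41;
> >     std::cout << ss.str() << "\n";  // "1.41" with the default precision of 6
> >
> >     ss.str(std::string());
> >     ss.precision(17);               // what lcast_get_precision(double*) picks
> >     ss << 1.41;
> >     std::cout << ss.str() << "\n";  // "1.4099999999999999"
> >
> >     return 0;
> > }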
> >
> > Then the next difference is that Visual Studio 2013 spends a lot longer
> > handling all the extra 9s. Changing the program so that the double is
> > converted to a string using std::stringstream without a precision call
> > and then back to double using lexical_cast takes about 3 seconds for both
> > Visual Studio 2010 and Visual Studio 2013. It is the combination of
> > having all the extra 9s to parse and using Visual Studio 2013 that makes
> > the test using lexical_cast to go both ways slow.
> >
> > Both Visual Studio 2010 and Visual Studio 2013 do the conversion by
> > calling
> > std::num_get<char,std::istreambuf_iterator<char,std::char_traits<char> > >::do_get()
> > which then calls a function called _Stodx() which is implemented in
> > xstod.c. This function is very different for the two versions. In Visual
> > Studio 2010 it's a relatively thin wrapper around the C function
> > strtod(). In Visual Studio 2013 _Stodx() has got a completely new
> > implementation that's generated by #including xxstod.h with some macros
> > defined.
> >
> > The original C function strtod() is much faster than the new _Stodx()
> > when there are lots of 9s at the end of the strings being parsed. This
> > modification to the program:
> >
> > #include <sstream>
> > #include <string>
> >
> > #include <stdlib.h>
> >
> > int main (int, char **)
> > {
> >     for (double count = 0.0; count < 1000000.0; count += 1.41)
> >     {
> >         std::stringstream ss;
> >         ss.unsetf(std::ios::skipws);
> >         ss.precision(17);
> >
> >         ss << count;
> >         std::string result = std::move(ss.str());
> >         ss.str(std::string());
> >
> >         ss << result;
> >         char *endptr;
> >         count = strtod(ss.str().c_str(), &endptr);
> >     }
> >     return 0;
> > }
> >
> > has a runtime of about 3 seconds even though it's got to cope with all
> > the 9s.
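> >
> > (The quoted runtimes can be reproduced by wrapping the loop with
> > std::chrono; a rough sketch, not the harness actually used:)
> >
> > #include <chrono>
> > #include <iostream>
> > #include <sstream>
> > #include <string>
> >
> > int main (int, char **)
> > {
> >     const auto start = std::chrono::steady_clock::now();
> >
> >     for (double count = 0.0; count < 1000000.0; count += 1.41)
> >     {
> >         std::stringstream ss;
> >         ss.unsetf(std::ios::skipws);
> >         ss.precision(17);
> >
> >         ss << count;
> >         std::string result = ss.str();
> >         ss.str(std::string());
> >
> >         ss << result;
> >         ss >> count;
> >     }
> >
> >     const auto stop = std::chrono::steady_clock::now();
> >     std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
> >               << " ms" << std::endl;
> >     return 0;
> > }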
> >
> > I guess only someone from Microsoft or Dinkumware could comment on why
> > _Stodx() was reimplemented.
> >
> > But the other thing is that by setting precision to 17 lexical_cast is
> > bloating the string representations of the doubles with lots of 9s in
> > both Visual Studio 2010 and Visual Studio 2013. Setting precision to 15
> > instead prevents this, and makes the original test run faster even with
> > Visual Studio 2013 (about 4 seconds rather than 10).
>
> In order to be sure of 'round-tripping' one needs to output
> std::numeric_limits<FPT>::max_digits10 decimal digits.
>
> max_digits10 is 17 for double, enough to ensure that all *possibly*
> significant digits are used.
>
> digits10 is 15 for double and using this will work for *your* example, but
> will fail to 'round-trip' exactly for some values of double.
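>
> (A small sketch of the difference, assuming a C++11 <limits> where
> max_digits10 is available; 1.0/3.0 is just a convenient example of a value
> that typically needs the extra digits:)
>
> #include <iostream>
> #include <limits>
> #include <sstream>
>
> // Round-trip a double through a decimal string at the given precision and
> // report whether the value read back is bit-for-bit identical.
> bool round_trips(double d, int precision)
> {
>     std::stringstream ss;
>     ss.precision(precision);
>     ss << d;
>     double back = 0.0;
>     ss >> back;
>     return back == d;
> }
>
> int main (int, char **)
> {
>     std::cout << "digits10     = " << std::numeric_limits<double>::digits10 << "\n";     // 15
>     std::cout << "max_digits10 = " << std::numeric_limits<double>::max_digits10 << "\n"; // 17
>
>     const double d = 1.0 / 3.0;
>     std::cout << "15 digits round-trips: " << round_trips(d, 15) << "\n"; // typically 0
>     std::cout << "17 digits round-trips: " << round_trips(d, 17) << "\n"; // 1, given a correct read
>     return 0;
> }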
>
> The reason for a rewrite *might* be that for VS <= 11, there was a slight
> 'feature'.
>
> ('feature' according to Microsoft, 'bug' according to many, though the C++
> Standard does NOT require round-tripping to be exact. Recent GCC and Clang
> achieve exact round-tripping.)
>
> // The original value causing trouble using serialization was 0.00019075645054089487;
> // wrote 0.0019075645054089487
> // read 0.0019075645054089489
> // an increase of just 1 bit.
>
> // Although this test uses a std::stringstream, it is possible that
> // the same behaviour will be found with ALL streams, including cout and cin?
>
> // The wrong inputs are only found in a very narrow range of values:
> // approximately 0.0001 to 0.004, with exponent values of 3f2 to 3f6,
> // and probably every third value of the significand (tested using nextafter).
>
> However, a re-test reveals that this 'feature' is still present using
> VS2013 (version 12.0).
>
> (This test uses random double values to find round-trip or loopback
> failures).
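>
> (The loopback test source isn't shown here; a rough sketch of this kind of
> random round-trip check might look something like the following:)
>
> #include <cstdint>
> #include <cstring>
> #include <iostream>
> #include <limits>
> #include <random>
> #include <sstream>
>
> // View a double's bit pattern (for the hex diagnostics in the output below).
> std::uint64_t to_bits(double d)
> {
>     std::uint64_t u;
>     std::memcpy(&u, &d, sizeof u);
>     return u;
> }
>
> int main (int, char **)
> {
>     std::mt19937_64 gen(12345);
>     const std::size_t tests = 100000;
>     std::size_t failures = 0;
>
>     for (std::size_t i = 0; i < tests; ++i)
>     {
>         // Draw a random bit pattern, skipping NaNs and infinities.
>         std::uint64_t bits = gen();
>         double written;
>         std::memcpy(&written, &bits, sizeof written);
>         if (written != written || written - written != 0.0)
>             continue;
>
>         std::stringstream ss;
>         ss.precision(std::numeric_limits<double>::max_digits10);
>         ss << written;
>         double readback = 0.0;
>         ss >> readback;
>
>         if (readback != written)
>         {
>             ++failures;
>             std::cout << std::hex
>                       << "Written  : " << to_bits(written) << "\n"
>                       << "Readback : " << to_bits(readback) << "\n"
>                       << std::dec;
>         }
>     }
>
>     std::cout << "failed " << failures << ", out of " << tests << std::endl;
>     return 0;
> }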
>
> 1> Description: Autorun "J:\Cpp\Misc\Debug\loopback.exe"
> 1>
> 1> failed 78, out of 100000, fraction 0.00077999999999999999
> 1>
> 1> wrong min 5.2173006024157652e-310 == 600ac32350ee
> 1> wrong max 8.7621968418217147e-308 == 2f80e435eb2ef3
> 1>
> 1> test min 1.2417072250589532e-311 == 24928faf2f7
> 1> test max 1.7898906514522990e+308 == 7fefdc71c85a1145
> 1> 186a0 loopback tests done.
> 1>FinalizeBuildStatus:
> 1> Deleting file "Debug\loopback.tlog\unsuccessfulbuild".
> 1> Touching "Debug\loopback.tlog\loopback.lastbuildstate".
> 1>
> 1>Build succeeded.
>
> But this time it only occurs for a *different* and much smaller range :-(
>
> 1> Description: Autorun "J:\Cpp\Misc\Debug\loopback.exe"
> 1>
> 1> Written : 2.0367658404750995e-308 == ea55b0142dc71
> 1> Readback : 2.0367658404751000e-308 == ea55b0142dc72
> 1> Written : 7.2650939912298312e-308 == 2a1eee018d6993
> 1> Readback : 7.2650939912298322e-308 == 2a1eee018d6994
> 1> Written : 1.0124608169366832e-308 == 747c6af50194c
> 1> Readback : 1.0124608169366827e-308 == 747c6af50194b
> ...
> 1> failed 77, out of 100000, fraction 0.00076999999999999996
> 1>
> 1> wrong min 5.4632820247365795e-310 == 6491f5f0ab91
> 1> wrong max 8.7543773312713900e-308 == 2f79b1b891b2c1
> 1>
> 1> test min 2.1782631694667282e-310 == 2819299bf337
> 1> test max 1.7974889513081573e+308 == 7fefff11cdbbcb43
> 1> 186a0 loopback tests done.
> 1>
>
> I've retested using VS 2013 and the failures are now in the narrow range
> very near to numeric_limits<double>::min().
>
> Much better, but still not quite right :-(
>
> 1> Readback : 6.1131075857298205e-308 == 25fa9ea293ff26
> 1> failed 3680, out of 10000000, fraction 0.00036800000000000000
> 1>
> 1> wrong min 4.4505959275765217e-308 == 2000699c514815
> 1> wrong max 8.8998755028746106e-308 == 2fff9d0d8336f1
> 1>
> 1> test min 8.9025924527339071e-313 == 29f4307bd7
> 1> test max 1.7976312864655923e+308 == 7fefffb7d9534507
> 1> 98bf7a loopback tests done.
>
> To work around this 'feature' it was only necessary to use std::scientific
> format (but of course this means more characters to digest).
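>
> (For illustration, roughly what the workaround amounts to; note that in
> scientific format the precision counts digits after the point, so
> max_digits10 - 1 still gives max_digits10 significant digits:)
>
> #include <iostream>
> #include <limits>
> #include <sstream>
>
> int main (int, char **)
> {
>     const double written = 0.00019075645054089487;
>
>     std::stringstream ss;
>     ss.precision(std::numeric_limits<double>::max_digits10 - 1);
>     ss << std::scientific << written;   // forces e-notation with 17 significant digits
>
>     double readback = 0.0;
>     ss >> readback;
>     std::cout << ss.str() << "\n"
>               << (readback == written ? "round-trips" : "differs") << std::endl;
>     return 0;
> }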
>
> (But with VS2013 the results are as 'wrong' as not using std::scientific,
> so go figure ???).
>
> This whole process is a minefield and you can find more than you wanted to
> know from Rick Regan's work, starting (but not ending) with
>
> http://www.exploringbinary.com/incorrect-round-trip-conversions-in-visual-c-plus-plus/
>
> For me, the bottom line is that, for C++, the whole IO needs to be
> rewritten *in C++*, perhaps using Fusion.
>
> This might be an exercise for a student ;-)
>
> Boost must be portable, so I'm not sure about your 'improvement' to speed,
> but if speed on MSVC matters to you, then use it. Equally, the tiny risk
> of a small loss of accuracy may not matter to you either, so using just 15
> decimal digits may be acceptable.
>
> IMO, exact round-tripping is essential (especially for serialization);
> speed is just nice.
>
> HTH (though I fear not).
>
> Paul
>
> ---
> Paul A. Bristow
> Prizet Farmhouse
> Kendal UK LA8 8AB
> +44 01539 561830 07714330204
>
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net