Boost Users :

Subject: Re: [Boost-users] lexical_cast between double and string slow in Visual Studio 2013
From: Brian Budge (brian.budge_at_[hidden])
Date: 2014-03-28 13:18:32


On Thu, Mar 27, 2014 at 9:54 PM, shada <laf163_at_[hidden]> wrote:
> how to convert a hex string to int?
>

Please start a new thread for this question.

  Brian

>
> 2014-03-28 1:44 GMT+08:00 Paul A. Bristow <pbristow_at_[hidden]>:
>
>>
>>
>> > -----Original Message-----
>> > From: Boost-users [mailto:boost-users-bounces_at_[hidden]] On Behalf Of David Roberts
>> > Sent: 27 March 2014 15:58
>> > To: boost-users_at_[hidden]
>> > Subject: Re: [Boost-users] lexical_cast between double and string slow in Visual Studio 2013
>> >
>> > > That issue is unknown. I'd really appreciate the investigation.
>> >
>> > I have done some more investigation, and there are two factors that only
>> > cause the slowness when they both occur together.
>> >
>> > > Try excluding lexical_cast from the test; I have a feeling that this
>> > > is an MSVC-only issue:
>> > >
>> > > #include <sstream>
>> > > #include <string>
>> > > #include <utility>
>> > >
>> > > int main (int, char **)
>> > > {
>> > >     for (double count = 0.0; count < 1000000.0; count += 1.41)
>> > >     {
>> > >         // double -> string
>> > >         std::stringstream ss;
>> > >         ss << count;
>> > >         std::string result = std::move(ss.str());
>> > >         ss.str(std::string());
>> > >
>> > >         // string -> double
>> > >         ss << result;
>> > >         ss >> count;
>> > >     }
>> > >
>> > >     return 0;
>> > > }
>> > >
>> >
>> > Running your test program does not exhibit the problem. It runs in
>> > around 3 seconds on my machine when built with either Visual Studio 2010
>> > or Visual Studio 2013.
>> >
>> > However, changing it very slightly to match more closely what
>> > lexical_cast does internally does recreate the problem:
>> >
>> > #include <sstream>
>> > #include <string>
>> > #include <utility>
>> >
>> > int main (int, char **)
>> > {
>> >     for (double count = 0.0; count < 1000000.0; count += 1.41)
>> >     {
>> >         std::stringstream ss;
>> >         ss.unsetf(std::ios::skipws);
>> >         ss.precision(17);
>> >
>> >         // double -> string
>> >         ss << count;
>> >         std::string result = std::move(ss.str());
>> >         ss.str(std::string());
>> >
>> >         // string -> double
>> >         ss << result;
>> >         ss >> count;
>> >     }
>> >     return 0;
>> > }
>> >
>> > The effect of setting the precision to 17 is that lots of 9s appear in
>> > the string representations. (The number 17 is what
>> > boost::detail::lcast_get_precision(double*) chooses.) Without the
>> > precision call the contents of the string called result start off
>> > like this:
>> >
>> > 0
>> > 1.41
>> > 2.82
>> > 4.23
>> > 5.64
>> > 7.05
>> > 8.46
>> > 9.87
>> > 11.28
>> > 12.69
>> >
>> > With precision set to 17 they start off like this:
>> >
>> > 0
>> > 1.4099999999999999
>> > 2.8199999999999998
>> > 4.2299999999999995
>> > 5.6399999999999997
>> > 7.0499999999999998
>> > 8.4599999999999991
>> > 9.8699999999999992
>> > 11.279999999999999
>> > 12.69
>> >
>> > This happens for both Visual Studio 2010 and Visual Studio 2013.
>> >
>> > Then the next difference is that Visual Studio 2013 spends a lot longer
>> > handling all the extra 9s. Changing the program so that the double is
>> > converted to a string using std::stringstream without a precision call
>> > and then back to double using lexical_cast takes about 3 seconds for
>> > both Visual Studio 2010 and Visual Studio 2013. It is the combination
>> > of having all the extra 9s to parse and using Visual Studio 2013 that
>> > makes the test using lexical_cast to go both ways slow.
>> >
>> > Both Visual Studio 2010 and Visual Studio 2013 do the conversion by
>> > calling
>> > std::num_get<char,std::istreambuf_iterator<char,std::char_traits<char> > >::do_get()
>> > which then calls a function called _Stodx() which is implemented in
>> > xstod.c. This function is very different for the two versions. In
>> > Visual Studio 2010 it's a relatively thin wrapper around the C function
>> > strtod(). In Visual Studio 2013 _Stodx() has got a completely new
>> > implementation that's generated by #including xxstod.h with some macros
>> > defined.
>> >
>> > The original C function strtod() is much faster than the new _Stodx()
>> > when there are lots of 9s at the end of the strings being parsed. This
>> > modification to the program:
>> >
>> > #include <sstream>
>> > #include <string>
>> > #include <utility>
>> >
>> > #include <stdlib.h>
>> >
>> > int main (int, char **)
>> > {
>> >     for (double count = 0.0; count < 1000000.0; count += 1.41)
>> >     {
>> >         std::stringstream ss;
>> >         ss.unsetf(std::ios::skipws);
>> >         ss.precision(17);
>> >
>> >         // double -> string
>> >         ss << count;
>> >         std::string result = std::move(ss.str());
>> >         ss.str(std::string());
>> >
>> >         // string -> double, bypassing stream extraction
>> >         ss << result;
>> >         char *endptr;
>> >         count = strtod(ss.str().c_str(), &endptr);
>> >     }
>> >     return 0;
>> > }
>> >
>> > has a runtime of about 3 seconds even though it's got to cope with all
>> > the 9s.
>> >
>> > I guess only someone from Microsoft or Dinkumware could comment on why
>> > _Stodx() was reimplemented.
>> >
>> > But the other thing is that by setting precision to 17 lexical_cast is
>> > bloating the string representations of the doubles with lots of 9s in
>> > both Visual Studio 2010 and Visual Studio 2013. Setting precision to 15
>> > instead prevents this, and makes the original test run faster even with
>> > Visual Studio 2013 (about 4 seconds rather than 10).
>>
>> In order to be sure of 'round-tripping' one needs to output
>> std::numeric_limits<FPT>::max_digits10 decimal digits.
>>
>> max_digits10 is 17 for double, enough to ensure that all *possibly*
>> significant digits are used.
>>
>> digits10 is 15 for double and using this will work for *your* example,
>> but will fail to 'round-trip' exactly for some values of double.
>>
>> The reason for a rewrite *might* be that for VS <=11, there was a slight
>> 'feature' ('feature' according to Microsoft, 'bug' according to many,
>> though the C++ Standard does NOT require round-tripping to be exact.
>> Recent GCC and Clang achieve exact round-tripping.)
>>
>> // The original value causing trouble using serialization was
>> // 0.00019075645054089487;
>> // wrote 0.0019075645054089487
>> // read 0.0019075645054089489
>> // an increase of just 1 bit.
>>
>> // Although this test uses a std::stringstream, it is possible that
>> // the same behaviour will be found with ALL streams, including cout and
>> // cin?
>>
>> // The wrong inputs are only found in a very narrow range of values:
>> // approximately 0.0001 to 0.004, with exponent values of 3f2 to 3f6
>> // and probably every third value of significand (tested using nextafter).
>>
>> However, a re-test reveals that this 'feature' is still present using
>> VS2013 (version 12.0).
>>
>> (This test uses random double values to find round-trip or loopback
>> failures.)
>>
>> 1> Description: Autorun "J:\Cpp\Misc\Debug\loopback.exe"
>> 1>
>> 1> failed 78, out of 100000, fraction 0.00077999999999999999
>> 1>
>> 1> wrong min 5.2173006024157652e-310 == 600ac32350ee
>> 1> wrong max 8.7621968418217147e-308 == 2f80e435eb2ef3
>> 1>
>> 1> test min 1.2417072250589532e-311 == 24928faf2f7
>> 1> test max 1.7898906514522990e+308 == 7fefdc71c85a1145
>> 1> 186a0 loopback tests done.
>> 1>FinalizeBuildStatus:
>> 1> Deleting file "Debug\loopback.tlog\unsuccessfulbuild".
>> 1> Touching "Debug\loopback.tlog\loopback.lastbuildstate".
>> 1>
>> 1>Build succeeded.
>>
>> But this time it only occurs for a *different* and much smaller range :-(
>>
>> 1> Description: Autorun "J:\Cpp\Misc\Debug\loopback.exe"
>> 1>
>> 1> Written : 2.0367658404750995e-308 == ea55b0142dc71
>> 1> Readback : 2.0367658404751000e-308 == ea55b0142dc72
>> 1> Written : 7.2650939912298312e-308 == 2a1eee018d6993
>> 1> Readback : 7.2650939912298322e-308 == 2a1eee018d6994
>> 1> Written : 1.0124608169366832e-308 == 747c6af50194c
>> 1> Readback : 1.0124608169366827e-308 == 747c6af50194b
>> ...
>> 1> failed 77, out of 100000, fraction 0.00076999999999999996
>> 1>
>> 1> wrong min 5.4632820247365795e-310 == 6491f5f0ab91
>> 1> wrong max 8.7543773312713900e-308 == 2f79b1b891b2c1
>> 1>
>> 1> test min 2.1782631694667282e-310 == 2819299bf337
>> 1> test max 1.7974889513081573e+308 == 7fefff11cdbbcb43
>> 1> 186a0 loopback tests done.
>> 1>
>>
>> I've retested using VS 2013 and the failures are now in the narrow range
>> very near to numeric_limits<double>::min()
>>
>> Much better, but still not quite right :-(
>>
>> 1> Readback : 6.1131075857298205e-308 == 25fa9ea293ff26
>> 1> failed 3680, out of 10000000, fraction 0.00036800000000000000
>> 1>
>> 1> wrong min 4.4505959275765217e-308 == 2000699c514815
>> 1> wrong max 8.8998755028746106e-308 == 2fff9d0d8336f1
>> 1>
>> 1> test min 8.9025924527339071e-313 == 29f4307bd7
>> 1> test max 1.7976312864655923e+308 == 7fefffb7d9534507
>> 1> 98bf7a loopback tests done.
>>
>> To work around this 'feature' it was only necessary to use std::scientific
>> format (but of course this means more characters to digest).
>>
>> (But with VS2013 the results are as 'wrong' as not using std::scientific,
>> so go figure ???).
>>
>> This whole process is a minefield and you can find more than you wanted to
>> know from Rick Regan's work, starting (but not ending) with
>>
>> http://www.exploringbinary.com/incorrect-round-trip-conversions-in-visual-c-plus-plus/
>>
>> For me, the bottom line is that, for C++ the whole IO needs to be
>> rewritten *in C++*, perhaps using Fusion.
>>
>> This might be an exercise for a student ;-)
>>
>> Boost must be portable, so I'm not sure about your 'improvement' to speed,
>> but if speed on MSVC matters to you, then use it. Equally, the tiny risk
>> of a small loss of accuracy may not matter to you either, so using just
>> 15 decimal digits may be acceptable.
>>
>> IMO, exact round-tripping is essential (especially for serialization),
>> speed is just nice.
>>
>> HTH (though I fear not).
>>
>> Paul
>>
>> ---
>> Paul A. Bristow
>> Prizet Farmhouse
>> Kendal UK LA8 8AB
>> +44 01539 561830 07714330204
>>
>> _______________________________________________
>> Boost-users mailing list
>> Boost-users_at_[hidden]
>> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>

