From: Oleg Abrosimov (beholder_at_[hidden])
Date: 2006-05-17 20:55:06


n1803/n1982 “Simple Numeric Access” proposal is broken

I've implemented the "long stol(string& str, int base = 10)" function
from the n1982 proposal (see the code at the bottom of this message) and
ran into several issues:

issue 1: non-const version (“string& str”) breaks code like
long l = stol("10");
usage is limited to:
string s = "10";
long l = stol(s);

issue 2: the "function call strtol(str.c_str(), 0, base)" requirement
cannot be fulfilled.
It conflicts with two other requirements: "erases the characters from
the front of str that were converted to get the result" and "Throws:
invalid_argument if strtol, strtoul, strtoll, or strtoull reports that
no conversion could be performed".
Both need the end pointer that strtol reports through its 2nd argument,
so passing '0' there is not an option.
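
To make issue 2 concrete, here is a minimal sketch. The helper names
value_only and converted_anything are mine and appear nowhere in n1982.
With a null 2nd argument, strtol returns 0 both for a valid "0" and for
an unparsable "abc", so the "no conversion" case can only be detected
through the end pointer:

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: what n1982 literally asks for. With a null end
// pointer the return value is the only information we get back.
long value_only(const std::string& str, int base = 10)
{
    return std::strtol(str.c_str(), 0, base);
}

// Hypothetical helper: reports whether strtol consumed at least one
// character. This is the check that needs a non-null 2nd argument.
bool converted_anything(const std::string& str, int base = 10)
{
    const char* nptr = str.c_str();
    char* endptr = 0;
    std::strtol(nptr, &endptr, base);
    return endptr != nptr;
}
```

value_only gives the same answer for "0" and "abc", while
converted_anything tells them apart.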

issue 3: a simple performance test shows that the proposed function is
1000 times slower than the iostreams solution (the number of samples in
the test was 100000)
Platform: WinXP VC71 STLPort 4.6.2

all these issues (except for 2) could be solved by removing the "...
erase the characters from the front of str that were converted to get
the result. ..." requirement.
issue 2 has a trivial solution: replace "function call
strtol(str.c_str(), 0, base)" with "function call strtol with
appropriate arguments".

the fixed code is given in “long stol_fixed(string const& str, int base
= 10)” function.

this stol_fixed function is less powerful than the originally proposed
version (if we forget about the issues mentioned above). it does not
allow parsing multiple numbers from one long string. the C-library's
strtol function allows it, but if all error conditions have to be
checked, the code becomes too complex for a simple parsing task.

On the other hand, all the issues with the proposed stol function come
from an attempt to merge two needs into one 'simple' interface:
(1) parsing multiple numbers from one string; (2) parsing one number
from a string.
(2) can be accomplished with a simple function interface, but (1) has to
keep state (a pointer to the beginning of the char sequence to parse
from). In the n1982 proposal this state is kept by erasing the parsed
characters from the string, but that is too expensive because of the
memory [de]allocations and character copying it causes.
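
As a rough illustration of keeping that state in a pointer instead of
erasing, here is a sketch. The function name parse_all_longs and the
choice of std::out_of_range are my own assumptions, not part of any
proposal. The input string is never modified; only the cursor moves:

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parses every whitespace-separated long in str.
std::vector<long> parse_all_longs(const std::string& str, int base = 10)
{
    std::vector<long> result;
    const char* cursor = str.c_str();
    for (;;) {
        char* endptr = 0;
        errno = 0; // strtol sets errno only on failure
        long value = std::strtol(cursor, &endptr, base);
        if (endptr == cursor) {
            break; // nothing converted: end of input or a non-numeric token
        }
        if (errno == ERANGE) {
            throw std::out_of_range("parse_all_longs: value out of range");
        }
        result.push_back(value);
        cursor = endptr; // advance the state; no erase, no reallocation
    }
    return result;
}
```

Note that this sketch silently stops at the first non-numeric token. A
real interface would have to decide how to report that case.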

It seems that stream-like interface would be good for the (1), for example:
std::string s("10 20 30 40");
cistream cs(s);
long l;
cs >> l;
int i;
cs >> i;
long arr[10];
for(int i = 0; i < 10; ++i) {
   cs >> arr[i]; // exception would be thrown (read after eof)
}

or, better:
std::string s("10 20 30 40");
cistream cs(s);
long l = cs.read_long(); //convenience function
int i = cs.read_int(); //convenience function
long arr[10];
for(int i = 0; i < 10; ++i) {
   cs >> arr[i]; // exception would be thrown (read after eof)
}

This solution would support:
cistream, costream, ciostream
wcistream, wcostream, wciostream
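
A very rough sketch of what such a cistream might look like follows. It
is only my guess at the shape of the interface: it handles long only,
copies the string, and the exception types are assumptions. The read
position is stored as an offset, so nothing is ever erased:

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical sketch of the proposed cistream; only long is supported.
class cistream
{
public:
    explicit cistream(const std::string& s) : buffer_(s), pos_(0) {}

    // convenience function: throws if no number is left to read
    long read_long(int base = 10)
    {
        const char* nptr = buffer_.c_str() + pos_;
        char* endptr = 0;
        errno = 0; // strtol sets errno only on failure
        long value = std::strtol(nptr, &endptr, base);
        if (endptr == nptr) {
            throw std::invalid_argument("cistream: no number to read");
        }
        if (errno == ERANGE) {
            throw std::out_of_range("cistream: value out of range");
        }
        pos_ += static_cast<std::string::size_type>(endptr - nptr);
        return value;
    }

    cistream& operator>>(long& value)
    {
        value = read_long();
        return *this;
    }

private:
    std::string buffer_;         // private copy; caller's string is untouched
    std::string::size_type pos_; // read position, advanced instead of erasing
};
```

Reading past the last number throws from read_long, which matches the
"exception would be thrown" comment in the loops above.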

One more issue with the n1982 proposal, and with the one I've proposed
in this group, is that both hide from the C++ programmer the fact that he
is using wrappers around C-library I/O functions, which are incompatible
with C++ locales. The solution would be to encode this in the component
names (like the 'c' prefix in the names above).

NOTE: with the from_string function that I proposed earlier, the
"parsing one number from a string" task using C-library functions can
be implemented as a from_string_c function.
It cannot be safely merged with the from_string function because of the
C vs. C++ locale issues. The same applies to the (w)string_from
function: its C-wrapper counterpart would be (w)string_from_c. Better
naming suggestions are welcome.
(_byc suffix?)

Best,
Oleg Abrosimov.

// code begins

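#include <locale>
#include <cstdlib>
#include <cerrno>
#include <climits>   // LONG_MAX, LONG_MIN
#include <stdexcept> // invalid_argument, overflow_error, underflow_error
#include <string>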

// 1) "12 23 34 34 56 78" - stream-like interface is the best in C++
// 2) " 12" - conversion function is appropriate
namespace std { namespace tr2 {

   // issue 1: non-const version breaks code like
   // "long l = stol("10");"
   // usage is limited to:
   // string s = "1 2";
   // long l = stol(s);
   // long l1 = stol(s);
   //long stol(string const& str, int base = 10)
   long stol(string& str, int base = 10)
   {
       char* endptr = 0;
       const char* nptr = str.c_str();
       errno = 0; // strtol sets errno only on failure, so clear it first
       long res = strtol(nptr, &endptr, base);
       if (endptr == nptr) {
           throw std::invalid_argument(
               "stol invalid argument. str = '" + str + "'");
       }
       if (errno == ERANGE) {
           switch(res) {
               case LONG_MAX :
                   throw std::overflow_error(
                       "stol overflow. str = '" + str + "'");
                   break;
               case LONG_MIN :
                   throw std::underflow_error(
                       "stol underflow. str = '" + str + "'");
                   break;
           }
       }
       // performance killer !!!!!!!
       str.erase(0, endptr - nptr);
       return res;
   }
   // issue 2: "function call strtol(str.c_str(), 0, base)" requirement
   // can not be fulfilled
   // it conflicts with "erases the characters from the front of str
   // that were converted to get the result"
   // and with "Throws: invalid_argument if strtol, strtoul, strtoll, or
   // strtoull reports that no conversion could be performed"
   // requirements (error checking prevents usage of '0' for 2nd argument)

   long stol_fixed(string const& str, int base = 10)
   {
       char* endptr = 0;
       const char* nptr = str.c_str();
       errno = 0; // strtol sets errno only on failure, so clear it first
       long res = strtol(nptr, &endptr, base);
       if (endptr == nptr) {
           throw std::invalid_argument(
               "stol invalid argument. str = '" + str + "'");
       }
       if (errno == ERANGE) {
           switch(res) {
               case LONG_MAX :
                   throw std::overflow_error(
                       "stol overflow. str = '" + str + "'");
                   break;
               case LONG_MIN :
                   throw std::underflow_error(
                       "stol underflow. str = '" + str + "'");
                   break;
           }
       }
       return res;
   }
}}

// performance test code begins
// it uses profiler code by Christopher Diggins
#include <iostream>
#include <sstream>
#include <limits>
#include <vector>
#include <boost/profiler.hpp>
#include <boost/lexical_cast.hpp>

#ifdef max
#undef max
#endif
#ifdef min
#undef min
#endif

// 1) create a string of long values
// 2) read em using std::stringstream
// 3) same with stol()
int main()
try {
   // 1) create a string of long values
   const long lMax = std::numeric_limits<long>::max();
   const long lCount = 1000000L;
   const long lMin = std::numeric_limits<long>::max() - lCount;
   std::string sLongs;
   std::vector<std::string> vecLongs;
   vecLongs.reserve(lCount);
   for (long l = lMax; l > lMin; --l) {
       std::string s = boost::lexical_cast<std::string>(l);
       sLongs += (s + '\t');
       vecLongs.push_back(s);
   }
   sLongs = sLongs.substr(0, sLongs.length() - 1);

   // 2) read em using std::stringstream
   {
       boost::prof::profiler p(": read em using std::stringstream");
       std::istringstream ss(sLongs);
       long l;
       volatile long vl; // volatile sink so the reads are not optimized away
       while (!sLongs.empty() && ss.good() && !ss.eof()) {
           ss >> l;
           volatile const char* nptr = sLongs.c_str();
           // uncomment it to simulate std::tr2::stol timings
           //sLongs.erase(0, 11);
           vl = l;
       }
   }

   // 3) same with stol_fixed()
   // 3.6 times faster than (2)
   {
       boost::prof::profiler p("same with stol_fixed()");
       long l;
       volatile long vl;
       for(long i = 0; !sLongs.empty() && i < lCount; ++i) {
           l = std::tr2::stol_fixed(vecLongs[i]);
           vl = l;
       }
   }

   // 4) same with stol()
   // 1000 times slower than the iostreams solution
   {
       boost::prof::profiler p("same with stol()");
       long l;
       volatile long vl;
       while (!sLongs.empty()) {
           l = std::tr2::stol(sLongs);
           vl = l;
       }
   }

} catch (std::exception& ex) {
   std::cerr << ex.what() << std::endl;
} catch (...) {
   std::cerr << "Unknown exception occurred" << std::endl;
}

