From: Oleg Abrosimov (beholder_at_[hidden])
Date: 2006-04-30 23:25:48
Hello, Boost,
This is an idea for a Google SoC 2006 project in which I want to
participate. The library is called 'string_cvt' (string conversions). It
solves the problem of converting a type to a string and a string to a
type with minimal runtime and syntactic overhead.
This is a simple "call for interest" mail.
The idea for this library was inspired by a recent discussion on the
Boost developers mailing list. The question under discussion was:
Is the lexical_cast<> tool good enough for TR2 or not?
Proponents of lexical_cast<> make the point that its main
advantage is simplicity of use and symmetry (angle brackets are used in
both directions):
int i = lexical_cast<int>("1");
string s = lexical_cast<string>(1);
Additionally, it looks like the built-in casts, and that is considered a
very cool thing.
On the other side, opponents of lexical_cast<> want more functionality
that doesn't fit into simple cast-like usage, such as the following.
The Requirements table:
1) controlling conversions via facets (locales)
2) full power of iostreams in a simple interface.
All functionality accessible with iostreams
(through manipulators) should be accessible.
3) functor adapters for use with std algorithms
4) error handling and reporting (what kind of error occurred?)
* optionally report failure without raising exceptions
5) best performance, especially for built-in types and for use in loops
The "Lexical Conversion Library Proposal for TR2" by Kevlin Henney and
Beman Dawes states that:
"The lexical_cast function template offers a convenient and consistent
form for supporting common conversions to and from arbitrary types when
they are represented as text. The simplification it offers is in
expression-level convenience for such conversions. For more involved
conversions, such as where precision or formatting need tighter control
than is offered by the default behavior of lexical_cast, the
conventional stringstream approach is recommended."
It is clear that lexical_cast is not intended to address points (1-4) in
the list above, or even (5). To optimize conversions in loops you would
need to resort to stringstreams again.
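For reference, the "conventional stringstream approach" that the quoted text falls back on might look roughly like the sketch below. The helper names to_hex_string and parse_hex are hypothetical and are not part of lexical_cast or of this proposal; they only illustrate how much ceremony a single formatted conversion needs.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: format an int as hexadecimal text via a stream.
std::string to_hex_string(int value) {
    std::ostringstream out;
    out << std::hex << value;
    return out.str();
}

// Hypothetical helper: parse hexadecimal text, throwing on failure.
int parse_hex(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    if (!(in >> std::hex >> value)) {
        throw std::invalid_argument("not a hexadecimal number: " + text);
    }
    return value;
}

// Usage: one stream object is constructed per conversion.
void stringstream_example() {
    assert(to_hex_string(255) == "ff");
    assert(parse_hex("ff") == 255);
}
```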
I believe that stringstreams are not the right tool for the daily string
conversion job. We need a dedicated, fully featured solution that
addresses all issues in the Requirements table above. My dream is that
one no longer needs to fall back to C-style solutions or to stringstreams:
just one consistent interface for all string conversion needs.
This proposal for a Google SoC project is an attempt to develop such a
solution. The final, ambitious goal of this project is to make
boost::lexical_cast<> obsolete and to replace it in TR2 with a new
proposal. Regardless of SoC, I'm going to develop such a library for
Boost, but participation in Google SoC is important because
otherwise it would be hard to find enough time to finish this library
before the TR2 deadline in October.
As a result of this project we would have not only a fully documented and
tested library for string conversions, but also a full comparative
performance analysis, made to ensure that there is no longer any need to
fall back to some other solution.
Here are short examples of the intended usage of this library (for those
who are too busy to read the full text of the proposal):
// simple initialization usage:
string s = string_from(1);
int i = from_string("1");
// embedded in expression usage:
double d = 2 + (double)from_string("1");
// usage with special locale:
string s = string_from(1, std::locale(loc_name));
// usage with special format:
string s = string_from(1, std::ios::hex);
// usage with special format and locale:
string s = string_from(1, std::ios::hex, std::locale(loc_name));
// usage with default value provided (exceptions are not thrown):
int i = from_string("1", 1);
// usage with cvtstate& argument (exceptions are not thrown; if the
// conversion fails, the reason is written to the supplied cvtstate):
cvtstate state;
int i = from_string("1", state);
Format and locale info can be supplied to the from_string function too.
To optimize conversions in a loop one can do:
string_cvt cvt(std::ios::hex, std::locale(loc_name));
string s;
for (int i = 0; i < 100; ++i) {
string t;
cvt(i, t);
s += t + ' '; // the separator is garbled in this copy; a space is assumed
}
To convert one sequence to another one can do:
vector<double> vec_doubles(10, 1.2);
vector<string> vec_strings;
string_ocvt_fun<string> ocvtf(cvt); // cvt is defined in a previous example
transform(
vec_doubles.begin(), vec_doubles.end(), // from
back_inserter(vec_strings), // to
ocvtf
);
// and in a reverse direction:
string_icvt_fun<double> icvtf(cvt);
vector<double> vec_doubles1(10);
transform(
vec_strings.begin(), vec_strings.end(), // from
vec_doubles1.begin(), // to
icvtf
);
Details of this proposal are below:
The proposal, part 1. from_string/(w)string_from functions.
From a syntactic point of view, an alternative to the lexical_cast<>
approach has been proposed:
the to_string/string_to<> pair of functions.
The "Lexical Conversion Library Proposal for TR2" has a good argument
against it:
"... Furthermore, the from/to idea cannot be expressed in a simple and
consistent form. The illusion is that they are easier than lexical_cast
because of the name. This is theory. The practice is that the two forms,
although similarly and symmetrically named, are not at all similar in
use: one requires explicit provision of a template parameter and the
other not. This is a simple usability pitfall that is guaranteed to
catch experienced and inexperienced users alike -- the only difference
being that the experienced user will know what to do with the error
message."
There is one more problem with this approach:
the to_string() function comes from other languages such as Java, where
it is a member function
of all types, so one can write:
String s = object.toString();
It can be read as "Get a string from an object" or "Convert an object to
a string".
Both phrases are straightforward and reflect the way we think about it:
1) I want a string (String s = )
2) I have an object (object)
3) I'm performing a conversion of this object to string (.toString())
But in C++ the to_string function would be a free function,
resulting in code like:
string s = to_string(1);
It can be read as: "Get a string by converting an object '1' to a string".
The problem here is that the mental sequence is the same as in the
example above, but the
language constructs don't reflect it:
1) I want a string (string s = )
2) I have an object (1)
3) I'm performing a conversion of this object to string (to_string(1))
Note that items (2) and (3) are intermixed. This means that the
programmer needs to do some additional mental work to jump from item (1)
to item (3) and then back
to item (2) again. The resulting mental workflow is as follows:
1) I want a string (string s = )
2) I have an object (1, but don't code it yet; hold it in memory for a while)
3) I'm performing a conversion of this object to string (to_string)
4) Yes! I can release my memory and finally code the object. ( (1); )
For a component as widely used as string conversions, this additional
complexity is inappropriate.
Note: exactly the same critique applies to lexical_cast<> too,
which has the additional complexity of an explicitly specified template
parameter.
For string-to-type conversions things are even worse.
In Java it would be:
try {
int i = Integer.parseInt(s);
// use i
} catch (NumberFormatException e) { /* perform some error handling, or
ignore it, which is the usual practice */ }
With lexical_cast<> it would be:
int i = lexical_cast<int>(s);
// use i
// exception handling is usually done at a higher level
With string_to<> it would be:
int i = string_to<int>(s);
Only the name has changed here.
The resulting mental sequence for all three variants above is far from optimal.
For lexical_cast<> it would be as follows:
1) I want an int (int i = )
2) I have a string (s, but don't code it yet; hold it in memory for a while)
3) I'm performing a conversion of this string to an int ( lexical_cast<int> )
4) Yes! I can release my memory and finally code the string. ( (s); )
The same mental complexity arises here.
The shortest possible mental sequence is as follows:
1) I want an int (int i = )
2) I have a string (s)
3) I'm performing a conversion of this string to an int ( toInt(); )
int i = s.toInt();
This approach scales badly, of course, but it is optimal in a mental sense.
Furthermore, one could argue that the best way would be as follows:
int i = s;
"Construct an int from a string" - as simple as it could be.
Surprisingly, it can be implemented (in terms of a templated type
conversion operator):
class string
{
public:
// templated conversion operator: T is deduced at the point of use
template<typename T>
operator T();
};
But this solution has major drawbacks:
1) it cannot be made symmetrical with the type-to-string conversion
2) it is hard to spot such conversions in code
3) it requires changes to the standard string library
All three can be resolved with a free-function adapter like string_to,
but with more appropriate naming:
int i = from_string(s);
Its counterpart would be:
string s = string_from(1);
wstring s = wstring_from(1);
Note:
1) usage is symmetrical
2) no explicit template parameters
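A minimal sketch of how this pair could be implemented, assuming a proxy object with a templated conversion operator built on iostreams. The from_string and string_from names come from this proposal; the from_string_proxy class and the stream-based bodies are hypothetical illustrations, not the actual implementation.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical proxy returned by from_string(); the target type is
// chosen by the templated conversion operator at the point of use.
class from_string_proxy {
public:
    explicit from_string_proxy(const std::string& text) : text_(text) {}

    template <typename T>
    operator T() const {
        std::istringstream in(text_);
        T value = T();
        if (!(in >> value)) {
            throw std::runtime_error("from_string: bad input '" + text_ + "'");
        }
        return value;  // trailing characters are not checked in this sketch
    }

private:
    std::string text_;
};

inline from_string_proxy from_string(const std::string& text) {
    return from_string_proxy(text);
}

template <typename T>
std::string string_from(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Usage: neither direction needs an explicit template argument.
void symmetric_usage_example() {
    int i = from_string("42");
    double d = from_string("1.5");
    std::string s = string_from(42);
    assert(i == 42 && d == 1.5 && s == "42");
}
```

The declared type of the variable drives the conversion, which is what removes the explicit template parameter from the call site.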
The from_string function has one minor drawback:
it cannot be used in expressions without an explicit cast to the
desired type:
double d = 2.0 + from_string(s); // doesn't work
double d = 2.0 + (double)from_string(s); // works
But this can be seen as an advantage, because:
1) the intention is clear and enforced by the compiler (the alternative
is an operator+ ambiguity, or a run-time exception if 2.0 becomes 2 and
s looks like 1.1)
2) mentally, the expression "(double)from_string(s)" is close to
optimal; it can be thought of as:
"Get double from string". It is hard to imagine a thinking path that is
shorter and reflects intentions in a more straightforward way.
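The need for the cast can be reproduced with a small sketch. It assumes that from_string returns a proxy with a templated conversion operator; the from_string_proxy struct here is a hypothetical stand-in for whatever the real library would use.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical proxy: converts its text to whatever type is requested.
struct from_string_proxy {
    std::string text;

    template <typename T>
    operator T() const {
        std::istringstream in(text);
        T value = T();
        in >> value;  // error handling omitted in this sketch
        return value;
    }
};

inline from_string_proxy from_string(const std::string& text) {
    return from_string_proxy{text};
}

void expression_usage_example() {
    const std::string s = "1.5";
    // double bad = 2.0 + from_string(s);  // expected to be rejected as
    //                                     // ambiguous: T cannot be chosen
    double d = 2.0 + static_cast<double>(from_string(s));
    assert(d == 3.5);
}
```

In an initialization the declared type selects T; inside an arithmetic expression several built-in operator+ candidates are equally viable, so the target type has to be named.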
To conclude: the pair of [w]string_from/from_string functions is
proposed to compete with the lexical_cast<> function template for the
simple needs of converting some type to a string or a string to some type.
Additionally, these functions are not restricted to pure cast-like
syntax, and could accept parameters like locale, std::ios::fmtflags and
boost::cvtstate (which is a part of this proposal) to address issues (1),
(2), and (4) respectively (see the Requirements table above).
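As an illustration of the extra parameters, here is a sketch of string_from accepting formatting flags and a locale. The parameter order follows the usage examples earlier in this mail; the default arguments and the stream-based body are assumptions.

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Sketch: formatting flags and locale are optional trailing arguments.
template <typename T>
std::string string_from(const T& value,
                        std::ios_base::fmtflags flags = std::ios_base::dec,
                        const std::locale& loc = std::locale::classic()) {
    std::ostringstream out;
    out.imbue(loc);    // requirement (1): conversions controlled by locale
    out.flags(flags);  // requirement (2): iostreams formatting flags
    out << value;
    return out.str();
}

void formatted_usage_example() {
    assert(string_from(255) == "255");
    assert(string_from(255, std::ios_base::hex) == "ff");
    assert(string_from(255, std::ios_base::hex | std::ios_base::showbase) == "0xff");
}
```

Note that flags() replaces the whole flag set of the stream, so in this sketch the caller passes every flag that should be active.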
The proposal, part 2. converter objects and functor adapters.
This part is intended to address issues (3) and (5).
It can be achieved by providing templated "converter objects"
along with typedefs for char and wchar_t:
basic_string_icvt<char_type, traits_type, allocator_type>: string_icvt,
wstring_icvt
basic_string_ocvt<char_type, traits_type, allocator_type>: string_ocvt,
wstring_ocvt
basic_string_cvt<char_type, traits_type, allocator_type>: string_cvt,
wstring_cvt
usage can be:
string_cvt scvt(ios_base::hex, locale(""));
string s;
scvt(12, s);
int i;
scvt(s, i);
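A converter object along these lines could be sketched as follows. The class name string_cvt_sketch and its internals (one reusable std::stringstream) are assumptions made for illustration; the proposed basic_string_cvt may be built quite differently.

```cpp
#include <cassert>
#include <ios>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical converter: the stream is configured once and then reused,
// which avoids constructing a new stream for every conversion in a loop.
class string_cvt_sketch {
public:
    explicit string_cvt_sketch(std::ios_base::fmtflags flags,
                               const std::locale& loc = std::locale::classic()) {
        stream_.imbue(loc);
        stream_.flags(flags);
    }

    // Type to string.
    template <typename T>
    void operator()(const T& value, std::string& out) {
        reset("");
        stream_ << value;
        out = stream_.str();
    }

    // String to type; returns false if parsing fails.
    template <typename T>
    bool operator()(const std::string& text, T& out) {
        reset(text);
        stream_ >> out;
        return !stream_.fail();
    }

private:
    void reset(const std::string& content) {
        stream_.clear();       // drop eof/fail bits left by the previous call
        stream_.str(content);  // replace the buffer contents
    }

    std::stringstream stream_;
};

void converter_usage_example() {
    string_cvt_sketch scvt(std::ios_base::hex);
    std::string s;
    scvt(12, s);
    int i = 0;
    const bool ok = scvt(s, i);
    assert(s == "c" && ok && i == 12);
}
```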
and functor adapters:
basic_string_ocvt_fun<TCont>
typedef basic_string_ocvt_fun<std::string> string_ocvt_fun;
typedef basic_string_ocvt_fun<std::wstring> wstring_ocvt_fun;
basic_string_icvt_fun<Target, TChar, Traits, TAlloc>;
// template typedef
template <
typename Target,
typename Traits = std::char_traits<char>,
typename TAlloc = std::allocator<char>
>
class string_icvt_fun :
public basic_string_icvt_fun<Target, char, Traits, TAlloc>
// template typedef
template <
typename Target,
typename Traits = std::char_traits<wchar_t>,
typename TAlloc = std::allocator<wchar_t>
>
class wstring_icvt_fun:
public basic_string_icvt_fun<Target, wchar_t, Traits, TAlloc>
These classes can be used as follows:
vector<double> vec_doubles(10, 1.2);
vector<string> vec_strings;
string_ocvt_fun<string> ocvtf(scvt);
transform(
vec_doubles.begin(), vec_doubles.end(), // from
back_inserter(vec_strings), // to
ocvtf
);
string_icvt_fun<double> icvtf(scvt);
vector<double> vec_doubles1(10);
transform(
vec_strings.begin(), vec_strings.end(), // from
vec_doubles1.begin(), // to
icvtf
);
int sz = vec_doubles.size();
for (int i = 0; i < sz; ++i) {
assert(vec_doubles[i] == vec_doubles1[i]);
}
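The same round trip can be sketched with self-contained functors. The names to_string_fun and from_string_fun are hypothetical stand-ins for the proposed string_ocvt_fun and string_icvt_fun adapters, and each call builds its own stream instead of sharing a converter object.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical output adapter: unary functor mapping a value to a string.
struct to_string_fun {
    template <typename T>
    std::string operator()(const T& value) const {
        std::ostringstream out;
        out << value;
        return out.str();
    }
};

// Hypothetical input adapter: unary functor mapping a string to Target.
template <typename Target>
struct from_string_fun {
    Target operator()(const std::string& text) const {
        std::istringstream in(text);
        Target value = Target();
        in >> value;  // error handling omitted in this sketch
        return value;
    }
};

// Converts doubles to strings and back again with std::transform.
std::vector<double> round_trip(const std::vector<double>& input) {
    std::vector<std::string> strings;
    std::transform(input.begin(), input.end(),
                   std::back_inserter(strings), to_string_fun());
    std::vector<double> output(strings.size());
    std::transform(strings.begin(), strings.end(),
                   output.begin(), from_string_fun<double>());
    return output;
}

void adapter_usage_example() {
    const std::vector<double> vec_doubles(10, 1.2);
    assert(round_trip(vec_doubles) == vec_doubles);
}
```

With the default stream precision of 6 significant digits, only values that print exactly (such as 1.2) survive the round trip unchanged; a real adapter would need a precision setting for the general case.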
And, finally, the full power of iostreams can be reached with these classes:
std::ios_base::fmtflags can be passed as a parameter to the constructors
of all converter classes to specify special formatting.
Additionally, the whole family of fmtflags-related functions from
std::ios_base and std::basic_ios<> is provided. The width() and fill()
functions are also provided. (If I forgot to mention some function, it
was not intentional; all meaningful functions from the iostreams base
classes would be included.)
In order to satisfy requirement (1), a std::locale object can be specified
as a constructor parameter or as an argument to the imbue() function.
The getloc() function is provided too.
For requirement (4), the type cvtstate is provided. It is very close to
the std::ios_base::iostate type, but cvtstate is not a typedef for int,
so that functions can be overloaded on it. A cvtstate parameter naming
the exception cases can be passed to the constructors of converter
classes to specify when exceptions should be thrown. By default no
exceptions are thrown. The state of a conversion (successful or not) can
be inspected with the rdstate() function and the good/bad/fail functions.
Additionally, exception handling behavior can be queried or changed with
the exceptions() functions, exactly as in the std::basic_ios class.
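A rough sketch of why a distinct cvtstate type matters: because it is a class rather than an int, from_string(text, state) and from_string(text, default_value) can coexist as overloads. The members of cvtstate and the checked_proxy class below are hypothetical simplifications; the proposed type mirrors iostate much more closely.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical, heavily simplified cvtstate: a distinct class type.
class cvtstate {
public:
    cvtstate() : failed_(false) {}
    bool good() const { return !failed_; }
    bool fail() const { return failed_; }
    void set_fail(bool failed) { failed_ = failed; }
private:
    bool failed_;
};

// Hypothetical proxy that records the outcome instead of throwing.
class checked_proxy {
public:
    checked_proxy(const std::string& text, cvtstate& state)
        : text_(text), state_(&state) {}

    template <typename T>
    operator T() const {
        std::istringstream in(text_);
        T value = T();
        in >> value;
        state_->set_fail(in.fail());
        return in.fail() ? T() : value;
    }

private:
    std::string text_;
    cvtstate* state_;
};

// Overload selected when the second argument is a cvtstate.
inline checked_proxy from_string(const std::string& text, cvtstate& state) {
    return checked_proxy(text, state);
}

// Overload selected for any other type: returns the default on failure.
template <typename T>
T from_string(const std::string& text, const T& fallback) {
    std::istringstream in(text);
    T value = T();
    return (in >> value) ? value : fallback;
}

void cvtstate_usage_example() {
    cvtstate state;
    int ok = from_string("12", state);
    assert(ok == 12 && state.good());
    int bad = from_string("abc", state);
    assert(bad == 0 && state.fail());
    assert(from_string("abc", 7) == 7);
}
```

If cvtstate were a typedef for int, the call from_string("1", 1) could not be told apart from a call passing a state, which is the overloading problem the paragraph above refers to.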
Performance for built-in types (requirement 5) would be
achieved in specializations of the proposed components. These
specializations would use the technique proposed in document n1803,
"Simple Numeric Access":
the strtoXXX() C library functions to convert strings to numbers and
the sprintf() function to convert numbers to strings.
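A sketch of what such a fast path for int might look like. The function names int_from_string and string_from_int are hypothetical, and the sketch swaps sprintf() for std::snprintf() so that the buffer size is checked; neither detail is taken from n1803 or from the existing implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical fast path: parse an int with strtol, no stream involved.
bool int_from_string(const std::string& text, int& out) {
    const char* begin = text.c_str();
    char* end = 0;
    errno = 0;
    const long value = std::strtol(begin, &end, 10);
    // Require at least one digit and no trailing characters.
    const bool consumed_all = (end != begin) && (*end == '\0');
    if (!consumed_all || errno == ERANGE || value < INT_MIN || value > INT_MAX) {
        return false;
    }
    out = static_cast<int>(value);
    return true;
}

// Hypothetical fast path: format an int into a stack buffer.
std::string string_from_int(int value) {
    char buffer[32];
    const int written = std::snprintf(buffer, sizeof buffer, "%d", value);
    return (written > 0)
        ? std::string(buffer, static_cast<std::size_t>(written))
        : std::string();
}

void fast_path_usage_example() {
    int value = 0;
    assert(int_from_string("123", value) && value == 123);
    assert(!int_from_string("12x", value));
    assert(string_from_int(-45) == "-45");
}
```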
Support for non-standard strings can be added by specializing
cvt_traits<TCont> for them.
So far I have a minimal working implementation of the basic concepts proposed.
Possible mentors for this project could be the authors of the Lexical
Conversion Library Proposal for TR2, Kevlin Henney and/or
Beman Dawes.
Best,
PhD student, Oleg Abrosimov.