From: Terje Slettebø (tslettebo_at_[hidden])
Date: 2002-05-25 06:37:52


>From: "Mattias Flodin" flodin_at_[hidden]

>>On Sat, May 25, 2002 at 03:37:57AM +0200, Terje Slettebø wrote:
>> Therefore, I've changed this so that it performs the usual conversion (1
<->
>> '1') from/to char/wchar_t, to make it consistent with the conversion
from/to
>> std::basic_string.

>OK. How does it handle lexical_cast<char>(123)? Throwing an exception
>might be preferable to just returning e.g. '3'.

In fact, that is what it does, throwing the bad_lexical_cast exception, yes.
:)

This is because in this case, it will use the general conversion function,
which writes the int to the stringstream, and then reads it back as a char.
If there's still something left on the stringstream, it will throw an
exception. This is also the way it used to work, because that conversion
function is the same.
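To illustrate, here's a simplified sketch of what the general routine does
(the name and details here are just illustrative, not the actual code):

#include <sstream>
#include <istream>
#include <boost/lexical_cast.hpp> // for boost::bad_lexical_cast

// Simplified sketch of the general, stringstream-based conversion: write
// the source value, read it back as the target type, and fail if the read
// fails or if anything is left on the stream
template<typename Target, typename Source>
Target general_lexical_cast(const Source &arg)
{
    std::stringstream stream;
    Target result;

    if(!(stream << arg) ||
       !(stream >> result) ||
       !(stream >> std::ws).eof())
        throw boost::bad_lexical_cast();

    return result;
}

With this, lexical_cast<char>(123) writes "123" to the stream, reads back
'1', finds "23" left over, and therefore throws.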

In cases where it uses special routines, such as converting between char and
string, where the general function (using stringstream) won't work in the
general case (because the char or string may contain whitespace), it will
check that the operand sizes are equal. So converting between a one-character
string and a char is ok.
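As a sketch (illustrative only, not the actual code), the string -> char
special case amounts to something like this:

#include <string>
#include <boost/lexical_cast.hpp> // for boost::bad_lexical_cast

// Special-case sketch: a string converts to a char only if it holds
// exactly one character, so whitespace is preserved and e.g. "12" is
// rejected
char string_to_char(const std::string &s)
{
    if(s.size() != 1)
        throw boost::bad_lexical_cast();

    return s[0];
}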

>> At the moment, this requires partial specialisation, but I intend to make
>> a version that doesn't require that. However, a reasonably standards
>> compliant compiler should be able to handle the current version.

>Good - as a VC user I'll be looking forward to that. ;o)

That I understand. I have VC 6 myself, too. Even if VC 7.1 will likely
solve this problem, there's no telling when that may come, and in any case,
VC6/7 will be with us for a long time. So I too think it's good to make this
as portable as possible. After all, the original version of lexical_cast
didn't require any special features. I think this version should aim for the
same. Also, BCC has problems with partial specialisation, as I've mentioned
earlier, so it doesn't handle this, either.

>> However, like I said, using implicit conversion, one overcomes this
>> problem, and it's able to do the following:
>>
>> int i=lexical_cast<int>(1.23); // i=1
>>
>> What I'm wondering is, is it ok to perform the conversion as in the
>> latter case (implicit conversion)? I would think it would be ok, and make
>> the conversion more flexible, as it takes into account implicit
>> conversions, but as it's a change in the semantics from the original
>> lexical_cast, I'd like to get feedback on this.

>At first sight, this seems to me like something that would go under
>the category "implementation defined" or "undefined behaviour," since
>trying to do a lexical cast between two "non-lexical" types is rather
>nonsensical. Either you cast from text, to text, or both.

Yeah. Good point. Things like this are likely to happen in generic code, I
guess, where the source and destination types may not be known, so it's used
as a "general" conversion that can handle arbitrary types, like you also say
here.

In the lexical_cast docs, Kevlin Henney suggests having a separate
interpret_cast, which would select between lexical_cast and e.g. numeric_cast
(or possibly implicit conversion) for such general conversions. That's also
a possibility. If that's wanted, I could separate the implicit conversions
out into such a separate component, and have lexical_cast perform only
lexical casts (and throw an exception if non-lexical types are used for both
source and destination).

>However, it may come to use in the case of writing generic code, where
>one wants to support both strings and integers as arguments, and turn
>the argument into an integer of the desired type without worrying
>about its representation. So perhaps it could be viewed as an
>"extended static_cast". In other words, for non-lexical types,
>lexical_cast would fall back to a simple static_cast.

Yes. Actually, it doesn't even use static_cast. After dealing with the
special cases, such as conversion between char and string, it only checks
for convertibility, using boost::is_convertible, and if the types are
convertible, it performs an implicit conversion. static_cast isn't used, as
it isn't needed.
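A simplified sketch of that dispatch (with made-up names, not the actual
implementation) could look like this:

#include <boost/type_traits.hpp> // boost::is_convertible

// The stringstream-based fallback, as in the earlier sketch
template<typename Target, typename Source>
Target general_lexical_cast(const Source &arg);

// Primary template: the source type is convertible to the target type, so
// just use the implicit conversion
template<bool UseImplicit>
struct converter
{
    template<typename Target, typename Source>
    static Target convert(const Source &arg)
    {
        return arg; // implicit conversion
    }
};

// Not convertible: fall back on the stringstream routine
template<>
struct converter<false>
{
    template<typename Target, typename Source>
    static Target convert(const Source &arg)
    {
        return general_lexical_cast<Target>(arg);
    }
};

template<typename Target, typename Source>
Target sketch_lexical_cast(const Source &arg)
{
    return converter<boost::is_convertible<Source, Target>::value>
        ::template convert<Target>(arg);
}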

It may be dangerous to use static_cast in lexical_cast, as it may well
accept conversions that make no sense, that way. static_cast is fine if you
know what you're doing, but lexical_cast has no knowledge about the context,
so it may or may not do the right thing. This may lead to subtle (or not so
subtle) bugs, and may surprise the user, who doesn't expect lexical_cast to
"force" a conversion that would otherwise not be allowed. I think it would
be best to err on the safe side here, and only allow conversions that would
be allowed anyway (implicit conversions), if we allow that, or conversions
it knows how to handle (using stringstream, or a special case).

Consider the following:

struct Base
{
};

struct Derived : Base
{
};

int main()
{
    Base b;

    Base &br = b;

    //Derived &dr = br;                        // Error (as expected)
    Derived &dr = static_cast<Derived &>(br);  // Not ok, but it compiles.
                                               // By doing a cast, you're
                                               // assumed to know what
                                               // you're doing
}

This also affects conversions between enum and integral types, where
conversion from an integral type to an enum isn't allowed as an implicit
conversion, but may be done with static_cast, which gives an unspecified
result if the integral value is outside the range of the enum.
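For example:

enum colour { red, green, blue }; // valid values 0..2 (range 0..3)

int main()
{
    int i = 42;

    //colour c1 = i;                     // Error: no implicit conversion
                                         // from int to enum
    colour c2 = static_cast<colour>(i);  // Compiles, but 42 is outside the
                                         // range of the enum, so the result
                                         // is unspecified
}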

The current version checks the possibilities in the following order, and
performs the conversion at the step where it finds a match, or at the last
step:

- Check for special cases (conversions involving char, wchar_t,
  std::basic_string, etc.)
- Check for implicit conversion
- Use stringstream, using the appropriate character type, based on the
  source and destination types

If we allow implicit conversion (the second step), then a careful study of
the conversion rules in C++ is needed, to make sure it does the right thing.
For example, there's an implicit conversion from a pointer to bool, so
"bool b=lexical_cast<bool>("false")" (when boolalpha is enabled) would always
give true (because the char * pointer is non-null). This needs to be checked
for, and handled, at the "special case" step. That's why that step has to be
first.
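The pointer-to-bool case in plain C++, for illustration:

#include <iostream>

int main()
{
    const char *text = "false";

    bool b = text; // Implicit pointer-to-bool conversion: true, because the
                   // pointer is non-null - the characters are never looked at

    std::cout << std::boolalpha << b << '\n'; // Prints "true"
}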

>On the other hand, looking at dynamic_cast, it only supports the kind
>of conversions for which it was intended - you can't, for instance, do
>a dynamic_cast<float>(int(1)). This is supposedly to protect the
>programmer from confusing the different types of casts. Is this kind
>of protection necessary for lexical_cast?

I think so, yes. I don't think we should have any cast operator in
lexical_cast, in order to make it safe. If an implicit conversion can't be
done, it will use stringstream, so if the conversion makes any sense, it
will perform it one way or the other. The problem with casts is that it
would then do it, even if it didn't make any sense. An exception is
dynamic_cast, as you mention here, as it checks if the cast succeeds.
However, this only works for polymorphic types (types with at least one
virtual function). I don't think there's any portable way of checking if a
type is polymorphic, and trying it on non-polymorphic types will likely
yield a compilation error. So I think it would be best to leave out all
casts. I also see from what you say later, here, that you think casts could
be avoided, as well.
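For illustration, the difference dynamic_cast makes between polymorphic and
non-polymorphic types:

struct non_polymorphic { };                        // no virtual functions
struct polymorphic { virtual ~polymorphic() { } }; // polymorphic

struct np_derived : non_polymorphic { };
struct p_derived : polymorphic { };

int main()
{
    non_polymorphic np;
    polymorphic p;

    //np_derived *p1 = dynamic_cast<np_derived *>(&np); // Compile-time
                                                         // error: operand is
                                                         // not polymorphic
    p_derived *p2 = dynamic_cast<p_derived *>(&p);       // OK: checked at
                                                         // run-time; p2 is
                                                         // null here
}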

After all, lexical_cast, despite its name, is a conversion function, and
it's only supposed to convert where it makes sense (and throw an exception,
otherwise). In this sense, it's similar to dynamic_cast, if you check the
return value. If the user needs to perform a cast, it would have to be done
explicitly.

>It could be modeled by
>leaving out specializations for any casts that do not involve lexical
>types on either side. So basically there are three options: implementation
>defined/undefined, static_cast fallback and compile-time errors. At
>present time I'd vote for the third, on the basis that I can't see
>enough of a need for being able to do non-lexical casts with
>lexical_cast.

Since it doesn't use static_cast, it only uses safe conversions that would
have been allowed anyway (without lexical_cast). So I think we may modify
the options a little, for non-lexical types (e.g. conversion between
numerical types):

- Implementation defined/undefined. What do you mean by this, by the way? If
it uses implicit conversion, it's well-defined, using the C++ conversion
rules, unless overridden if the types are lexical.
- Static cast.
- Compile-time error.

This may be modified further, by making the first option implicit
conversion. As mentioned, that's well-defined.

I don't think option 2 is a good one, either (and it's not necessary, for
safe conversions). I think 1 is fine, though. 3 would also be ok, if people
think that 1 would not be right. I think the implicit conversion is quite
neat, though. :)

Getting option 3 is quite easy - just removing the test for convertibility.
In that case, it may, or may not, succeed with a non-lexical cast: 1.0 -> 1
will succeed, while 1.23 will fail (throwing bad_lexical_cast at run-time).
One could also actively disallow any non-lexical cast, by checking that at
least one of the types is lexical. That may be more consistent, rather than
the rather arbitrary succeed/fail for double->int in the original version,
which is more an artifact of the implementation, as it uses the same
function for all types.
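A sketch of such a compile-time check (the trait and the names here are made
up, just to illustrate the idea):

#include <string>
#include <boost/static_assert.hpp>
#include <boost/type_traits.hpp>
#include <boost/lexical_cast.hpp>

// Illustrative "is this a lexical type?" trait - only char and std::string
// here, for brevity; the real set would also include wchar_t, char *,
// std::wstring, and so on
template<typename T>
struct is_lexical
{
    static const bool value =
        boost::is_same<T, char>::value ||
        boost::is_same<T, std::string>::value;
};

// Refuse to compile unless at least one of the types is lexical
template<typename Target, typename Source>
Target checked_lexical_cast(const Source &arg)
{
    BOOST_STATIC_ASSERT(is_lexical<Target>::value ||
                        is_lexical<Source>::value);

    return boost::lexical_cast<Target>(arg);
}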

However, maybe we would instead prefer the mentioned interpret_cast, which
handles all sensible conversions, and only let lexical_cast work when at
least one of the operands is lexical? That would be fine by me.

Alternatively, we could let lexical_cast work as interpret_cast, here.

Is there any problem associated with letting it succeed on non-lexical
types, as well, if the conversion makes sense? After all, the original
version allowed such non-lexical casts, only it required the values to be
an exact match, so it would succeed or fail depending on the values used,
not on the types. It may be more consistent, if non-lexical casts are
allowed, to let them succeed on _all_ values. In other words, implicit
conversion. Or fail on all values, if we disallow it. The original version
is kind of between these two.

In the original version of lexical cast, the success or failure is not based
on whether or not the types used are lexical, and any failure happens at
run-time, not compile-time.

By the way, this is not meant as any criticism of the original version. It
works well on lexical casts, in other words where one of the types is
lexical and the other is not. It only has problems if both are lexical (if
they contain whitespace), or neither is lexical (where it may or may not
succeed, depending on the values used).

The original version uses a simple and elegant way to do this, using
stringstream. The reason I started on this version was the posting on the
Boost Users' list, where it was pointed out that it may throw when
converting between two lexical types, if the source contains only
whitespace. In addition, it won't do a correct conversion if the source
contains some whitespace, as it will only include up to the first
whitespace, so conversion between two strings would silently give the wrong
result. In the subsequent thread, it was also suggested, by someone else on
that list, to use implicit conversion where available, which is why that was
implemented. It seemed to be a good idea, as it's a kind of optimisation,
and it ensures consistent success on such conversions, rather than letting
it depend on the values used.

Also, the original version doesn't support wide characters (or other
character types in general), so that was implemented.
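In other words, with the new version, the intent is that usage like the
following works (a usage sketch):

#include <string>
#include <boost/lexical_cast.hpp>

int main()
{
    using boost::lexical_cast;

    // Wide-character conversions should work just like the narrow ones
    std::wstring ws = lexical_cast<std::wstring>(123);  // L"123"
    int i = lexical_cast<int>(std::wstring(L"123"));    // 123
}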

By the way, I've notified Kevlin Henney about the discussion on this, which
started on the Boost Users' list when it was first posted about there, since
it is his library.

I guess even so, it's ok for us to discuss any changes, and possible
proposals for change.

>From the docs on lexical_cast
(http://www.boost.org/libs/conversion/lexical_cast.htm):

--- Start quote ---

Future directions

- A mechanism for providing quality-of-service control is needed, e.g.
formatting and exception behavior. In the name of simplicity (and release),
the current version strips out an earlier experimental version.

- Wide character and incompatible std::basic_string issues need to be
catered for.

- An interpret_cast that performs a do-something-reasonable conversion
between types. It would, for instance, select between numeric_cast and
lexical_cast based on std::numeric_limits<>::is_specialized.

--- End quote ---

Point 1 is what is being experimented with now (stream configuration), with
the current lexical cast version, point 2 is already implemented, and point
3 is what we're discussing now.

If point 3 is done, either using a separate interpret_cast, or done by
lexical_cast, it could well use boost::numeric_cast, instead of the implicit
conversion, if implicit conversion is possible, if that would be preferred.
That's just a one-line change in the code. Note that this may degrade the
performance, compared to implicit conversion, due to the run-time checking
of numeric_cast, though. This may not be what the user wants.
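For reference, a rough sketch of what such an interpret_cast could look
like, selecting on std::numeric_limits<>::is_specialized as suggested in the
docs (the names and structure here are just illustrative):

#include <limits>
#include <boost/cast.hpp>          // boost::numeric_cast
#include <boost/lexical_cast.hpp>  // boost::lexical_cast

// When both types are numeric (numeric_limits is specialized for both),
// use numeric_cast; otherwise, fall back on lexical_cast
template<bool BothNumeric>
struct interpret_cast_impl
{
    template<typename Target, typename Source>
    static Target cast(const Source &arg)
    {
        return boost::numeric_cast<Target>(arg);
    }
};

template<>
struct interpret_cast_impl<false>
{
    template<typename Target, typename Source>
    static Target cast(const Source &arg)
    {
        return boost::lexical_cast<Target>(arg);
    }
};

template<typename Target, typename Source>
Target interpret_cast(const Source &arg)
{
    return interpret_cast_impl<
        std::numeric_limits<Target>::is_specialized &&
        std::numeric_limits<Source>::is_specialized>
        ::template cast<Target>(arg);
}

With this, interpret_cast<int>(1.23) would give 1 (via numeric_cast, with
its run-time range check), while interpret_cast<int>("123") would go through
lexical_cast.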

>I don't believe lexical_cast has enough existing usage
>of the kind you describe to warrant trying to be backward compatible,
>especially since Boost has never given any guarantees of the sort.

I agree. Using interpret_cast or lexical_cast, and allowing the conversion
from double to int, like 1.23 -> 1, instead of throwing a bad_lexical_cast
as the original version does, I think would be reasonable.

Thanks for the feedback. I appreciate it.

Regards,

Terje

