From: Terje Slettebø (tslettebo_at_[hidden])
Date: 2002-07-26 09:18:23


>From: "Gennaro Prota" <gennaro_prota_at_[hidden]>

>>On Thu, 25 Jul 2002 17:57:26 +0200, Terje Slettebø
>><tslettebo_at_[hidden]> wrote:

>>>From: "Gennaro Prota" <gennaro_prota_at_[hidden]>
>>
>>>Though NULL is required to expand to something that evaluates to 0 and
>>>so would give no error in that context, it is actually intended for
>>>pointers
>>
>>I know. NULL was only used earlier, to kind of signify that this was used as
>>a "special value", and because I thought that NULL was required to be a
>>macro for int 0. However, I agree that this is an inappropriate place for
>>it (it's not a pointer, as you say), and the standard apparently doesn't
>>guarantee that it's a macro for int, only a macro for an integral value.

>I'm not sure why you make that distinction. In case the standard
>guaranteed it to be a macro for int would you use it?

No. Like I said above, I also agree that NULL is inappropriate for this, as
it's not a pointer. That it isn't guaranteed to be int, either, is just an
addition. However, there's still the '\0' / L'\0' issue.

>>This is a problem of generic programming, involving literals with different
>>syntax, such as '\0' and L'\0'.
>>
>>This was brought up in a recent "C Vu" article by Francis Glassborow ("C Vu"
>>June 2002, p.12, "Trouble with Literals". "C Vu" is the journal of the
>>ACCU). To quote: "We do not have a syntax to deal with the type of a literal
>>in a generic way."

>Hmmm... it isn't online, is it?

Sorry, no. Only the ACCU site (http://www.accu.org).

>BTW, are you sure he refers to the
>null character and not to generic literals like 'a' and 'x'?

Right, he doesn't refer to the end of string symbol, but he refers to
literals, as you say. He uses a string literal as an example there. However,
this is applicable to the end of string symbol, as well.

>>Even if we use '\0' for char, and L'\0' for wchar_t, how do we generically
>>specify the end of the string of an unknown character type?

>I think the standard library just considers charT() as the 'null
>character' for the type charT. If charT is a POD type (as required for
>it to be a character container type - 17.1.3) then charT() means that
>it is zero-initialized (8.5/7 and 8.5/5, with the corrections of core
>178).

>Yesterday, when seeing your original code with 'NULL', my first thought
>was in fact whether to replace it with a plain 0, or with 'Target()'.

>I never had to deal with this kind of issues, though. And the standard
>didn't clarify my ideas. But I'm quite sure that Dietmar Kuehl can
>literally illuminate you and me on these questions. I hope he is
>reading (though this will make him see my tremendous ignorance on this
>area of the library :-))

Yeah, I'm not sure here, either. I'll send a mail to him about this. If
charT() could be used, then that would solve it for any character type.

>>>(BTW, would we need a similar technique for std::max? :-)
>>
>>Well, std::min and std::max are already defined to take their parameters by
>>const reference, what do you mean?

>I was alluding to the fact that if one thinks that passing a reference
>to certain types like, let's say, short int _may_ be less efficient
>than passing a short int itself he could also not be happy with a
>generic

Right, now I understand. Yes, you're right. This could also remove the
mentioned "reference to reference" problem, if it were used consistently
in the library.

>>This leads to the stupid question of the month: why not Source const?
>
>Well, what would it solve? Perhaps if the output operator is a const member
>function, and lexical_cast is passed a non-const object?

I'm still wondering about this one.

>>I altered the unit test, now, to try to find out about this, so that it also
>>outputs the desired result of the conversion (it used to just output the
>>source and target, and not the specified correct target).

>Well, I went along the same lines on my own, also checking for the
>BOOST_NO_INTRINSIC_WCHAR_T macro. For instance, in order to have a
>clear report I had the following prologue in the unit_test function
>:-)

> std::cout << "wchar_t is: " << typeid(wchar_t).name() << '\n';
> std::cout << BOOST_COMPILER << '\n';
> std::cout << "BOOST_NO_INTRINSIC_WCHAR_T: ";
>
># ifdef BOOST_NO_INTRINSIC_WCHAR_T
> std::cout << "<defined>\n\n";
># else
> std::cout << "<undefined>\n\n";
># endif

Good idea. I tried it too, now. By the way, it shows that
BOOST_NO_INTRINSIC_WCHAR_T is defined for Intel C++, even though the compiler
can provide an intrinsic wchar_t (and I used the /Zc:wchar_t option). I guess
config.hpp assumes the default MSVC-compatible mode of Intel C++. However,
this means that the output of the above prologue could be slightly misleading
in this respect.

>>Using VC++ 6.0 (no intrinsic wchar_t), and debug output:
>>
>>Test - Succeeded (line 270)
>>Source type = unsigned short (49)
>>Destination type = int (1) (Should have been (1))
>>
>>As you can see, Intel C++, with and without intrinsic wchar_t, works as it's
>>supposed to. VC++ 6.0, however, behaves weirdly. It shows L'1' as "49", yet
>>when passing it to stringstream, it interprets it as if it's actually
>>wchar_t, and writes it as "1".

>Very odd!

It's surprising how many subtle issues a supposedly simple component like
lexical_cast may have, at least if you want it fully generic, being able to
take any type, including any character type.

It reminds me a little of the exception-safety debate, where seemingly simple
code was found to have a lot of issues. For example, I once saw a code
snippet of just a few lines, which was found to have 23 possible execution
paths through it (3 normal ones, and 20 in case of exceptions). Luckily,
these issues have now been resolved, and collected in a book like
"Exceptional C++" (a fitting title, when 10 items are devoted to exception
safety. :) ).

>Well, the different behaviour boils down to what overload of
>
> basic_ostream<_CharT, _Traits>& operator<< ()
>
>gets called. Let's stick with the STLport.
>
>VC++ 6.0 always calls
>
> (a) operator <<(basic_ostream & , _CharT __c)
> [line 206, stl/_ostream.h]
>
>which of course inserts a char.
>
>Intel C++ 6.0 instead calls the same with the /Zc:wchar_t switch, and
>
> (b) _basic_ostream & operator<<(unsigned short __x)
> [line 99]
>
>which outputs numbers without the switch.
>With MSVC's library things are analogous.
>
>The real questions are why VC++ doesn't choose (b) and why Intel C++
>with the switch calls (a).

I think the reason may be quite simple. Intel C++ has (the possibility of)
intrinsic wchar_t. Therefore, when it's turned on, it calls the
char-version, and when it's turned off, it becomes a synonym of unsigned
short, like VC++, and therefore calls the other function.

VC++, on the other hand, doesn't have intrinsic wchar_t. Therefore, they
have a few options: They could behave as Intel C++ does without intrinsic
wchar_t, and call the unsigned short version, too. That would never give any
character behaviour for wchar_t. Or, they could do as apparently is done,
treat "unsigned short" specially, and call the char-version for it, getting
some wide character support that way.

This of course means that you can't output an unsigned short value, without
having it output as a character. In essence, VC++ "hijacks" the unsigned
short type, and uses it as if it was wchar_t, to compensate for compiler
deficiencies. For this reason, lexical_cast works even if it has no
intrinsic wchar_t. However, that also means unsigned short is taken, so
nobody can use it as another character type, for example.

>It seems that both compilers treat wchar_t differently from an
>unsigned short in certain situations.

I think the answer, as explained above, is that Intel C++ treats wchar_t as
a true intrinsic type, while VC++ treats it as a synonym for unsigned short,
and attempts to remedy this by changing the library, as shown above.

>Try for instance this trivial program (use the switch for icl):
>
>#include <sstream>
>
>int main(int argc, char* argv[])
>{
>using namespace std;
>
>basic_stringstream<unsigned short> str;
>str << L'1';
>
>return 0;
>}

When I try this on Intel C++, with the switch, I get errors. It compiles on
VC++, as expected, as VC++ apparently uses "unsigned short" and "wchar_t" as
aliases, whereas Intel C++, with the switch, does not. Trying it on Intel
C++, without the switch, works there as well, again as expected, as it then
emulates VC++.

>Done it? Well, now change unsigned short to wchar_t....

That works on Intel C++ and on VC++, as expected. For the same reason as the
above.

>> So it's kind of a partial wchar_t support,
>>there, where they have made "unsigned int" to behave as "wchar_t".

>You mean unsigned short as a distinct wchar_t type?

Apparently, as mentioned, they are synonyms, for VC++. Instead, the library
is changed, to let unsigned short work as if it's a character type.

>Anyhow, I hope Intel didn't ruin such a good product with these kinds of
>microsoft-related conundrums :-(

No. As you can see, Intel C++ works fine with the switch. It's only without
it that it emulates VC++.

>What about the example I gave in my previous post and
>typeid(wchar_t).name() on Intel C++ 7.0?

It still gives "unsigned short". However, they are allowed to do this, as
the result of name() is implementation-defined. Still, it's not particularly
helpful, so I'll mention this in an issue report to Intel.

>Wow, peer-review really works. :)
>
>Thanks. :)

>I'm happy that you find all those pedantic comments useful! Ehmm...
>Terje, are you sure? :-)

Yes. :) And it's not pedantic. Your feedback is most valuable, as it has let
me remove bugs in the version, and possible problems, such as the '\0' /
L'\0' issue.

This is not pedantic. This is precision. And that's a virtue. By the way,
you've got credit as contributor in the lexical_cast file, as you may have
seen.

Regards,

Terje

