From: Gennaro Prota (gennaro_prota_at_[hidden])
Date: 2002-07-27 12:59:11
On Fri, 26 Jul 2002 16:18:23 +0200, Terje Slettebø
<tslettebo_at_[hidden]> wrote:
>>From: "Gennaro Prota" <gennaro_prota_at_[hidden]>
>
>>>On Thu, 25 Jul 2002 17:57:26 +0200, Terje Slettebø
>>><tslettebo_at_[hidden]> wrote:
>
>>>>From: "Gennaro Prota" <gennaro_prota_at_[hidden]>
>>>
>>>>Though NULL is required to expand to something that evaluates to 0 and
>>>>so would give no error in that context, it is actually intended for
>>>>pointers
>>>
>>>I know. NULL was only used earlier, to kind of signify that this was used as
>>>a "special value", and because I thought that NULL was required to be a
>>>macro for int 0. However, I agree that this is an inappropriate place for
>>>it (it's not a pointer, as you say), and the standard apparently doesn't
>>>guarantee that it's a macro for int, only a macro for an integral value.
>
>>I'm not sure why you make that distinction. In case the standard
>>guaranteed it to be a macro for int would you use it?
>
>No. Like I said above, I also agree that NULL is inappropriate for this, as
>it's not a pointer. That it isn't guaranteed to be int, either, is just an
>addition. However, there's still the '\0' / L'\0' issue.
>
>>>This is a problem of generic programming, involving literals with different
>>>syntax, such as '\0' and L'\0'.
>>>
>>>This was brought up in a recent "C Vu" article by Francis Glassborow ("C Vu"
>>>June 2002, p.12, "Trouble with Literals". "C Vu" is the journal of the
>>>ACCU). To quote: "We do not have a syntax to deal with the type of a literal
>>>in a generic way."
>
>>Hmmm... it isn't online, is it?
>
>Sorry, no. Only the ACCU site (http://www.accu.org).
>
>>BTW, are you sure he refers to the
>>null character and not to generic literals like 'a' and 'x'?
>
>Right, he doesn't refer to the end of string symbol, but he refers to
>literals, as you say. He uses a string literal as an example there. However,
>this is applicable to the end of string symbol, as well.
>
>>>Even if we use '\0' for char, and L'\0' for wchar_t, how do we generically
>>>specify the end of the string of an unknown character type?
>
>>I think the standard library just considers charT() as the 'null
>>character' for the type charT. If charT is a POD type (as required for
>>it to be a character container type - 17.1.3) then charT() means that
>>it is zero-initialized (8.5/7 and 8.5/5, with the corrections of core
>>178).
>
>>Yesterday, when seeing your original code with 'NULL', my first thought
>>was in fact whether to replace it with a plain 0, or with 'Target()'.
>
>>I never had to deal with this kind of issues, though. And the standard
>>didn't clarify my ideas. But I'm quite sure that Dietmar Kuehl can
>>literally illuminate you and me on these questions. I hope he is
>>reading (though this will make him see my tremendous ignorance on this
>>area of the library :-))
>
>Yeah, I'm not sure here, either. I'll send a mail to him about this. If
>charT() could be used, then that would solve it for any character type.
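The charT() idea can be sketched as follows. This is only an illustration, assuming a POD character type as discussed above; null_char, generic_length and check_null_char are made-up names, not part of lexical_cast or of the standard library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of lexical_cast or the standard library):
// the terminator of any character type is its value-initialized object.
template <class CharT>
CharT null_char()
{
    return CharT();  // zero for char, wchar_t and other POD character types
}

// Hypothetical generic strlen: counts characters up to the CharT() terminator.
template <class CharT>
std::size_t generic_length(const CharT* s)
{
    std::size_t n = 0;
    while (s[n] != CharT())
        ++n;
    return n;
}

// Small self-check covering both narrow and wide literals.
inline void check_null_char()
{
    assert(null_char<char>() == '\0');
    assert(null_char<wchar_t>() == L'\0');
    assert(generic_length("abc") == 3);
    assert(generic_length(L"wide") == 4);
}
```

Calling check_null_char() from main exercises both character types without ever spelling '\0' or L'\0' in the generic code.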
>
>>>>(BTW, would we need a similar technique for std::max? :-)
>>>
>>>Well, std::min and std::max are already defined to take their parameters by
>>>const reference, what do you mean?
>
>>I was alluding to the fact that if one thinks that passing a reference
>>to certain types like, let's say, short int _may_ be less efficient
>>than passing a short int itself he could also not be happy with a
>>generic
>
>Right. Now, I understood. Yes, you're right. This could also remove the
>mentioned "reference to reference" problem, if this was used quite
>consistently in the library.
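A minimal sketch of the "pass small types by value" idea. It assumes a C++11 compiler (for <type_traits>), which is newer than the compilers discussed in this thread; param and max_of are hypothetical names used only for illustration. boost::call_traits<T>::param_type makes a similar selection.

```cpp
#include <string>
#include <type_traits>

// Hypothetical trait: arithmetic types and pointers are passed by value,
// everything else by reference to const.
template <class T>
struct param
{
    typedef typename std::conditional<
        std::is_arithmetic<T>::value || std::is_pointer<T>::value,
        T,
        const T&>::type type;
};

// Hypothetical max-like function using the trait. T sits in a non-deduced
// context here, so it has to be given explicitly: max_of<short>(a, b).
template <class T>
T max_of(typename param<T>::type a, typename param<T>::type b)
{
    return a < b ? b : a;
}

// Compile-time checks of the parameter types that were selected.
static_assert(std::is_same<param<short>::type, short>::value,
              "short is passed by value");
static_assert(std::is_same<param<std::string>::type, const std::string&>::value,
              "std::string is passed by reference to const");
```

The price of this sketch is that template argument deduction no longer works for the call, which is one reason a generic std::max-style function cannot simply switch to it.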
>
>>>This leads to the stupid question of the month: why not Source const?
>>
>>Well, what would it solve? Perhaps if the output operator is a const member
>>function, and lexical_cast is passed a non-const object?
>
>I'm still wondering about this one?
Ah! I can't remember what I was exactly thinking. It is very likely
that I was wrong.
>
>>>I altered the unit test, now, to try to find out of this, to also output the
>>>desired result of the conversion (it used to just output the source, and
>>>target, and not the specified correct target).
>
>>Well, I went along the same lines on my own, also checking for the
>>BOOST_NO_INTRINSIC_WCHAR_T macro. For instance, in order to have a
>>clear report I had the following prologue in the unit_test function
>>:-)
>
>> std::cout << "wchar_t is: " << typeid(wchar_t).name() << '\n';
>> std::cout << BOOST_COMPILER << '\n';
>> std::cout << "BOOST_NO_INTRINSIC_WCHAR_T: ";
>>
>># ifdef BOOST_NO_INTRINSIC_WCHAR_T
>> std::cout << "<defined>\n\n";
>># else
>> std::cout << "<undefined>\n\n";
>># endif
>
>Good idea. I tried it too, now. By the way, it shows that
>NO_INTRINSIC_WCHAR_T is defined for Intel C++, even though it's able to have
>it (and I used the /Zc:wchar_t option). I guess config.hpp considers the
>default MSVC mode of Intel C++. However, this means that the output of the
>above prologue could be slightly misleading in this respect.
>
It depends on what "no intrinsic wchar_t" means :-)
I have also looked at MSDN to see if .NET has a predefined macro
to test for the /Zc:wchar_t switch and... well, there isn't! The only
candidate is a macro that is also defined when a typedef unsigned short
wchar_t is used!
--- "_WCHAR_T_DEFINED - Defined when wchar_t is defined. Typically,
wchar_t is defined when you use /Zc:wchar_t or when typedef unsigned
short wchar_t; is executed in code" ---

>>>Using VC++ 6.0 (no intrinsic wchar_t), and debug output:
>>>
>>>Test - Succeeded (line 270)
>>>Source type = unsigned short (49)
>>>Destination type = int (1) (Should have been (1))
>>>
>>>As you can see, Intel C++, with and without intrinsic wchar_t, works as it's
>>>supposed to. VC++ 6.0, however, behaves weirdly. It shows L'1' as "49", yet
>>>when passing it to stringstream, it interprets it as if it's actually
>>>wchar_t, and writes it as "1".
>
>>Very odd!
>
>It's surprising how many subtle issues a supposedly simple component like
>lexical_cast may have, at least if you want it fully generic, being able to
>take any type, including any character type.
>
>It reminds me a little of the exception-safety debate, where it was found to
>have a lot of issues. For example, I saw a simple code snippet, once, just a
>few lines, which was found to have 23 possible execution paths through it (3
>normal ones, and 20 in case of exceptions). Luckily, these issues have now
>been resolved, and collected in a book like "Exceptional C++" (A fitting
>title, when 10 items are devoted to exception safety. :) ).
>
>>Well, the different behaviour boils down to what overload of
>>
>> basic_ostream<_CharT, _Traits>& operator<< ()
>>
>>gets called. Let's stick with the STLport.
>>
>>VC++ 6.0 always calls
>>
>> (a) operator <<(basic_ostream & , _CharT __c)
>> [line 206, stl/_ostream.h]
>>
>>which of course inserts a char.
>>
>>Intel C++ 6.0 instead calls the same with the /Zc:wchar_t switch, and
>>
>> (b) _basic_ostream & operator<<(unsigned short __x)
>> [line 99]
>>
>>which outputs numbers without the switch.
>>With MSVC's library things are analogous.
>>
>>The real questions are why VC++ doesn't choose (b) and why Intel C++
>>with the switch calls (a).
>
>I think the reason may be quite simple. Intel C++ has (the possibility of)
>intrinsic wchar_t. Therefore, when it's turned on, it calls the
>char-version, and when it's turned off, it becomes a synonym of unsigned
>short, like VC++, and therefore calls the other function.
>
I find it quite horrible. It doesn't have an intrinsic wchar_t, even
with the switch. It only acts as if it had in some situations. Which
situations is very difficult to say. Also, I expected "microsoft
emulation" without the switch and "compliance" with the /Zc:wchar_t
(or with some other switch, I don't know what the /Zc:wchar_t does
with .NET). Instead, _maybe_ there's no way to make it compliant (of
course I've tried invoking it from the command line, after editing the
icl.cfg file).

Type identity is absolutely fundamental for the language and for
program semantics. The fact that Intel with the switch gives true with
this

  typeid(wchar_t) == typeid(unsigned short)

means that wchar_t and unsigned short _are_ the same type (and let's
forget for a moment that this is non conforming). Also

  typeid(L'1') == typeid (unsigned short)
  typeid(L'1') == typeid (wchar_t)

both yield true. OTOH, the example about overloading (or what kind of
'over-stuff' is it now) says that they are different, and also that
the 'best match' for the call

  dummy (L'1')

is

  void dummy (unsigned short)!

I'm sorry to be a little harsh but I really spent (wasted) a lot of
time trying to rationalize the behaviour of the compiler. I expect
such things from MSVC but I was really astonished by Intel, so I
wandered through the documentation, the command line and the config
files, with a big "there must be a switch" that kept floating in my
mind. I didn't even think to compare type_infos because I really
trusted the dragon!

>VC++, on the other hand, doesn't have intrinsic wchar_t. Therefore, they
>have a few options: They could behave as Intel C++, using no intrinsic
>wchar_t, and call the unsigned short version, too. That would never give any
>character behaviour for wchar_t. Or, they could do as apparently is done,
>treat "unsigned short" specially, and call the char-version for it, getting
>some wide character support that way.
>
>This of course means that you can't output an unsigned short value, without
>having it output as a character. In essence, VC++ "hijacks" the unsigned
>short type, and uses it as if it was wchar_t, to compensate for compiler
>deficiencies. For this reason, lexical_cast works even if it has no
>intrinsic wchar_t.

Well, fortunately all these things don't affect lexical_cast :-) My
question was an aside in the context of this thread (that's why I put
it in the post-scriptum). This doesn't prevent the whole issue from
being quite puzzling, though.

>However, that also means unsigned short is taken, so
>nobody can use it as another character type, for example.
>
>>It seems that both compilers treat wchar_t differently from an
>>unsigned short in certain situations.
>
>I think the answer, as explained above, is that Intel C++ treats wchar_t as
>a true intrinsic type, while VC++ treats it as a synonym for unsigned short,
>and attempts to remedy this by changing the library, as shown above.
>
>>Try for instance this trivial program (use the switch for icl):
>>
>>#include <sstream>
>>
>>int main(int argc, char* argv[])
>>{
>>using namespace std;
>>
>>basic_stringstream<unsigned short> str;
>>str << L'1';
>>
>>return 0;
>>}
>
>When I try this on Intel C++, with the switch, I get errors. It compiles on
>VC++, as expected

Eheheh :-) To me, it didn't compile with VC++, but only because I
forgot the Zc:wchar_t switch in the Project Settings dialog (try to
believe...)

>, as VC++ apparently uses "unsigned short" and "wchar_t" as
>aliases, whereas Intel C++, with the switch, does not. Trying it on Intel
>C++, without the switch, works there as well, again as expected, as it then
>emulates VC++.
>
>>Done it? Well, now change unsigned short to wchar_t....
>
>That works on Intel C++ and on VC++, as expected. For the same reason as the
>above.
>
>>> So it's kind of a partial wchar_t support,
>>>there, where they have made "unsigned int" to behave as "wchar_t".
>
>>You mean unsigned short as a distinct wchar_t type?
>
>Apparently, as mentioned, they are synonyms, for VC++. Instead, the library
>is changed, to let unsigned short work as if it's a character type.
>
>>Anyhow, I hope Intel didn't ruin a so good product with these kinds of
>>microsoft-related conundrums :-(
>
>No. As you can see, Intel C++ works fine with the switch. It's only without
>it that it emulates VC++.
>
>>What about the example I gave in my previous post and
>>typeid(wchar_t).name() on Intel C++ 7.0?
>
>It still gives "unsigned short". However, they are allowed to do this, as
>the result is implementation dependent. Still, it's not particularly
>helpful, so I'll mention this in an issue report to Intel.

I suggest in particular pointing out the very dangerous duality (to be
polite) between compile-time identity and run-time identity. And with
a switch that is expected to enforce conformance. For instance

  // use Zc:wchar_t
  //
  #include <iostream>
  #include <typeinfo>
  #include <boost/type_traits.hpp>

  template <class T>
  void f (T ) {
    // compile-time identity
    std::cout << "Is same? " << boost::is_same<unsigned short, T>::value << '\n';
    // run-time identity
    std::cout << "Same typeinfo? " << (typeid(unsigned short) == typeid(T)) << '\n';
  }

  int main() {
    f (L'1');
  }

>
>>Wow, peer-review really works. :)
>>
>>Thanks. :)
>
>>I'm happy that you find all those pedantic comments useful! Ehmm...
>>Terje, are you sure? :-)
>
>Yes. :) And it's not pedantic. Your feedback is most valuable, as it has let
>me remove bugs in the version, and possible problems, such as the '\0' /
>L'\0' issue.
>
>This is not pedantic. This is precision. And that's a virtue. By the way,
>you've got credit as contributor in the lexical_cast file, as you may have
>seen.
>
Well, I'm a bit embarrassed because I think my comments were marginal.
But only because frankly I didn't see anything wrong in the
fundamental parts. In any case, thank you very much! :-)

Genny.
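
For reference, a minimal sketch of the three views of type identity discussed in this message (overload resolution, typeid, and a compile-time trait), put side by side. It assumes a C++11 compiler and uses std::is_same where the message uses boost::is_same; dummy stands in for the overloading example mentioned above, and the other function names are hypothetical.

```cpp
#include <cstring>
#include <type_traits>
#include <typeinfo>

// Hypothetical overload pair. If wchar_t were a plain typedef for
// unsigned short (as with VC++ 6.0), these two definitions would clash
// and the file would not compile at all.
inline const char* dummy(unsigned short) { return "unsigned short"; }
inline const char* dummy(wchar_t)        { return "wchar_t"; }

// View 1: overload resolution picks the wchar_t overload for L'1'.
inline bool distinct_by_overload()
{
    return std::strcmp(dummy(L'1'), "wchar_t") == 0;
}

// View 2: run-time type identity.
inline bool distinct_by_typeid()
{
    return !(typeid(wchar_t) == typeid(unsigned short));
}

// View 3: compile-time type identity.
inline bool distinct_by_trait()
{
    return !std::is_same<wchar_t, unsigned short>::value;
}

// On a conforming compiler all three views agree.
inline bool identity_is_consistent()
{
    return distinct_by_overload() == distinct_by_typeid()
        && distinct_by_typeid() == distinct_by_trait();
}
```

A compiler showing the duality described above would make identity_is_consistent() return false, since the overload view and the typeid view would disagree.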
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk