From: Gennaro Prota (gennaro_prota_at_[hidden])
Date: 2002-07-25 19:28:31


On Thu, 25 Jul 2002 17:57:26 +0200, Terje Slettebø
<tslettebo_at_[hidden]> wrote:

>(Sent as HTML, as it contains a table)
>
>>From: "Gennaro Prota" <gennaro_prota_at_[hidden]>
>
>>On Thu, 25 Jul 2002 06:28:15 +0200, Terje Slettebø
>><tslettebo_at_[hidden]> wrote:
>>
>>>By the way, I wasn't actually sure what to put as string-terminator,
>>>there. '\0' might be an alternative, but as this is supposed to work for
>>>other character types, as well (such as wchar_t), as I understand, this
>>>would mean that for wchar_t, you would have to have L'\0', instead.
>>>Therefore, I guessed that NULL or 0 would be ok..
>
>>Though NULL is required to expand to something that evaluates to 0 and
>>so would give no error in that context, it is actually intended for
>>pointers
>
>I know. NULL was only used earlier, to kind of signify that this was used as
>a "special value", and because I thought that NULL was required to be a
>macro for int 0. However, I agree that this is an inappropriate place for
>it (it's not a pointer, as you say), and the standard apparently doesn't
>guarantee that it's a macro for int, only a macro for an integral value.

I'm not sure why you make that distinction. If the standard
guaranteed it to be a macro for int, would you use it?

[...]

>This is a problem of generic programming, involving literals with different
>syntax, such as '\0' and L'\0'.
>
>This was brought up in a recent "C Vu" article by Francis Glassborow ("C Vu"
>June 2002, p.12, "Trouble with Literals". "C Vu" is the journal of the
>ACCU). To quote: "We do not have a syntax to deal with the type of a literal
>in a generic way."
>

Hmmm... it isn't online, is it? BTW, are you sure he refers to the
null character and not to generic literals like 'a' and 'x'?

>Even if we use '\0' for char, and L'\0' for wchar_t, how do we generically
>specify the end of the string of an unknown character type?

I think the standard library just considers charT() as the 'null
character' for the type charT. If charT is a POD type (as required for
it to be a character container type - 17.1.3) then charT() means that
it is zero-initialized (8.5/7 and 8.5/5, with the corrections of core
issue 178).
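
To make that concrete, here is a minimal sketch of how the end of a
string could be detected for an arbitrary character type by comparing
against charT(). The names null_char and generic_length are made up for
illustration and are not part of lexical_cast or the standard library:

```cpp
#include <cstddef>

// Hypothetical helper: the terminator for any character type CharT,
// obtained by value-initialization (zero for POD character types).
template <typename CharT>
CharT null_char()
{
    return CharT();
}

// Hypothetical helper: length of a terminated string of any character
// type, using CharT() instead of '\0', L'\0', 0 or NULL.
template <typename CharT>
std::size_t generic_length(const CharT* s)
{
    std::size_t n = 0;
    while (s[n] != CharT())
        ++n;
    return n;
}
```

The same template then serves char and wchar_t (and any other POD
character type) without a specialisation per literal syntax.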

Yesterday, when I saw your original code with 'NULL', my first thought
was in fact whether to replace it with a plain 0 or with 'Target()'.

I have never had to deal with these kinds of issues, though, and the
standard didn't clarify my ideas. But I'm quite sure that Dietmar Kuehl
can literally illuminate you and me on these questions. I hope he is
reading (though this will make him see my tremendous ignorance of this
area of the library :-))

> There appears to
>be no character trait for it. Perhaps there should be? Is there a portable
>way to deal with endings of strings of arbitrary character types?
>
>I used 0 to circumvent the problem of using '\0', L'\0', or some unknown
>value, to mark the end of the string of arbitrary character type, and I
>hoped it would convert to the appropriate end of string for the character
>type. However, as you say, I haven't found a guarantee for this in the
>standard.
>
>What one could do, here, is to specialise pointer_to_char_base, for char and
>wchar_t, to use the appropriate '\0' and L'\0', respectively, and leave it
>to use 0 (the base template) for unknown character types. Perhaps it's
>safest to do this?
>
>>>Oops, you're right. Will be fixed. This is one that was missed by the unit
>>>tests, simply because there was no test to test a case where it should
>>>throw an exception. The test system is made to enable such tests, as well,
>>>I just hadn't included those tests. I'll add tests that check that it
>>>throws an exception when expected, as well.
>>>
>>>Isn't it typical? The one thing you don't test for, is where you get a
>>>bug. :)
>
>>Yes, but even the test could have apparently "worked" (that's
>>undefined behavior...)
>
>I know what you mean. However, in this case "worked" means it should throw
>an exception, if passed an empty string, so if that didn't happen, the test
>would pick it up.
>
>>>Actually, the code above doesn't throw an exception, if you try to convert
>>>an empty string to a char, which I think it should, as mentioned, so it
>>>should probably be changed to:
>>>
>>>if(arg[0] == 0 || arg[1] != 0)
>>> throw bad_lexical_cast();
>
>>Yes, in the first place I thought you would like converting "" to '\0'
>>and that's why I gave that code. This morning I began to think that
>>empty strings would have better been punished with an exception, but,
>>as expected, my newsreader told me you already did it :-)
>
>:)
>
>Yeah, I think the principle of least surprise may favour an exception, if
>you try to convert an empty string to a character.
>

Agreed :-)
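
For what it's worth, the check can be written once for any character
type. The sketch below is only an illustration: is_single_char,
to_single_char and bad_single_char are made-up names (the last one
stands in for bad_lexical_cast so that the snippet is self-contained),
and it compares against CharT() rather than 0, on the assumption that
this is the right terminator:

```cpp
#include <stdexcept>

// Stand-in for bad_lexical_cast, so that the sketch compiles on its own.
struct bad_single_char : std::runtime_error
{
    bad_single_char() : std::runtime_error("not exactly one character") {}
};

// True only if the string holds exactly one character. arg[0] is tested
// first, so arg[1] is never read when the string is empty.
template <typename CharT>
bool is_single_char(const CharT* arg)
{
    return arg[0] != CharT() && arg[1] == CharT();
}

// Returns the single character, or throws for "" and for longer strings.
template <typename CharT>
CharT to_single_char(const CharT* arg)
{
    if (!is_single_char(arg))
        throw bad_single_char();
    return arg[0];
}
```

The order of the two tests matters: checking arg[1] before arg[0] would
read past the terminator of an empty string.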

>>>>2) Stupid question of the day: is there any reason why all Source
>>>>function parameters cannot be declared as Source const &?
>>>
>>>It's not a stupid question. :) Well, for some types, it may be more
>>>efficient to pass by value, than by reference. Pass by reference typically
>>>passes the address of the object, so for small types, just passing the
>>>object may be more efficient, as you then avoid the indirection, when
>>>operating on the object.
>
>>Well, I know that. Actually I missed the reference to const in the
>>select_base mechanism!
>
>Ah, I guessed you knew, so I wondered. Then I understand. :)
>
>The boost::call_traits also deals with the "reference to reference" problem.
>
>By the way, the simulated partial specialisation only handles char and
>wchar_t, while the partial specialisation handles any character type. So the
>simulated version can't really replace the other one.
>
>>(BTW, would we need a similar technique for std::max? :-)
>
>Well, std::min and std::max are already defined to take their parameters by
>const reference, what do you mean?

I was alluding to the fact that someone who thinks that passing a
reference to certain types, let's say short int, _may_ be less
efficient than passing the short int itself could also be unhappy with
a generic

    const T& min (const T& a, const T& b);

and be tempted to provide overloads like

    short int min (short int a, short int b);

and similar.
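
Just to illustrate the idea (this is not how boost::call_traits is
implemented, and param and min_by_param are names I made up), a trait
could pick the parameter type: by value for arithmetic types and
pointers, by reference to const otherwise. The sketch assumes a
compiler that provides <type_traits>:

```cpp
#include <type_traits>

// Hypothetical trait: arithmetic types and pointers are passed by value,
// everything else by reference to const.
template <typename T>
struct param
{
    typedef typename std::conditional<
        std::is_arithmetic<T>::value || std::is_pointer<T>::value,
        T,
        const T&>::type type;
};

// Hypothetical min that takes its parameters through the trait.
// T cannot be deduced through param<T>::type, so it must be given
// explicitly at the call site.
template <typename T>
T min_by_param(typename param<T>::type a, typename param<T>::type b)
{
    return b < a ? b : a;
}
```

The price is that the call has to be spelled min_by_param<short>(a, b),
and that the result is returned by value rather than by reference as
std::min does.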

> Except perhaps to fix the problem
>mentioned above.
>
>>This leads to the stupid question of the month: why not Source const?
>
>Well, what would it solve? Perhaps if the output operator is a const member
>function, and lexical_cast is passed a non-const object?
>
>>P.S.: Of course I know the C++ implications. What I'm trying to
>>discover are boost's guidelines about this sorts of things (I'm quite
>>new to this list), so forgive me if I'm asking something that
>>everybody knows here.
>
>I'm quite new here, myself, so no worry. :) You've been good help.
>
>>>By the way, this works correctly on Intel C++ 7.0 pre-beta... Perhaps if
>>>you complain about this, you'll get that, too. :) I got that version,
>>>after I reported some ICE when trying to compile BLL. They gave me that,
>>>to try again. It still doesn't work, but other things, such as this, do.
>>>
>>>There are also other things that work on 7.0 pre-beta, such as Loki's
>>>SmartPtr.h, which doesn't work on 6.0.
>
>>Very odd. Another oddity is this: with your (previous) unit test I
>>get no error compiling with VC++6.0, either using its original
>>standard library or STLport 4.5.3 with SGI iostreams.
>
>I've tested it using the same setup, so that makes sense. :)
>
>>If I use Intel
>>C++ 6.0 instead, all the tests with (unsigned short) wide-characters
>>fail with both libraries.
>
>>So, even if both compilers lack a distinct wchar_t type, VC++ 6.0
>>works well with both libraries and Intel C++ 6.0 with none of them.
>>Any clue?
>
>I agree that it's odd. When working on that version, I spent a lot of time
>building STLPort for the various compilers, to try to get some sensible
>behaviour out of this. What I found is that many implementations have poor
>support for wide characters. I summarize my findings in the table at the
>end, here.
>
>To get to the bottom of this, I have now altered the unit test to also
>output the desired result of the conversion (it used to just output the
>source and target, and not the specified correct target).

Well, I went along the same lines on my own, also checking for the
BOOST_NO_INTRINSIC_WCHAR_T macro. For instance, in order to have a
clear report I had the following prologue in the unit_test function
:-)

    std::cout << "wchar_t is: " << typeid(wchar_t).name() << '\n';
    std::cout << BOOST_COMPILER << '\n';
    std::cout << "BOOST_NO_INTRINSIC_WCHAR_T: ";
    
# ifdef BOOST_NO_INTRINSIC_WCHAR_T
    std::cout << "<defined>\n\n";
# else
    std::cout << "<undefined>\n\n";
# endif

>
>Using this change, the answer became clear. For the following line (the
>parameters are "do_test(correct_target,source,line)"):
>
>test<int,wchar_t>::do_test(1,L'1',__LINE__);
>
>Using Intel C++ 6.0 with intrinsic wchar_t (/Zc:wchar_t option), and debug
>output (note that typeid(Type).name() still reports it as "unsigned short"):

Indeed.

>
>Test - Succeeded (line 270)
>Source type = unsigned short (1)
>Destination type = int (1) (Should have been (1))
>
>Using Intel C++ 6.0, with no intrinsic wchar_t:
>
>Test - Failed (line 270)
>Source type = unsigned short (49)
>Destination type = int (49) (Should have been (1))
>

And this is what one could expect also from VC++, given that wchar_t
is actually unsigned short.

>Using VC++ 6.0 (no intrinsic wchar_t), and debug output:
>
>Test - Succeeded (line 270)
>Source type = unsigned short (49)
>Destination type = int (1) (Should have been (1))
>
>As you can see, Intel C++, with and without intrinsic wchar_t, works as it's
>supposed to. VC++ 6.0, however, behaves weirdly. It shows L'1' as "49", yet
>when passing it to stringstream, it interprets it as if it's actually
>wchar_t, and writes it as "1".

Very odd! Well, the different behaviour boils down to which overload of

    basic_ostream<_CharT, _Traits>& operator<< ()

gets called. Let's stick with STLport.

VC++ 6.0 always calls

   (a) operator <<(basic_ostream & , _CharT __c)
                                     [line 206, stl/_ostream.h]

which of course inserts a character.

Intel C++ 6.0 calls (a) as well when given the /Zc:wchar_t switch;
without the switch it calls

   (b) _basic_ostream & operator<<(unsigned short __x)
                                                      [line 99]

which outputs a number.
With MSVC's library things are analogous.

The real questions are why VC++ doesn't choose (b) and why Intel C++
with the switch calls (a).

It seems that both compilers treat wchar_t differently from an
unsigned short in certain situations.

Try for instance this trivial program (use the switch for icl):

#include <sstream>

int main(int argc, char* argv[])
{
        using namespace std;

        basic_stringstream<unsigned short> str;
        str << L'1';

        return 0;
}

Done it? Well, now change unsigned short to wchar_t....
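
For comparison, on a compiler where wchar_t is a distinct type, the two
overloads can be told apart with a sketch like the following.
inserted_text is a hypothetical helper, not something from the test
suite:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: insert a value into a wide stream and return the
// text that ends up in it. Overload resolution on the type T decides
// whether the character inserter or a numeric inserter is used.
template <typename T>
std::wstring inserted_text(T value)
{
    std::wostringstream out;
    out << value;
    return out.str();
}
```

With a genuine wchar_t, inserted_text(L'1') should select the character
inserter and give the text "1", while
inserted_text(static_cast<unsigned short>(L'1')) should select the
unsigned short inserter and give the character code as a number (49
where L'1' has the ASCII value).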

> So it's kind of a partial wchar_t support,
>there, where they have made "unsigned int" to behave as "wchar_t".

You mean unsigned short as a distinct wchar_t type?

Anyhow, I hope Intel didn't ruin such a good product with these kinds of
Microsoft-related conundrums :-(
What about the example I gave in my previous post and
typeid(wchar_t).name() on Intel C++ 7.0?

>
>By the way, when checking this, I also found a bug in the "printer"
>function, with code such as "stream << "..." ", where "stream" could be any
>kind of stream (including using wide characters). Oops. Fixed now. This is
>something that just happened to work, earlier, but which wasn't correct.
>
>Strictly speaking, the test, if it reports failure, is non-conformant, as it
>may use output to both std::cout and std::wcout, for the info of the
>failure, in the same program, which as I understand is not allowed.
>
>However, this is only in case of failure, as it otherwise doesn't write
>anything but the result of the test. Besides, it appears to generally work,
>anyway, and this is just for debugging. It seems it needs stream flushing,
>when switching stream type, to work, at least.
>
>If this would be a problem, the output could be turned off, when run as part
>of the Boost regression tests.
>
>
>Here are the results of running the unit test on the first version uploaded
>(using Boost 1.28):

>
>
>Platform      Compiler         Library                          Result       Remark
>-----------------------------------------------------------------------------------------------------
>Windows 2000  Intel C++ 6.0    MSVC standard library (Default)  100% Passed  Needs /Zc:wchar_t option
>Windows 2000  Intel C++ 6.0    STLPort 4.5.3                    100% Passed  Needs /Zc:wchar_t option
>Windows 2000  MSVC 6.0         MSVC standard library (Default)  100% Passed  No partial specialisation
>Windows 2000  MSVC 6.0         STLPort 4.5.3                    100% Passed  No partial specialisation
>Windows 2000  BCC 5.5          Rogue Wave 2.1.1 (Default)       100% Passed  Bad PS, not used
>Windows 2000  BCC 5.6 (BCB 6)  STLPort 4.5.0 (Default)          100% Passed  Bad PS, not used
>Windows 2000  gcc 2.95.3       SGI standard library (Default)   100% Passed  (*)
>
>(*) No wide character support in library, 45/104 tests not supported. gcc
>2.95.x may also need #define BOOST_NO_STRINGSTREAM, as config.hpp is unable
>to detect it.
>
>The updated version is uploaded.
>
>Wow, peer-review really works. :)
>
>Thanks. :)
>

I'm happy that you find all those pedantic comments useful! Ehmm...
Terje, are you sure? :-)

Genny.

