(Sent as HTML, as it contains a table)

>From: "Gennaro Prota" <gennaro_prota@yahoo.com>

>On Thu, 25 Jul 2002 06:28:15 +0200, Terje Slettebø
><tslettebo@chello.no> wrote:
>
>>By the way, I wasn't actually sure what to put as string-terminator,
>>there. '\0' might be an alternative, but as this is supposed to work
>>for other character types, as well (such as wchar_t), as I understand,
>>this would mean that for wchar_t, you would have to have L'\0',
>>instead. Therefore, I guessed that NULL or 0 would be ok..
>
>Though NULL is required to expand to something that evaluates to 0 and
>so would give no error in that context, it is actually intended for
>pointers

I know. NULL was only used earlier, to kind of signify that this was used
as a "special value", and because I thought that NULL was required to be
a macro for int 0. However, I agree that this is an inappropriate place
for it (it's not a pointer, as you say), and the standard apparently
doesn't guarantee that it's a macro for int, only a macro for an integral
value. So this is now changed to 0, in the new version. Whether even 0 is
sufficient is another question, which you come to below.

> (As you know, it only works because the language lacks a true
>'null' and the macro NULL expands to an expression that has *integral*
>type). So a plain 0 is IMHO better style.

I agree.

> BTW I wandered through the standard looking for a guarantee that 0
>converts to L'\0'. Where is it?

Where, indeed. :)

This is a problem of generic programming, involving literals with
different syntax, such as '\0' and L'\0'. This was brought up in a recent
"C Vu" article by Francis Glassborow ("C Vu" June 2002, p.12, "Trouble
with Literals"; "C Vu" is the journal of the ACCU). To quote: "We do not
have a syntax to deal with the type of a literal in a generic way."

Even if we use '\0' for char, and L'\0' for wchar_t, how do we
generically specify the end of the string of an unknown character type?
There appears to be no character trait for it. Perhaps there should be?
Is there a portable way to deal with endings of strings of arbitrary
character types?

I used 0 to circumvent the problem of using '\0', L'\0', or some unknown
value, to mark the end of the string of arbitrary character type, and I
hoped it would convert to the appropriate end of string for the character
type. However, as you say, I haven't found a guarantee for this in the
standard.

What one could do, here, is to specialise pointer_to_char_base, for char
and wchar_t, to use the appropriate '\0' and L'\0', respectively, and
leave it to use 0 (the base template) for unknown character types.
Perhaps it's safest to do this?

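As a rough sketch of the specialisation idea above: the trait name
string_terminator and the helper terminated_length are made up for
illustration and are not the actual pointer_to_char_base code. The base
template falls back to 0 converted to the character type, while char and
wchar_t get their own literals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical trait, named here for illustration only; it is not part of
// lexical_cast or the standard library.
template<typename CharT>
struct string_terminator
{
    // Base template for unknown character types: 0 converted to CharT.
    static CharT value() { return static_cast<CharT>(0); }
};

// Specialisation for char, using the narrow literal.
template<>
struct string_terminator<char>
{
    static char value() { return '\0'; }
};

// Specialisation for wchar_t, using the wide literal.
template<>
struct string_terminator<wchar_t>
{
    static wchar_t value() { return L'\0'; }
};

// Hypothetical helper: counts the characters before the terminator of a
// string of any character type.
template<typename CharT>
std::size_t terminated_length(const CharT* s)
{
    assert(s != 0);
    std::size_t n = 0;
    while (s[n] != string_terminator<CharT>::value())
        ++n;
    return n;
}
```

Code that walks a string then only ever asks the trait for the
terminator, so the choice of literal is made in one place.
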
>>Oops, you're right. Will be fixed. This is one that was missed by the
>>unit tests, simply because there was no test to test a case where it
>>should throw an exception. The test system is made to enable such
>>tests, as well, I just hadn't included those tests. I'll add tests
>>that check that it throws an exception when expected, as well.
>>
>>Isn't it typical? The one thing you don't test for, is where you get a
>>bug. :)
>
>Yes, but even the test could have apparently "worked" (that's
>undefined behavior...)

I know what you mean. However, in this case "worked" means it should
throw an exception, if passed an empty string, so if that didn't happen,
the test would pick it up.

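A test of that kind could look roughly like the following. This is only a
sketch: the helper name throws and the two sample functions are
hypothetical and are not taken from the actual unit test.

```cpp
#include <stdexcept>

// Hypothetical test helper: returns true only if calling f() throws an
// exception of type Exception (or of a type derived from it).
template<typename Exception, typename F>
bool throws(F f)
{
    try
    {
        f();
    }
    catch (const Exception&)
    {
        return true;   // The expected exception was thrown.
    }
    catch (...)
    {
        return false;  // Something else was thrown.
    }
    return false;      // Nothing was thrown at all.
}

// Sample functions to exercise the helper.
void always_throws() { throw std::runtime_error("empty string"); }
void never_throws() {}
```

A check such as throws<std::runtime_error>(never_throws) then reports
the missing exception as a failure, instead of silently passing.
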
>>Actually, the code above doesn't throw an exception, if you try to
>>convert an empty string to a char, which I think it should, as
>>mentioned, so it should probably be changed to:
>>
>>if(arg[0] == 0 || arg[1] != 0)
>>  throw bad_lexical_cast();
>
>Yes, in the first place I thought you would like converting "" to '\0'
>and that's why I gave that code. This morning I began to think that
>empty strings would have better been punished with an exception, but,
>as expected, my newsreader told me you already did it :-)

:)

Yeah, I think the principle of least surprise may favour an exception, if
you try to convert an empty string to a character.

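Put into a self-contained form, the check might look like this. The
function name string_to_char is hypothetical, and bad_lexical_cast is
defined locally as a stand-in for the Boost exception, so that the sketch
compiles without Boost.

```cpp
#include <typeinfo>

// Local stand-in for boost::bad_lexical_cast.
struct bad_lexical_cast : std::bad_cast {};

// Hypothetical helper: converts a one-character string to that character.
template<typename CharT>
CharT string_to_char(const CharT* arg)
{
    // arg[0] == 0: empty string. arg[1] != 0: more than one character.
    // The first test short-circuits, so arg[1] is never read for "".
    if (arg[0] == 0 || arg[1] != 0)
        throw bad_lexical_cast();
    return arg[0];
}
```

With this, both "" and "ab" are rejected, and only a string of exactly
one character converts.
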
>>>2) Stupid question of the day: is there any reason why all Source
>>>function parameters cannot be declared as Source const &?
>>
>>It's not a stupid question. :) Well, for some types, it may be more
>>efficient to pass by value, than by reference. Pass by reference
>>typically passes the address of the object, so for small types, just
>>passing the object may be more efficient, as you then avoid the
>>indirection, when operating on the object.
>
>Well, I know that. Actually I missed the reference to const in the
>select_base mechanism!

Ah, I guessed you knew, so I wondered. Then I understand. :)

The boost::call_traits also deals with the "reference to reference"
problem.

By the way, the simulated partial specialisation only handles char and
wchar_t, while the real partial specialisation handles any character
type. So the simulated version can't really replace the other one.

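To illustrate the general idea of choosing the parameter type from the
source type, here is a minimal sketch. It assumes C++11 <type_traits>,
and the name param is hypothetical: this is neither the select_base
mechanism nor boost::call_traits itself, only the same kind of selection.

```cpp
#include <string>
#include <type_traits>

// Hypothetical selector for the type of a function parameter.
template<typename T>
struct param
{
    // Small built-in types go by value, everything else by const reference.
    typedef typename std::conditional<
        std::is_arithmetic<T>::value || std::is_pointer<T>::value,
        T const,
        T const&
    >::type type;
};

// A reference is passed through unchanged, so no "reference to
// reference" is formed.
template<typename T>
struct param<T&>
{
    typedef T& type;
};
```

A function template would then declare its parameter as
"typename param<Source>::type arg" rather than committing to either
"Source" or "Source const &" for every type.
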
>(BTW, would we need a similar technique for std::max? :-)

Well, std::min and std::max are already defined to take their parameters
by const reference, what do you mean? Except perhaps to fix the problem
mentioned above.

>This leads to the stupid question of the month: why not Source const?

Well, what would it solve? Perhaps if the output operator is a const
member function, and lexical_cast is passed a non-const object?

>P.S.: Of course I know the C++ implications. What I'm trying to
>discover are boost's guidelines about this sort of thing (I'm quite
>new to this list), so forgive me if I'm asking something that
>everybody knows here.

I'm quite new here, myself, so no worry. :) You've been good help.

>>By the way, this works correctly on Intel C++ 7.0 pre-beta... Perhaps
>>if you complain about this, you'll get that, too. :) I got that
>>version, after I reported some ICE when trying to compile BLL. They
>>gave me that, to try again. It still doesn't work, but other things,
>>such as this, do.
>>
>>There are also other things that work on 7.0 pre-beta, such as Loki's
>>SmartPtr.h, which doesn't work on 6.0.
>
>Very odd. Another oddity is this: with your (previous) unit test I
>get no error compiling with VC++6.0, either using its original
>standard library or STLport 4.5.3 with SGI iostreams.

I've tested it using the same setup, so that makes sense. :)

>If I use Intel C++ 6.0 instead, all the tests with (unsigned short)
>wide-characters fail with both libraries.
>So, even if both compilers lack a distinct wchar_t type, VC++ 6.0
>works well with both libraries and Intel C++ 6.0 with none of them.
>Any clue?

I agree that it's odd. When working on that version, I spent a lot of
time building STLPort for the various compilers, to try to get some
sensible behaviour out of this. What I found is that many implementations
have poor support for wide characters. I summarize my findings in the
table at the end, here.

I altered the unit test, now, to try to get to the bottom of this, so
that it also outputs the desired result of the conversion (it used to
just output the source and the target, and not the specified correct
target).

Using this change, the answer became clear. For the following line (the
parameters are "do_test(correct_target,source,line)"):

  test<int,wchar_t>::do_test(1,L'1',__LINE__);

Using Intel C++ 6.0 with intrinsic wchar_t (/Zc:wchar_t option), and
debug output (note that typeid(Type).name() still reports it as "unsigned
short"):

  Test - Succeeded (line 270)
  Source type      = unsigned short (1)
  Destination type = int (1) (Should have been (1))

Using Intel C++ 6.0, with no intrinsic wchar_t:

  Test - Failed (line 270)
  Source type      = unsigned short (49)
  Destination type = int (49) (Should have been (1))

Using VC++ 6.0 (no intrinsic wchar_t), and debug output:

  Test - Succeeded (line 270)
  Source type      = unsigned short (49)
  Destination type = int (1) (Should have been (1))

As you can see, Intel C++, with and without intrinsic wchar_t, works as
it's supposed to. VC++ 6.0, however, behaves oddly. It shows L'1' as
"49", yet when passing it to stringstream, it interprets it as if it's
actually wchar_t, and writes it as "1". So it's kind of a partial wchar_t
support, there, where they have made "unsigned short" behave as
"wchar_t".

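The difference can be reproduced with a small sketch. It assumes a
conforming compiler where wchar_t is a distinct type; the helper name
written_as is made up for illustration and is not part of the unit test.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: writes a value to a wide string stream and
// returns the text that ended up in the stream.
template<typename T>
std::wstring written_as(T value)
{
    std::wostringstream stream;
    stream << value;
    return stream.str();
}
```

Here written_as(L'1') selects the character inserter and yields the
character itself, while written_as(static_cast<unsigned short>(L'1'))
selects the integer inserter and yields the numeric character code, which
is the "49" seen in the output above. When wchar_t is only a typedef for
unsigned short, the two calls cannot be told apart by overloading.
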
By the way, when checking this, I also found a bug in the "printer"
function, with code such as stream << "...", where "stream" could be any
kind of stream (including a wide-character one). Oops. Fixed now. This
is something that just happened to work, earlier, but which wasn't
correct.

Strictly speaking, the test, if it reports failure, is non-conformant, as
it may write to both std::cout and std::wcout in the same program when
giving the info about the failure, which as I understand is not allowed.
However, this is only in case of failure, as it otherwise doesn't write
anything but the result of the test. Besides, it appears to generally
work, anyway, and this is just for debugging. It does seem to need the
stream to be flushed when switching stream type, at least.

If this would be a problem, the output could be turned off, when run as
part of the Boost regression tests.

Here are the results of running the unit test on the first version
uploaded (using Boost 1.28):

Platform      Compiler         Library                          Result       Remark
------------------------------------------------------------------------------------------------------
Windows 2000  Intel C++ 6.0    MSVC standard library (Default)  100% Passed  Needs /Zc:wchar_t option
Windows 2000  Intel C++ 6.0    STLPort 4.5.3                    100% Passed  Needs /Zc:wchar_t option
Windows 2000  MSVC 6.0         MSVC standard library (Default)  100% Passed  No partial specialisation
Windows 2000  MSVC 6.0         STLPort 4.5.3                    100% Passed  No partial specialisation
Windows 2000  BCC 5.5          Rogue Wave 2.1.1 (Default)       100% Passed  Bad PS, not used
Windows 2000  BCC 5.6 (BCB 6)  STLPort 4.5.0 (Default)          100% Passed  Bad PS, not used
Windows 2000  gcc 2.95.3       SGI standard library (Default)   100% Passed  (*)

(*) No wide character support in library, 45/104 tests not supported.
gcc 2.95.x may also need #define BOOST_NO_STRINGSTREAM, as config.hpp is
unable to detect it.

The updated version is uploaded.

Wow, peer-review really works. :)

Thanks. :)

Regards,

Terje