From: Jens Seidel (jensseidel_at_[hidden])
Date: 2008-05-16 08:36:50


On Fri, May 16, 2008 at 11:48:30AM +0200, Matus Chochlik wrote:
> On Fri, May 16, 2008 at 11:22 AM, Jens Seidel <jensseidel_at_[hidden]> wrote:
> > Stupid question: Do you really use the UTF-16 Unicode encoding
> > on Linux? I know about some classical Asian 16-bit encodings but
> > these days UTF-8 (which is compatible with char*) is used
> > everywhere on Linux ...
 
> ... but couldn't this issue be solved by defining a portable
> equivalent of the TCHAR type, which is used consistently by
> WINAPI, where the real char type is switched
> at compile time by means of the "UNICODE"
> PP symbol?

No, I don't think so. First, besides the type itself you also have
to support initialisation of and access to the type. As far as I know
(I have never really used wchar_t) it is:
const char *text = "Hi world"; and
const wchar_t *text = L"Hi world";

How would you know whether you need the "L" prefix if all you have
is a new type?

What about functions/methods which do not exist for both types?
You would always have to write #ifdef ... #else ... #endif
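
For illustration, here is a minimal sketch of what a TCHAR-style layer
needs beyond the type itself. It is roughly the idea behind the Windows
TEXT()/_T() macros, not a proposed Boost interface. All names
(MY_UNICODE, my_tchar, MY_TEXT, my_strlen) are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical names throughout; MY_UNICODE plays the role of "UNICODE".
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT(s) L##s   // pastes the L prefix onto the string literal
inline std::size_t my_strlen(const my_tchar* s) {
    assert(s != 0);
    return std::wcslen(s);   // wide-character variant
}
#else
typedef char my_tchar;
#define MY_TEXT(s) s      // leaves the literal unchanged
inline std::size_t my_strlen(const my_tchar* s) {
    assert(s != 0);
    return std::strlen(s);   // narrow-character variant
}
#endif

// Compiles in both configurations only because the literal is wrapped
// in MY_TEXT and the length function is selected by the same #ifdef.
inline const my_tchar* greeting() { return MY_TEXT("Hi world"); }
```

Every literal and every string function has to go through such a
wrapper, which is exactly the #ifdef burden described above.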

Can't UTF-8 data even be stored in wchar_t (the first byte is
always zero)? I think supporting both types together with
different encodings in one program is just asking for trouble.
Use a fixed encoding and one of char or wchar_t across your whole
program and you simplify your code a lot. With wrappers
which convert your data, e.g. from UTF-16 wchar_t to UTF-8 char,
after calling string functions on Win*, you may have a slowdown
but also a compatible program.
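
A minimal sketch of such a conversion wrapper, assuming a C++11
compiler for std::u16string (on Windows the wchar_t data would first
have to be copied into that type). The function name utf16_to_utf8 is
hypothetical, and replacing unpaired surrogates with U+FFFD is one
possible policy, not the only one:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: converts UTF-16 code units to a UTF-8 byte string.
// Unpaired surrogates are replaced with U+FFFD (an assumed policy).
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;                                   // consume the low surrogate
            } else {
                cp = 0xFFFD;                           // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                               // unpaired low surrogate
        }
        if (cp < 0x80) {                               // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                       // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                     // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                       // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The rest of the program then only ever sees UTF-8 in char, and the
conversion cost is paid once at the platform boundary.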

> TCHAR is wchar_t or char depending on whether UNICODE is or

UNICODE is a very bad name! The size of the type (char, wchar_t) could
depend on the encoding (UTF-8, UTF-16, ...), not the character set
(Unicode)!
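
A small sketch of that distinction, assuming C++11 string types: the
same Unicode character (the euro sign, U+20AC) needs a different
number of code units depending on the encoding. The helper names are
hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of code units in a string of any character type.
template <typename Char>
std::size_t code_units(const std::basic_string<Char>& s) { return s.size(); }

// One character, U+20AC, in three encodings of the same character set.
inline std::string    euro_utf8()  { return "\xE2\x82\xAC"; }  // three 8-bit units
inline std::u16string euro_utf16() { return u"\u20AC"; }       // one 16-bit unit
inline std::u32string euro_utf32() { return U"\u20AC"; }       // one 32-bit unit
```

The character set is Unicode in all three cases; only the encoding,
and with it the natural element type, changes.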

> isn't defined. Boost library functions would use this
> *boost-char-type* (whatever its name would be)
> instead of char or wchar_t, where applicable.
>
> On Windows this allows using the same WIN32 "functions"
> with both character types and allows an application
> (when coded properly) to be compiled with both character types
> without the need of messing with the code.
>
> I'm sorely missing something like this in the C++ standard or
> at least in Boost and I think I'm not the only one.

Please use a proper string class instead, one which is
aware of its encoding and just converts it when needed. This really
avoids these problems and is portable. See e.g. Qt's QString class:
http://doc.trolltech.com/4.4/qstring.html
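
To make the idea concrete, here is a minimal sketch of an
encoding-aware string class, assuming C++11. It is not Qt's API: the
class name Text and its members are hypothetical, and it stores code
points internally and converts only at the boundaries:

```cpp
#include <cstddef>
#include <string>

// Hypothetical class; only loosely modelled on the idea behind QString.
class Text {
public:
    // Latin-1 bytes map one-to-one onto the first 256 Unicode code points.
    static Text from_latin1(const std::string& bytes) {
        Text t;
        for (std::size_t i = 0; i < bytes.size(); ++i)
            t.cps_ += static_cast<char32_t>(static_cast<unsigned char>(bytes[i]));
        return t;
    }

    // Encode the stored code points as UTF-8 on request.
    std::string to_utf8() const {
        std::string out;
        for (std::size_t i = 0; i < cps_.size(); ++i) append_utf8(out, cps_[i]);
        return out;
    }

    // Length in characters (code points), independent of any encoding.
    std::size_t length() const { return cps_.size(); }

private:
    static void append_utf8(std::string& out, char32_t cp) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }

    std::u32string cps_;  // internal representation: one element per code point
};
```

Callers state the encoding once, when data enters or leaves the
class, and never have to choose between char and wchar_t themselves.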

Jens

