From: Ulrich Eckhardt (doomster_at_[hidden])
Date: 2008-05-16 08:22:12


On Friday 16 May 2008 11:48:30 Matus Chochlik wrote:
> On Fri, May 16, 2008 at 11:22 AM, Jens Seidel <jensseidel_at_[hidden]> wrote:
> > On Fri, May 16, 2008 at 09:37:42AM +0100, Enda Mannion wrote:
> >> I am trying to use boost filesystem.
> >>
> >> Does boost filesystem support wide character.
> >>
> >> I am compiling this on linux and on Windows.
> >
> > Stupid question: Do you really use the UTF-16 Unicode encoding
> > on Linux? I know about some classical Asian 16-bit encodings but
> > these days UTF-8 (which is compatible with char*) is used
> > everywhere on Linux ...

Sorry to chime in here, but the fact that UTF-8 is stored in a char string
doesn't make it 'compatible' in any way. It's like saying SCSI and ATA disks
are compatible because they both use 8 bits per byte. Rather, strings encoded
in UTF-8 or e.g. ISO8859-1 can(!) both be represented as char strings, though
for both an unsigned char string is IMHO an even better idea.
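
To illustrate why "fits in a char string" says nothing about the encoding,
here is a minimal sketch. The helper name utf8_code_points is made up for
this example, and it assumes its input is valid UTF-8 without checking that:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a string that is
// ASSUMED to hold valid UTF-8 (no validation is done here).
// Continuation bytes look like 10xxxxxx; any other byte starts a new
// code point.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Go through unsigned char so the bit test does not depend on
        // whether plain char is signed on this platform.
        unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

For the UTF-8 bytes 0xC3 0xA4 (the letter a-umlaut), size() reports two chars
while the helper reports one code point. The same letter in ISO8859-1 is the
single byte 0xE4, so the same char string type holds two different byte
sequences for it.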

> I've already made a couple of posts concerning this issue, but
> I didn't get too many answers, so sorry if I'm missing something
> really obvious and for repeating myself :-P ..
>
> ... but couldn't this issue be solved by defining a portable
> equivalent of the TCHAR type, which is used consistently by
> WINAPI and whose real char type is switched
> at compile time by means of the "UNICODE"
> PP symbol?

Hmmm, I personally consider TCHAR just a hack to ease the transition from a
char-based win32 API to a wchar_t-based (and thus Unicode-capable) one. In
any case, the goal is to have full Unicode support, be it via char and UTF-8
or wchar_t and UTF-16. However, the problem with wchar_t is that it is only
UTF-16 on some platforms; the standard doesn't mandate its encoding at all.
Further, the problem with char is that it can also hold strings with a
totally different encoding, like one of the ISO8859 encodings.
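
As a rough sketch of the wchar_t point (the names wchar_bits and
wchar_holds_any_code_point are invented for this example), you can ask the
compiler how wide wchar_t is on the platform at hand. Typically it is 16 bits
on Windows and 32 bits on Linux with glibc, but that is a property of the
platform, not of the language:

```cpp
#include <climits>
#include <cstddef>

// Number of bits in wchar_t on the platform this is compiled for.
// The C++ standard fixes neither this width nor the encoding.
const std::size_t wchar_bits = sizeof(wchar_t) * CHAR_BIT;

// True if a single wchar_t is wide enough for any Unicode code point.
// The highest code point is U+10FFFF, which needs 21 bits. This only
// checks the width; it does not prove the encoding is Unicode at all.
bool wchar_holds_any_code_point()
{
    return wchar_bits >= 21;
}
```

Where this returns false, a wchar_t string encoded as UTF-16 needs surrogate
pairs for characters outside the BMP, so one wchar_t is not always one
character there either.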

> TCHAR is wchar_t or char depending on whether UNICODE is or
> isn't defined. Boost library functions would use this
> *boost-char-type* (whatever its name would be)
> instead of char or wchar_t, where applicable.
>
> On Windows this allows using the same WIN32 "functions"
> with both character types and allows an application
> (when coded properly) to be compiled with both character types
> without the need of messing with the code.
>
> I'm sorely missing something like this in the C++ standard or
> at least in Boost and I think I'm not the only one.

FYI, I don't. When I need Unicode for some string, I use wchar_t (which isn't
the holy grail though). Then, when I have to interface with the win32 API, I
need to either convert it to TCHAR (whatever that currently is) or,
preferably, use the function version that takes a wchar_t, like
CreateFileW(). When I do logging, I typically restrict myself to ASCII, so I
can also use simple char strings. My opinion is that it is better to actually
define the encoding of a string on a case-by-case basis and do conscious
conversions instead of relying on a string type (TCHAR) which changes its
meaning depending on a macro and, in the char case, even depending on the
OS's locale.
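
A minimal sketch of what I mean by a conscious conversion: the helper
ascii_to_wide below is hypothetical, and it assumes that wchar_t uses a
Unicode encoding (UTF-16 or UTF-32), so that ASCII values carry over
unchanged. The encoding of the input is stated in the function's contract
and checked, instead of being implied by a macro or the locale:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: widen a string that is REQUIRED to be pure ASCII.
// Assumes wchar_t uses a Unicode encoding, where code points 0..127 have
// the same numeric values as in ASCII.
std::wstring ascii_to_wide(const std::string& s)
{
    std::wstring out;
    out.reserve(s.size());
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Refuse anything outside ASCII instead of guessing its encoding.
        if (b > 0x7F)
            throw std::invalid_argument("ascii_to_wide: non-ASCII byte");
        out.push_back(static_cast<wchar_t>(b));
    }
    return out;
}
```

On Windows, the c_str() of the result could then be handed to a wide
function such as CreateFileW() without going through TCHAR. Input in UTF-8 or
ISO8859-1 would need its own, equally explicit, conversion.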

Uli

