From: Matus Chochlik (chochlik_at_[hidden])
Date: 2008-05-16 09:12:55


On Fri, May 16, 2008 at 2:22 PM, Ulrich Eckhardt <doomster_at_[hidden]> wrote:
> On Friday 16 May 2008 11:48:30 Matus Chochlik wrote:
>> On Fri, May 16, 2008 at 11:22 AM, Jens Seidel <jensseidel_at_[hidden]>
> wrote:
>> > On Fri, May 16, 2008 at 09:37:42AM +0100, Enda Mannion wrote:
>> >> I am trying to use boost filesystem.
>> >>
>> >> Does boost filesystem support wide character.
>> >>
>> >> I am compiling this on linux and on Windows.
>> >
>> > Stupid question: Do you really use the UTF-16 Unicode encoding
>> > on Linux? I know about some classical Asian 16-bit encodings but
>> > these days UTF-8 (which is compatible with char*) is used
>> > everywhere on Linux ...
>
> Sorry to chime in here, but UTF-8 is internally represented as char string,
> but that doesn't make it 'compatible' in any way. It's like saying SCSI and
> ATA disks are compatible because they both use 8 bits per byte. Rather,
> strings encoded in UTF-8 or e.g. ISO8859-1 can(!) both be represented as
> char strings, though for both using an unsigned char string is IMHO an even
> better idea.
>
>> I've already made a couple of posts concerning this issue, but
>> I didn't get too many answers, so sorry if I'm missing something
>> really obvious and for repeating myself :-P ..
>>
>> ... but couldn't this issue be solved by defining a portable
>> equivalent of the TCHAR type, which is used consistently by
>> WINAPI and whose real char type is switched
>> at compile time by means of the "UNICODE"
>> PP symbol?
>
> Hmmm, I personally consider TCHAR just a hack to ease transition from a
> char-based win32 API to a wchar_t-based (and thus Unicode-capable) one. The
> goal is in any case to have full Unicode support, be it via char and UTF-8 or
> wchar_t and UTF-16. However, the problem with wchar_t is that it is only
> UTF-16 on some platforms, the standard doesn't mandate its encoding at all.
> Further, the problem with char is that it can also hold strings with a
> totally different encoding like one of the ISO8859 encodings.

Well, I consider TCHAR a hack myself, but it is a useful one. I've
used the approach when developing a group of quite large applications,
and somewhere in the middle of the process it became clear that it's
better to use wide chars and UTF than to mess with different encodings.
I'm glad I didn't have to replace
char->wchar_t, string->wstring, cout->wcout, not to mention
things like strlen/wcslen, and modify all the WinAPI-specific
wrappers ;-)

There are several other libraries in Boost just to ease the pain of waiting
for new things to become standard and widespread. Boost.Typeof
and BCCL are IMHO good examples. No offense :-)

Switching between, for example, LoadLibraryA and LoadLibraryW
by means of a preprocessor symbol "LoadLibrary"
is really crazy and I'm not suggesting doing that in Boost.
Any implementation in Boost definitely needs to be better than this.
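
To make the idea a bit more concrete, here is a rough sketch of a
compile-time-switched character type that relies on typedefs and ordinary
overloads rather than on renaming functions with macros. Everything named
here (BOOST_SKETCH_WIDE_CHAR, the sketch namespace, char_type, string_type,
SKETCH_TEXT, length) is hypothetical and made up for this mail; none of it
exists in Boost.

```cpp
#include <cstddef>
#include <string>

// Hypothetical switch: define BOOST_SKETCH_WIDE_CHAR to build with wchar_t.
// This plays the role that UNICODE plays for TCHAR.
#ifdef BOOST_SKETCH_WIDE_CHAR
#define SKETCH_TEXT(s) L##s   // wide literal
#else
#define SKETCH_TEXT(s) s      // narrow literal
#endif

namespace sketch {

#ifdef BOOST_SKETCH_WIDE_CHAR
typedef wchar_t char_type;
#else
typedef char char_type;
#endif

// String type that follows the selected character type.
typedef std::basic_string<char_type> string_type;

// Ordinary overloads for both character types, so no function name
// has to be redefined by the preprocessor (unlike LoadLibrary).
inline std::size_t length(const char* s) {
    return std::char_traits<char>::length(s);
}
inline std::size_t length(const wchar_t* s) {
    return std::char_traits<wchar_t>::length(s);
}

// Example of code written once against char_type.
inline string_type greeting() {
    return string_type(SKETCH_TEXT("hello"));
}

} // namespace sketch
```

Only the literal macro and the typedef depend on the switch; code written
against sketch::string_type compiles unchanged in both modes.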

>
>> TCHAR is wchar_t or char depending on whether UNICODE is or
>> isn't defined. Boost library functions would use this
>> *boost-char-type* (whatever its name would be)
>> instead of char or wchar_t, where applicable.
>>
>> On Windows this allows using the same WIN32 "functions"
>> with both character types and allows an application
>> (when coded properly) to be compiled with both character types
>> without the need of messing with the code.
>>
>> I'm sorely missing something like this in the C++ standard or
>> at least in Boost and I think I'm not the only one.
>
> FYI, I don't. When I need Unicode for some string, I use wchar_t (which isn't
> the holy grail though). Then, when I have to interface with the win32 API, I
> need to either convert it to TCHAR (whatever that currently is) or,
> preferably, use the function version that takes a wchar_t, like
> CreateFileW(). When I do logging, I typically restrict myself to ASCII, so I
> can also use simple char strings. My opinion is that it is better to actually
> define the encoding of a string on a case-by-case basis and do conscious
> conversions instead of relying on a string type (TCHAR) which changes
> meanings depending on a macro and in the char-case even depending on the OS'
> locale.

Well, that's something that I'd really like to avoid whenever possible. I know
from my experience that both approaches are problematic. Conversion
slows things down terribly, and it is not easy to decide, when starting
a large project, which character type (and all the related stuff) is the best.
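
For reference, this is roughly what an explicit conversion at an API
boundary looks like in the simplest case Uli mentions, ASCII-only text.
It is only a sketch: widen_ascii is a hypothetical helper, it assumes an
ASCII-compatible execution character set, and it rejects anything outside
ASCII instead of guessing an encoding. Every call copies the string, which
is the cost I mean.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical helper: widen an ASCII-only narrow string to a wide string.
// Bytes above 0x7F are rejected because their meaning depends on an
// encoding that this function does not know.
inline std::wstring widen_ascii(const std::string& s) {
    std::wstring out;
    out.reserve(s.size());
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c > 0x7F) {
            throw std::invalid_argument("widen_ascii: non-ASCII byte");
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}

} // namespace sketch
```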

There are many tradeoffs between chars and wchars, and I know that
UTF-whatever and wchars have problems of their own. Exactly because
of that, I'd like to have the freedom to make that choice at the time
of deployment of the application.

-- 
________________
::matus_chochlik
>
> Uli