From: Matus Chochlik (chochlik_at_[hidden])
Date: 2008-05-16 09:37:15
On Fri, May 16, 2008 at 2:36 PM, Jens Seidel <jensseidel_at_[hidden]> wrote:
> On Fri, May 16, 2008 at 11:48:30AM +0200, Matus Chochlik wrote:
>> On Fri, May 16, 2008 at 11:22 AM, Jens Seidel <jensseidel_at_[hidden]> wrote:
>> > Stupid question: Do you really use the UTF-16 Unicode encoding
>> > on Linux? I know about some classical Asian 16-bit encodings but
>> > these days UTF-8 (which is compatible with char*) is used
>> > everywhere on Linux ...
>> ... but couldn't this issue be solved by defining a portable
>> equivalent of the TCHAR type, which is used consistently by
>> WINAPI and whose real char type is switched
>> at compile time by means of the "UNICODE"
>> PP symbol?
> No, I don't think so. First, besides the type you also have to
> support initialisation of and access to the type. As far as I know
> (I have never really used wchar_t) it is:
> const char *text = "Hi world" and
> const wchar_t *text = L"Hi world"
Yeah, this is done with the TEXT("literal") macro in WINAPI:
if TCHAR = char then TEXT() expands to "literal"
if TCHAR = wchar_t then it expands to L"literal"
Naming the macro TEXT was, however, not the best choice ;)
and we can avoid repeating this mistake.
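
A rough sketch of what such a portable pair (character type plus literal
macro) could look like. The names BCHAR_USE_WIDE, bchar and BTEXT are
made up for illustration only; nothing with these names exists in Boost
or in WINAPI.

```cpp
#include <cassert>

// Hypothetical switch: define BCHAR_USE_WIDE on the compiler command line
// to build with wchar_t, leave it undefined to build with char.
#if defined(BCHAR_USE_WIDE)
typedef wchar_t bchar;            // hypothetical portable character type
#define BTEXT_IMPL(lit) L##lit    // paste the L prefix onto the literal
#else
typedef char bchar;
#define BTEXT_IMPL(lit) lit       // leave the literal untouched
#endif

// The extra level of indirection lets a macro argument expand first.
#define BTEXT(lit) BTEXT_IMPL(lit)

// Application code is written once and compiles in both configurations.
inline const bchar* greeting()
{
    return BTEXT("Hi world");
}
```

The application never writes the L prefix itself, so the same source can
be built with either character type.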
> How do you want to know whether you need "L" if you just have
> a new type?
> What about functions/methods which do not exist for both types?
> You would always have to write #ifdef ... #else ... #endif
Correct .. and I'm suggesting wrapping these routines (mainly
those from cstring) with inline functions doing this.
This way one does not have to do it in the application code.
Instead of having to choose between strlen/wcslen/_mbslen
you would use, say, "bstrlen".
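
A minimal sketch of such a wrapper, assuming the same hypothetical
BCHAR_USE_WIDE switch and bchar type as a stand-in for the proposed
*boost-char-type*. Only std::strlen and std::wcslen are real library
calls here; "bstrlen" is just the name suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical compile-time choice of the character type.
#if defined(BCHAR_USE_WIDE)
typedef wchar_t bchar;
#else
typedef char bchar;
#endif

// The #ifdef lives in exactly one place, inside the wrapper,
// so application code does not have to repeat it.
inline std::size_t bstrlen(const bchar* s)
{
#if defined(BCHAR_USE_WIDE)
    return std::wcslen(s);   // length in wchar_t units
#else
    return std::strlen(s);   // length in char units (bytes)
#endif
}
```

Note that both branches count code units, not characters, so for UTF-8
data in a char build the result is the number of bytes.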
> Can't UTF-8 data even be stored in wchar_t (first byte is
> always zero)? I think supporting both types together with
> different encodings in one program is just asking for trouble.
> Use a fixed encoding and one of char or wchar_t across your whole
> program and you simplify your code a lot. Together with wrappers
> which convert your data e.g. from UTF-16 wchar_t to UTF-8 char
> after calling string functions on Win* you may have a slowdown
> but also a compatible program.
I was not suggesting supporting both character types at once in
the same compiled binary of the application. Instead I would like
to have the opportunity to decide which character type to use
at the time of deployment on a particular hardware platform, OS
and depending on other circumstances.
>> TCHAR is wchar_t or char depending on whether UNICODE is or
> UNICODE is a very bad name! The size of the type (char, wchar_t) could
> depend on the encoding (UTF-8, UTF-16, ...), not the character set
I strongly agree that UNICODE is a bad name and it would be
necessary to apply the Boost naming conventions.
>> isn't defined. Boost library functions would use this
>> *boost-char-type* (whatever its name would be)
>> instead of char or wchar_t, where applicable.
>> On Windows this allows one to use the same WIN32 "functions"
>> with both character types and allows an application
>> (when coded properly) to be compiled with both character types
>> without the need of messing with the code.
>> I'm sorely missing something like this in the C++ standard or
>> at least in Boost and I think I'm not the only one.
> Please use a proper string class instead, one which is
> aware of its encoding and just converts it when needed. This really
> avoids any problems and is portable. See e.g. Qt's QString class:
Well, this is exactly what I'm suggesting to do in Boost. Qt has its
uses and it has its problems and there are some applications
where I certainly would like to avoid using Qt.
Why not define a "bstring" instead ;)
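
In its simplest reading, a "bstring" could be no more than
std::basic_string over the switchable character type. The sketch below
assumes that reading; bchar, BTEXT, BCHAR_USE_WIDE and greet are
hypothetical names, and unlike QString this typedef knows nothing about
the encoding of its contents.

```cpp
#include <cassert>
#include <string>

// Hypothetical compile-time choice of character type and literal macro.
#if defined(BCHAR_USE_WIDE)
typedef wchar_t bchar;
#define BTEXT_IMPL(lit) L##lit
#else
typedef char bchar;
#define BTEXT_IMPL(lit) lit
#endif
#define BTEXT(lit) BTEXT_IMPL(lit)

// "bstring" as a plain typedef: std::string in a char build,
// std::wstring in a wchar_t build.
typedef std::basic_string<bchar> bstring;

// Example of code written once against bstring.
inline bstring greet(const bstring& name)
{
    return bstring(BTEXT("Hi ")) + name;
}
```

An encoding-aware class would need to go further than this, for example
by converting at the boundary to the WINAPI calls.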
I'm a new guy here, but still, I've noticed several posts related
to this, mainly from people using libraries that wrap
WINAPI calls, like Boost.Filesystem, Extension, etc.,
and found more of them in the archives.
-- ________________ ::matus_chochlik