Boost Users :

From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-07-02 02:36:54

Delfin Rojas wrote:

> It seems most people post here at night PST. I never thought my posting
> would generate so many interesting discussions.

Well.. night PST is evening GMT+3, which explains at least my postings ;-)

> I have been taking a look at the library code and certainly the only thing
> that would need to change is to use a preprocessor define to turn on/off
> wide character strings and everywhere in the code use TChar strings. When
> the code is being compiled for POSIX systems this Unicode define should be
> turned off. In the Windows specific code all the calls to the Windows API
> would need to change from "FunctionCallA" to "FunctionCall" since
> internally the Windows API also works with TChar.

Yes, that would work. But note that you might want to use wide strings even
on Linux -- so you get two versions, narrow and wide.
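For illustration, here is a rough sketch of the TChar idea. The names MYLIB_UNICODE, MYLIB_T, TString and join_path are made up for this example and do not come from any existing library; the point is only that the same source builds as either the narrow or the wide variant.

```cpp
#include <string>

// Hypothetical switch: define MYLIB_UNICODE to build the wide variant.
#if defined(MYLIB_UNICODE)
typedef wchar_t TChar;
#define MYLIB_T(s) L##s   // turns "x" into L"x" and '/' into L'/'
#else
typedef char TChar;
#define MYLIB_T(s) s      // leaves the literal narrow
#endif

typedef std::basic_string<TChar> TString;

// Library code is written once against TChar/TString.
// Hypothetical helper: joins two path elements with a single '/'.
inline TString join_path(const TString& dir, const TString& leaf)
{
    TString result(dir);
    if (!result.empty() && result[result.size() - 1] != MYLIB_T('/'))
        result += MYLIB_T('/');
    result += leaf;
    return result;
}
```

A caller would write join_path(MYLIB_T("usr"), MYLIB_T("lib")) and get whichever string type the build selected. The cost is the one described here: two binaries of the library, one per setting of the define.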

> The caller could also use the TChar idea to have its code talk to the
> library seamlessly.

Yes, that's OK for an application, where the decision to use Unicode is global.
But if you write another library which uses the first one, then it must also
have two variants. This is what bothers me: every library would need a
Unicode and a non-Unicode variant, even if the differences can probably be
hidden somewhere inside the implementation.

> String constants can also be expressed in TChars
> (_T("my string") in Windows).

If I understand correctly, this expands to L"my string" -- i.e. a string
constant. So I think we still need a portable string->wstring
conversion which respects the current locale.
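A minimal sketch of such a conversion, assuming the standard C function std::mbstowcs is acceptable (it uses the current C locale as set by std::setlocale). The name to_wstring_locale is hypothetical, and the input is treated as a C string, so it stops at the first embedded null.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a multibyte string to a wide string using
// the current C locale.
inline std::wstring to_wstring_locale(const std::string& narrow)
{
    if (narrow.empty())
        return std::wstring();

    // Every wide element consumes at least one input byte, so size() + 1
    // elements (including the terminator) are always enough.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    std::size_t written =
        std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());

    // mbstowcs returns (size_t)-1 on an invalid multibyte sequence.
    if (written == static_cast<std::size_t>(-1))
        throw std::runtime_error(
            "invalid multibyte sequence for the current locale");

    return std::wstring(&buffer[0], written);
}
```

Note that a program starts in the "C" locale, so the application has to call std::setlocale(LC_ALL, "") first if it wants the user's locale to be honoured.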

> As far as a library that can be passed both single char and double char
> strings it is also a possibility that would play along well with the
> scenario I just described. The library can perform a string_cast<TChar>
> always to make sure the string is converted to the string type being used
> by the library. If the library is compiled to use wide strings internally
> then string_cast<TChar> would convert char strings to wchar_t strings and
> wchar_t strings would remain unchanged. The contrary occurs when Unicode
> define is turned off.

Yes, that's what I find right. The question is whether you ever need two
versions of the library. Supposing that conversions are optimized enough, or
that the performance does not matter much (e.g. for boost::path, access to
files via the OS might cost much more than any conversion), then you can have
just one version of the compiled library. The users don't have to worry about
which one to obtain/install/link to.
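To make the string_cast idea concrete, here is one possible shape for it. This is only a sketch: string_cast and convert_string are hypothetical names, and the conversions assume the standard std::mbstowcs / std::wcstombs functions and the current C locale are good enough.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Same character type on both sides: plain copy, no conversion.
template <class Ch>
void convert_string(const std::basic_string<Ch>& in, std::basic_string<Ch>& out)
{
    out = in;
}

// Narrow -> wide, using the current C locale.
inline void convert_string(const std::string& in, std::wstring& out)
{
    std::vector<wchar_t> buf(in.size() + 1);
    std::size_t n = std::mbstowcs(&buf[0], in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("cannot convert to wide string");
    out.assign(&buf[0], n);
}

// Wide -> narrow, using the current C locale.
inline void convert_string(const std::wstring& in, std::string& out)
{
    // MB_CUR_MAX is the maximum number of bytes per character in this locale.
    std::vector<char> buf(in.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(&buf[0], in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("cannot convert to narrow string");
    out.assign(&buf[0], n);
}

// Hypothetical string_cast<ToChar>(s): converts only when the types differ.
template <class ToChar, class FromChar>
std::basic_string<ToChar> string_cast(const std::basic_string<FromChar>& in)
{
    std::basic_string<ToChar> out;
    convert_string(in, out);
    return out;
}
```

With something like this at the library boundary, string_cast<wchar_t>(s) works for both std::string and std::wstring arguments, so a single compiled library that uses one string type internally can still accept either.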

> However, I feel this interface is not the best since
> it would allow the caller to mix single char strings and double char
> strings and this is not a good practice generally. Converting strings back
> and forth is not a fast process and conversions may not always result in
> what you expect, especially if you are a novice working with encodings.

This is where we disagree. For example, I want to support Unicode on Linux.
All filesystem functions accept char*, so I *have* to do conversion.
Another issue is that many other functions only return char*, so again I
need conversions. Why can't they be done by boost::path?


    boost::path p(L".......");

    p /= argv[1];
    p /= to_wstring(argv[1]);

I don't really think the latter is better than the former.
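To show what "done by the path class" could look like, here is a small sketch. simple_path is a hypothetical stand-in and not the real boost::filesystem interface; it stores a wide string and widens narrow arguments itself via std::mbstowcs and the current C locale.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a path class that accepts both string kinds.
class simple_path
{
public:
    explicit simple_path(const std::wstring& s) : value_(s) {}
    explicit simple_path(const std::string& s) : value_(widen(s)) {}

    // Wide elements are appended as they are.
    simple_path& operator/=(const std::wstring& leaf) { return append(leaf); }
    // Narrow elements are converted here, so the caller does not have to.
    simple_path& operator/=(const std::string& leaf) { return append(widen(leaf)); }

    const std::wstring& wide() const { return value_; }

private:
    static std::wstring widen(const std::string& s)
    {
        std::vector<wchar_t> buf(s.size() + 1);
        std::size_t n = std::mbstowcs(&buf[0], s.c_str(), buf.size());
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("cannot convert path element");
        return std::wstring(&buf[0], n);
    }

    simple_path& append(const std::wstring& leaf)
    {
        if (!value_.empty() && value_[value_.size() - 1] != L'/')
            value_ += L'/';
        value_ += leaf;
        return *this;
    }

    std::wstring value_;
};
```

With this, both `p /= argv[1];` and `p /= L"dir";` compile against the same object, and the conversion lives in one place instead of in every caller.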

- Volodya

> Somebody mentioned Java doesn't have this problem. This is because all
> strings in Java are UTF-16 (wchar_t) strings.
> Let me know what you guys think of all this.
> Thanks
> -delfin
> -----Original Message-----
> From: boost-users-bounces_at_[hidden]
> [mailto:boost-users-bounces_at_[hidden]] On Behalf Of David Abrahams
> Sent: Thursday, July 01, 2004 9:46 AM
> To: boost-users_at_[hidden]
> Subject: [Boost-users] Re: Feature request for boost::filesystem
> Vladimir Prus <ghost_at_[hidden]> writes:
>> David Abrahams wrote:
>>>> 1. Make the library interface templated.
>>>> 2. Use narrow classes: e.g. string
>>>> 3. Use wide classes: e.g. wstring
>>>> 4. Have some class which works with ascii and unicode.
>>>> The first approach is bad for code size reasons.
>>> It doesn't have to be. There can be a library object with explicit
>>> instantiations of the wide and narrow classes.
>> Which doubles the size of shared library itself.
> It depends; the narrow specialization might be implemented in terms
> of the wide one ;-)
