From: Dan Rosen (dan.rosen_at_[hidden])
Date: 2005-03-31 11:53:59


Hi,

I've been stuck for the past month working on a Win32 i18n project
that seems like it will never end. I don't have much background in this
area, but I can answer a question or two.

> First is that if you have single path that stores unicode, then
>
> exists(path("foo"))
>
> will perform char -> wchar_t conversion inside path constructor, and that
> conversion might be not exactly the same that OS would have performed. One
> issues is that program might not have initialized global locale with
> locale(""). Another is that conversion performed by OS might be different
> then those of locale("").

So, one thing I know is that Windows 9x and NT-class systems behave
differently in this respect. You're probably aware that NT-class
systems traffic in wchar_t* encoded in UCS-2 internally, and that
9x-class systems deal with char* encoded in the system's ANSI
codepage. Additionally, on Win2k/XP you have the ability to set a
thread's ANSI codepage separately from the system's ANSI codepage. So,
I'm not 100% positive about this, but I believe an example of where
locale() will differ from what Windows wants is the following case:

  - The OS is Win2k/XP, which stores strings as UCS-2,
  - The system and thread codepages differ,
  - You initialize a path("foo") requiring a conversion up to UCS-2.

I think in this case, locale() won't give you what you want. I'm no
expert on this, though, so it's worth checking.
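
To make the case above a bit more concrete, here is a minimal, portable
sketch of why the choice of codepage matters. It does not call any
Windows API. The names codepage_sketch and widen_with_codepage are
hypothetical stand-ins for "convert using codepage X", and the tables
only cover ASCII, the 0xA0-0xFF range of Windows-1252 (which matches
Latin-1) and the 0xC0-0xFF range of Windows-1251 (the Cyrillic letters
U+0410..U+044F).

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <string>

// Hypothetical stand-ins for two ANSI codepages. Real code on Windows
// would ask the OS to convert; this only shows that the same byte can
// widen to different characters.
enum codepage_sketch { cp1252_sketch, cp1251_sketch };

// Widen a single byte under the given codepage.
inline wchar_t widen_byte(unsigned char b, codepage_sketch cp)
{
    if (b < 0x80)                         // ASCII is the same in both
        return static_cast<wchar_t>(b);
    if (cp == cp1252_sketch && b >= 0xA0) // same as Latin-1 here
        return static_cast<wchar_t>(b);
    if (cp == cp1251_sketch && b >= 0xC0) // Cyrillic block
        return static_cast<wchar_t>(0x0410 + (b - 0xC0));
    return L'?';                          // not covered by this sketch
}

// Widen a whole narrow string, byte by byte.
inline std::wstring widen_with_codepage(const std::string& s,
                                        codepage_sketch cp)
{
    std::wstring out;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        out += widen_byte(static_cast<unsigned char>(s[i]), cp);
    return out;
}
```

With this, "foo" widens identically under both codepages, but the
single byte 0xE9 becomes U+00E9 under one and U+0439 under the other,
so a path containing it would name two different files.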

> path p("a"), p2(L"b");
> p /= p2; // must do conversion, might not do what's desired

I think this is important to get right. Having path and wpath distinct
from each other, and forcing explicit conversion, seems like exposing
a choice to users in the interface that's entirely orthogonal to
filesystem manipulation. My apologies if this has already been
discussed ad nauseam, but it seems to me like the "do the right thing"
string conversions should be encapsulated in a different library.
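
One way to picture keeping the conversion outside the path classes is a
free function that takes the locale as an explicit argument. This is
only a sketch: widen_with_locale is a hypothetical name, not part of
Boost.Filesystem, and std::ctype<wchar_t>::widen works one char at a
time, so it does not decode multibyte encodings and is not a complete
answer.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: widen a narrow string using the ctype facet of
// an explicitly supplied locale (defaults to the current global one).
inline std::wstring widen_with_locale(const std::string& s,
                                      const std::locale& loc = std::locale())
{
    if (s.empty())
        return std::wstring();

    std::vector<wchar_t> buf(s.size());
    // Per-character widening; no multibyte decoding happens here.
    std::use_facet<std::ctype<wchar_t> >(loc)
        .widen(s.data(), s.data() + s.size(), &buf[0]);
    return std::wstring(buf.begin(), buf.end());
}
```

A caller would then write something like
widen_with_locale("foo", std::locale::classic()) before handing the
result to a wide path, which at least makes the choice of locale
visible at the call site.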

> Also I note that there's no conversion from basic_path<char> to
> basic_path<wchar_t> or vice versa, as far as I can say. To recall my argument
> for conversion: say I have a library which exposes paths in the interface,
> should I use path or wpath in it?

What seems to be common practice on Windows is something like this:

  typedef std::basic_string<TCHAR> tstring;

where TCHAR is a typedef that resolves to either "char" or "wchar_t"
depending on whether _UNICODE is defined. This tends to be clumsy, in
my opinion. I fear the same practice would be adopted for
basic_path<>.
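
For anyone who hasn't seen the idiom, a portable sketch follows. It
does not include <tchar.h>; tchar_sketch, TEXT_SKETCH and
tstring_sketch are hypothetical names that only mimic what TCHAR, _T()
and the tstring typedef above do.

```cpp
#include <cassert>   // only needed for the checks in the usage note
#include <string>

// Pick the character type at compile time, the way TCHAR does.
#ifdef _UNICODE
typedef wchar_t tchar_sketch;
#define TEXT_SKETCH(s) L##s   // wide literal
#else
typedef char tchar_sketch;
#define TEXT_SKETCH(s) s      // narrow literal
#endif

typedef std::basic_string<tchar_sketch> tstring_sketch;

// Every literal has to go through the macro so that it matches the
// selected character type.
inline tstring_sketch default_name()
{
    return TEXT_SKETCH("foo");
}
```

The clumsy part is visible already: the same source compiles to two
different interfaces, and every string literal needs the macro.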

Cheers,
dr

