Subject: [boost] [filesystem3] CP_ACP vs CP_THREAD_ACP
From: Yechezkel Mett (ymett.on.boost_at_[hidden])
Date: 2010-11-25 11:17:17

Whilst moving from filesystem v2 to v3 on Windows I found that the
library uses CP_THREAD_ACP to convert between narrow and wide
filenames. This is wrong, as I will attempt to explain.

Windows uses a number of locales and language settings as described on
the following page:

The relevant ones are Language for non-Unicode programs (System
Locale), Standards and Formats (User Locale) and Thread Locale. The
System Locale defines which code page the system uses for narrow
chars, and is set by the user for the whole machine. The User Locale
controls number formatting and sorting, and is set per user for all of
that user's programs. The Thread Locale also controls number
formatting and sorting, but is per-thread; it defaults to the User
Locale and can be changed by the program. A number of sources
recommend not touching it, however (the exact reasons seem vague, but
apparently it can have surprising effects).

CP_ACP gives the System Locale code page. CP_THREAD_ACP gives the
Thread Locale code page, which is rather a strange item considering
that the Thread Locale isn't meant for code pages at all!
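
The difference between the two can be inspected directly. The following is a minimal sketch, not anything taken from the library: system_code_page, thread_code_page and describe_code_pages are hypothetical names, and it assumes that CP_THREAD_ACP resolves to the default ANSI code page of the calling thread's locale, which is my understanding of the documented behaviour.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Code page selected by the System Locale (what CP_ACP resolves to).
unsigned system_code_page() {
    return GetACP();
}

// Default ANSI code page of the calling thread's locale
// (assumed here to be what CP_THREAD_ACP resolves to).
unsigned thread_code_page() {
    DWORD cp = 0;
    // LOCALE_RETURN_NUMBER makes GetLocaleInfoW store a DWORD instead of
    // text; the buffer size is still given in WCHAR units.
    const int written = GetLocaleInfoW(
        GetThreadLocale(),
        LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER,
        reinterpret_cast<LPWSTR>(&cp),
        sizeof(cp) / sizeof(WCHAR));
    assert(written != 0);  // debug-build check that the query succeeded
    return written != 0 ? static_cast<unsigned>(cp) : 0u;
}
#endif

// Hypothetical helper: one-line summary of the two code pages.
std::string describe_code_pages(unsigned system_cp, unsigned thread_cp) {
    std::ostringstream out;
    out.imbue(std::locale::classic());  // no digit grouping in the numbers
    out << "CP_ACP=" << system_cp << " CP_THREAD_ACP=" << thread_cp
        << (system_cp == thread_cp ? " (same)" : " (different)");
    return out.str();
}
```

On a machine whose system locale is Hebrew and whose user locale is English (UK), the two functions would be expected to return 1255 and 1252 respectively, so a filename converted with CP_THREAD_ACP would be decoded with the wrong code page.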

"ANSI" (i.e. narrow char) API functions (including CreateFileA) use the
System Locale code page (CP_ACP).
fopen (VC++9.0) just passes the filename through to the Windows API
and therefore also uses CP_ACP.
fstream (VC++9.0) uses mbstowcs_s, which uses the global C locale (not
the global C++ locale).
Narrow char windows use CP_ACP.
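
To illustrate the fstream case, here is a minimal sketch of a narrow-to-wide conversion that follows the global C locale. It uses the standard std::mbstowcs as a stand-in for the VC++ mbstowcs_s, and widen_with_c_locale is a hypothetical name, not something from the runtime or from boost.filesystem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Convert a narrow string to wide using whatever LC_CTYPE the global
// C locale currently has (set via std::setlocale or std::locale::global).
std::wstring widen_with_c_locale(const std::string& narrow) {
    // A multibyte string never yields more wide characters than it has
    // bytes, so size() + 1 elements leave room for the terminator.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    const std::size_t n =
        std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
    if (n == static_cast<std::size_t>(-1)) {
        throw std::runtime_error(
            "invalid multibyte sequence for the current C locale");
    }
    assert(n <= narrow.size());
    return std::wstring(&buffer[0], n);
}
```

With the default "C" locale only plain ASCII names round-trip reliably. After something like std::setlocale(LC_CTYPE, ".1255") with the Microsoft runtime, the same call should decode Hebrew bytes, whereas changing the Thread Locale should make no difference to it.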

boost.filesystem should therefore use either CP_ACP (to be consistent
with the Windows API, fopen and the GUI) or the global C++ locale (to
be consistent with fstream). (Using the global C locale directly seems
a strange idea, but since setting the global C++ locale automatically
sets the global C locale as well, following the global C++ locale
seems a good compromise.)

Using CP_ACP gives something that "just works"; it is almost certainly
what is needed. Using the global C++ locale is perhaps theoretically
correct; it can be set (by the user) to match CP_ACP as follows:

std::locale::global(std::locale(std::locale::classic(),
    str(boost::format(".%||") % GetACP()).c_str(), std::locale::ctype));

which is likely the correct thing for most programs anyway, and also
allows the user to choose something different should that be
necessary. (Unfortunately std::locale("") doesn't do the right thing -
it takes the user locale for everything, including code page, and as
explained above the user locale has nothing to do with code page.)
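
For anyone who would rather not pull in Boost.Format just for this, a rough Boost-free sketch of the same idea follows. code_page_locale_name and use_system_code_page_for_ctype are hypothetical names, and the ".<codepage>" locale name form is an assumption that holds for the Microsoft runtime only.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Build the ".<codepage>" locale name accepted by the Microsoft CRT,
// e.g. ".1255" for the Hebrew ANSI code page.
std::string code_page_locale_name(unsigned code_page) {
    assert(code_page != 0);
    std::ostringstream name;
    name.imbue(std::locale::classic());  // keep the digits ungrouped
    name << '.' << code_page;
    return name.str();
}

#ifdef _WIN32
#include <windows.h>

// Make the ctype category of the global C++ locale follow the System
// Locale code page and leave every other category as the classic locale.
void use_system_code_page_for_ctype() {
    const std::string name = code_page_locale_name(GetACP());
    std::locale::global(std::locale(std::locale::classic(),
                                    name.c_str(),
                                    std::locale::ctype));
}
#endif
```

Calling use_system_code_page_for_ctype() once at program start, before any filenames are converted, would be the intended usage.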

In many (perhaps most) cases the user locale and system locale will be
the same, but not always - for example I have the user locale set to
English UK (for dates) but the system locale set to Hebrew (I need to
be able to use Hebrew in non-Unicode programs).

Yechezkel Mett

