Subject: Re: [boost] [filesystem] path does not use global locale's codecvt facet - bug or feature
From: Beman Dawes (bdawes_at_[hidden])
Date: 2011-03-03 13:29:47


On Thu, Mar 3, 2011 at 8:31 AM, Artyom <artyomtnk_at_[hidden]> wrote:
> Hello,
>
> Boost.Filesystem v3 uses wide paths on Windows and can convert them from
> narrow ones using a codecvt facet, so I would expect that if the global
> locale has a special codecvt facet installed, Boost.Filesystem would use
> it, i.e.:
>
> int main()
> {
>   boost::locale::generator locale_generator;
>   std::locale::global(locale_generator("en_US.UTF-8"));
>   // Now the default codecvt facet is the UTF-8 one.
>   boost::filesystem::path p("שלום.txt");
>   boost::filesystem::ofstream test(p);
> }
>
> However this does not work as expected!
>
> I found that you need to imbue the locale explicitly:
>
>
>   boost::filesystem::path p;
>   p.imbue(std::locale()); // global one
>   p = "שלום.txt";
>   boost::filesystem::ofstream test(p);
>
> Now it works.
>
> Should I open a ticket for this, or is this "planned" behavior?

That depends. The docs were recently updated (Feb 20, rev 69073) to
provide more detail. For Windows, including Cygwin and MinGW, this is
part of what the docs say:

"The default imbued locale provides a codecvt facet that invokes
Windows MultiByteToWideChar or WideCharToMultiByte API's with a
codepage of CP_THREAD_ACP if Windows AreFileApisANSI() is true,
otherwise codepage CP_OEMCP. [Rationale: this is the current behavior
of C and C++ programs that perform file operations using narrow
character string to identify paths. Changing this in the Filesystem
library would be too surprising, particularly where user input is
involved. -- end rationale]"

So your original code won't do what you want: path conversions use the
default imbued locale described above, not the global locale. It would
only work if the codepage were a UTF-8 codepage. Most likely it wasn't,
and that's why you needed to do an explicit imbue.

It would help to know which compiler you are using, including the
version number. Most VC++ versions and recent gcc or MinGW releases
should be OK.

You also have to be sure "שלום.txt" is handled as UTF-8 by your editor
and your compiler, and not converted to some other encoding. I'm
guessing that's not a problem as your imbue change wouldn't have
worked correctly otherwise.

--Beman

