Subject: Re: [boost] Making Boost.Filesystem work with GENERAL filenames with g++ in Windows (a solution)
From: Alf P. Steinbach (alf.p.steinbach+usenet_at_[hidden])
Date: 2011-10-27 09:45:06
On 26.10.2011 22:13, Beman Dawes wrote:
> On Tue, Oct 25, 2011 at 9:41 AM, Alf P. Steinbach
> <alf.p.steinbach+usenet_at_[hidden]> wrote:
>> IMHO access to files is a crucial part of Boost.Filesystem. However, with
>> Boost 1.47, and using g++ 4.4.1 in Windows 7, boost::filesystem::ifstream
>> etc. fail to open or create files with non-ANSI characters. It works fine
>> with Visual C++; it FAILS with g++ 4.4.1, which is the one bundled with the
>> Code::Blocks IDE.
> Yes, although it is actually characters that are not covered by the
> current file system codepage rather than non-ANSI characters, IIRC.
> Surprisingly, no one has opened a ticket yet.
> Until someone does open a ticket and the problem gets fixed, there are
> a couple of workarounds:
> (1) Use V2. Its fstream.hpp uses an implementation hack that works as
> long as 8.3 filenames are enabled.
I think this is good. :-) It's what I, unaware of the history, proposed.
> (Some Windows users disable 8.3
> filenames as an optimization.)
The capability to disable them is there, but I don't think anyone is
actually doing that, because Windows itself uses 8.3 filenames in the
registry, and reportedly the Microsoft Installer uses and requires them,
and so on.
> (2) V3 may work OK with the Microsoft 65001 UTF-8 codepage, although
> I've never used it myself and you would have to pass in a UTF-8
> encoded narrow character name.
I'm not sure exactly what you're thinking of here, but I suspect that
it's due to some technical misunderstanding. Narrow character Windows
paths need to be encoded as ANSI, which is not one specific codepage but
whichever codepage the GetACP function reports (codepage 1252 on Western
installations). This
codepage is independent of the active codepage in a console; the default
codepage for a console is called the "OEM" codepage.
Changing the default codepages, ANSI or OEM, can be done by editing an
undocumented registry key and rebooting.
However, while I regularly recommend changing the OEM codepage (from 437
to e.g. 1252), changing the ANSI codepage to something non-ANSI could
conceivably wreak a lot of havoc with applications that assume that the
ANSI codepage is like ANSI, a single byte per char encoding.
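To see which codepages are actually in play on a given machine, something
like the following can be used. It is only a sketch: CodePages and
query_code_pages are names made up for illustration, while GetACP,
GetOEMCP and GetConsoleOutputCP are the Win32 calls. Outside Windows the
function simply reports failure.

```cpp
#ifdef _WIN32
#  include <windows.h>
#endif

// Hypothetical helper type, for illustration only.
struct CodePages {
    unsigned ansi;            // process ANSI codepage (GetACP)
    unsigned oem;             // system OEM codepage (GetOEMCP)
    unsigned console_output;  // active console output codepage, 0 if unavailable
};

// Fills 'out' and returns true on Windows; zeroes 'out' and returns false elsewhere.
inline bool query_code_pages(CodePages& out) {
#ifdef _WIN32
    out.ansi = GetACP();
    out.oem = GetOEMCP();
    out.console_output = GetConsoleOutputCP();
    return true;
#else
    out.ansi = 0;
    out.oem = 0;
    out.console_output = 0;
    return false;
#endif
}
```

The ANSI value is the one that matters for narrow character paths; the
console value can differ from it, and can be changed independently.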
>> The failure probably has nothing to do with the g++ version: it's due to g++
>> not offering the Visual C++ wchar_t oriented extensions to the standard
>> iostreams (Boost.Filesystem uses these Visual C++ extensions).
> Right. libstdc++ doesn't provide the wchar_t overloads.
>> I stumbled onto this while I was writing about using Unicode in C++
>> programming in Windows.
>> I wrote up a technical solution in section 5, starting on page 16, of that
>> work-in-progress document, available on Google Docs at:
>> Essentially, the fix I ended up with, full source code given in the above
>> doc, uses Windows short file names if (1) there is no wide character support
>> and if furthermore (2) the filename can't be perfectly translated to ANSI.
>> The C++ implementation's support for wide chars is automatically detected
>> using C++98-compatible code.
>> I do not know what to do with this.
> If you care enough to open a ticket on the Boost bug tracker, I'll
> move the V2 code to V3. But there is a big backlog of tickets, so no
> guarantees as to when that will happen.
Thank you, done.
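For the record, the decision rule quoted above (fall back to the short
name only when there is no wide character support and the name does not
survive translation to ANSI) can be sketched roughly as below. This is
not the code from the document, and not Boost code. All names are
hypothetical, and fits_single_byte is a portable stand-in that assumes an
ANSI repertoire equal to ISO 8859-1. On Windows the real check would
presumably call WideCharToMultiByte with CP_ACP and inspect the
lpUsedDefaultChar result instead.

```cpp
#include <string>

// Hypothetical names throughout; an illustration of the rule, not an implementation.
enum class OpenStrategy {
    WideName,   // library accepts wchar_t names: pass the name through unchanged
    AnsiName,   // no wide support, but the name translates to ANSI without loss
    ShortName   // no wide support and lossy translation: use the 8.3 short name
};

// Stand-in for the real lossless-translation check.
// Assumption: the ANSI codepage covers exactly U+0000..U+00FF (ISO 8859-1).
inline bool fits_single_byte(const std::wstring& name) {
    for (wchar_t ch : name) {
        if (static_cast<unsigned long>(ch) > 0xFFul) {
            return false;
        }
    }
    return true;
}

// Applies the two conditions in order: wide support first, then translatability.
inline OpenStrategy choose_strategy(bool has_wide_open, const std::wstring& name) {
    if (has_wide_open) {
        return OpenStrategy::WideName;
    }
    if (fits_single_byte(name)) {
        return OpenStrategy::AnsiName;
    }
    return OpenStrategy::ShortName;
}
```

The short name branch is the only one that depends on 8.3 names being
enabled, so it is deliberately the last resort.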
> Another possibility is to try to talk the libstdc++ folks into
> supporting the Dinkumware wchar_t extension. They will presumably want
> to do that anyhow to support TR2 (or whatever it is going to be
Luc Danton, over at SO, pointed me to some earlier discussion of
extending libstdc++ with Unicode path support, in June this year, at
Maybe that can be useful?