From: Daniela Engert (dani_at_[hidden])
Date: 2022-08-16 06:27:16

On 16.08.2022 at 05:15, Gavin Lambert via Boost wrote:
> On 16/08/2022 11:53, Vinnie Falco wrote:
>> My experiences with std::filesystem and boost::filesystem have been
>> nothing but negative. I think that the decision to make the character
>> type different on Windows was a mistake. The need for locales and
>> imbuements and global state and... really, it is just giving me a big
>> headache.
> Using wchar_t on Windows is actually the least painful option. (And
> you don't have to worry about locales and imbuements etc if you never
> try to convert to not-wchar_t.)
> For correct behaviour, you *must* only use the W variants of the
> native API methods, or wchar_t methods of standard library functions.
> Inevitably, everything in the standard library that accepts 'char'
> params assumes that these are encoded in the ANSI code page, not
> UTF-8. This can't be "fixed" or it breaks all the legacy apps.
> In practice, this means that unless you can absolutely guarantee that
> your paths only contain pure ASCII (and the instant you accept a path
> or filename from the user, you lose), it is *never* safe to use any of
> the non-wide library methods.
> You *can* (and many do) store paths in other libraries and in the
> application in 'char'-encoded-as-UTF-8, but then you have to remember
> every single time you hit the standard library or direct WinAPI
> boundaries to convert your strings to wide before passing them across,
> or hilarity will ensue (without even a convenient compiler error).
> Storing paths as wchar_t in the first place both avoids the cost of
> converting back and forth and potential corruption (often overlooked,
> unless you regularly test with Unicode paths) from accidentally
> forgetting a conversion.
>> (where is the signature of fopen that accepts a filesystem::path?)
> Why are you using fopen in C++ in the first place?
> Filesystem does provide 'path' overloads for fstreams, which you
> should have been using instead anyway.
>> It should be utf-8 only, use Plain Old char (even on Windows), it should
>> be completely portable, except that it requires that directories are
>> possible and that the filesystem isn't weird (I don't really care
>> about compatibility with grandpa's EPROMs that can hold 9-bit flat
>> files).
> In theory, the standard library (and other wrapper libraries around
> the WinAPI, including Filesystem) could start doing more sane things
> by using the C++20 'char8_t'/'u8string' types to disambiguate between
> UTF-8 encoded paths and legacy idkwtf-'char'-encoded paths.  But this
> will take a very long time to percolate through the ecosystem,
> especially as there are a bunch of people who hate the very idea of
> it.  And it doesn't solve the conversion performance angle.
> (Hopefully, Windows will eventually provide char8_t entrypoints and
> APIs, which will make it easier to interoperate with not-Windows.)
> Although as Emil has already pointed out, it's valid in not-Windows to
> have arbitrary not-UTF-8 byte sequences in paths, so you can get into
> trouble in that direction as well.
> That's another reason for using wchar_t in Windows and char in
> not-Windows: no conversions happen at all (at least where values are
> accepted natively from the OS), which has maximal compatibility for
> otherwise-invalid byte sequences that nevertheless exist.

Amen brother, you speak wisely!

I want to add the following to stay sane on Windows: ensure that *both*
the wide and the narrow execution character encodings are Unicode (i.e.
UTF-16 for wchar_t (that's the default) and UTF-8 for char), build with
_UNICODE defined, and link with <activeCodePage
This guarantees consistent semantics throughout the *whole* execution of
the program on reasonably recent versions of Windows. And lastly,
represent paths with std/boost filesystem paths and use APIs that know
how to deal with them *correctly*.
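
A minimal sketch of that last point, assuming C++17 and std::filesystem
(boost::filesystem would look similar). The helper names write_text and
read_text are hypothetical and not from this thread. The path stays a
filesystem path from end to end and is handed to the fstream constructors
that accept one, so no narrow/wide conversion is written by hand:

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: write `text` to `file`, replacing previous content.
// The stream is opened from the fs::path itself, so the library uses the
// native character type (wchar_t on Windows, char elsewhere).
inline bool write_text(const fs::path& file, const std::string& text) {
    assert(!file.empty());  // precondition: caller supplies a real path
    std::ofstream out(file, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << text;
    out.flush();
    return out.good();
}

// Hypothetical helper: read the whole file back as raw bytes.
// Returns an empty string if the file cannot be opened.
inline std::string read_text(const fs::path& file) {
    std::ifstream in(file, std::ios::binary);
    if (!in) return {};
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling write_text(fs::temp_directory_path() / "notes.txt", "hello\n") and
then read_text on the same path should hand back the same bytes. Only the
file contents are treated as plain char here; the file name never leaves
the path object.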

Similar advice applies to POSIX systems. UTF-8 everywhere is only a
recommendation there, not a guarantee.
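
Since POSIX file names are byte strings, code that wants to treat one as
UTF-8 (for display, logging, or conversion) may want to check first. A
rough sketch of such a check, assuming C++17; is_valid_utf8 is a
hypothetical helper, not part of any library mentioned here, and it follows
the well-formed byte ranges from the Unicode Standard:

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if `bytes` is well-formed UTF-8. Rejects
// overlong forms, UTF-16 surrogates, and code points above U+10FFFF.
inline bool is_valid_utf8(std::string_view bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 <= 0x7F) { ++i; continue; }  // ASCII

        std::size_t len = 0;      // continuation bytes expected
        unsigned char lo = 0x80;  // allowed range for the first continuation
        unsigned char hi = 0xBF;
        if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 1; }
        else if (b0 == 0xE0)               { len = 2; lo = 0xA0; }  // no overlongs
        else if (b0 >= 0xE1 && b0 <= 0xEC) { len = 2; }
        else if (b0 == 0xED)               { len = 2; hi = 0x9F; }  // no surrogates
        else if (b0 == 0xEE || b0 == 0xEF) { len = 2; }
        else if (b0 == 0xF0)               { len = 3; lo = 0x90; }  // no overlongs
        else if (b0 >= 0xF1 && b0 <= 0xF3) { len = 3; }
        else if (b0 == 0xF4)               { len = 3; hi = 0x8F; }  // <= U+10FFFF
        else return false;  // 0x80..0xC1 and 0xF5..0xFF cannot start a sequence

        if (n - i <= len) return false;  // sequence is cut off

        const unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
        if (b1 < lo || b1 > hi) return false;
        for (std::size_t k = 2; k <= len; ++k) {
            const unsigned char bk = static_cast<unsigned char>(bytes[i + k]);
            if (bk < 0x80 || bk > 0xBF) return false;
        }
        i += len + 1;
    }
    return true;
}
```

On a POSIX system, where path::native() is a std::string of the raw bytes,
is_valid_utf8(p.native()) would say whether a name such as the Latin-1
encoded "caf\xE9.txt" can be shown as UTF-8 at all. The path itself should
still be passed around unconverted.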

