From: Gavin Lambert (boost_at_[hidden])
Date: 2022-08-16 03:15:32


On 16/08/2022 11:53, Vinnie Falco wrote:
> My experiences with std::filesystem and boost::filesystem have been
> nothing but negative. I think that the decision to make the character
> type different on Windows was a mistake. The need for locales and
> imbuements and global state and... really, it is just giving me a big
> headache.

Using wchar_t on Windows is actually the least painful option. (And you
don't have to worry about locales and imbuements etc if you never try to
convert to not-wchar_t.)

For correct behaviour, you *must* use only the W variants of the native
API functions, or the wchar_t overloads of standard library functions.

On Windows, everything in the standard library that accepts 'char'
params inevitably assumes that these are encoded in the ANSI code page,
not UTF-8. This can't be "fixed" without breaking all the legacy apps.

In practice, this means that unless you can absolutely guarantee that
your paths only contain pure ASCII (and the instant you accept a path or
filename from the user, you lose), it is *never* safe to use any of the
non-wide library methods.

You *can* (and many do) store paths in other libraries and in the
application in 'char'-encoded-as-UTF-8, but then you have to remember
every single time you hit the standard library or direct WinAPI
boundaries to convert your strings to wide before passing them across,
or hilarity will ensue (without even a convenient compiler error).
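
A minimal sketch of keeping that conversion in one place, assuming
C++17 or later and that the stored 'char' strings really are UTF-8.
path_from_utf8 is a hypothetical helper name, not part of any library;
it simply lets std::filesystem::path do the UTF-8 to native conversion.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: treat `utf8` as UTF-8 (never the ANSI code page)
// and build a path in the platform's native encoding.
fs::path path_from_utf8(const std::string& utf8) {
#if defined(__cpp_char8_t)
    // C++20: a char8_t source is always interpreted as UTF-8.
    return fs::path(std::u8string(utf8.begin(), utf8.end()));
#else
    // C++17: u8path is the UTF-8 entry point (deprecated in C++20).
    return fs::u8path(utf8);
#endif
}
```

On Windows, path_from_utf8(name).c_str() is then a const wchar_t* that
can be passed to a W function; elsewhere it is a const char*.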

Storing paths as wchar_t in the first place avoids both the cost of
converting back and forth and the potential corruption (often
overlooked, unless you regularly test with Unicode paths) from
accidentally forgetting a conversion.

> (where is the signature of fopen that accepts a filesystem::path?)

Why are you using fopen in C++ in the first place?

Filesystem does provide 'path' overloads for fstreams, which you should
have been using instead anyway.
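
For illustration, a small sketch of opening a stream directly from a
path, assuming C++17's std::filesystem (Boost.Filesystem has its own
fstream wrappers). read_file is a hypothetical helper name.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: read a whole file through the std::ifstream
// constructor that takes a filesystem::path, so no narrow/wide
// conversion is written by hand. Returns an empty string on failure.
std::string read_file(const fs::path& p) {
    std::ifstream in(p, std::ios::binary);
    if (!in) {
        return {};
    }
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>{});
}
```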

> It should be utf-8 only, use Plain Old char (even on Windows), it should
> be completely portable, except that it requires that directories are
> possible and that the filesystem isn't weird (I don't really care
> about compatibility with grandpa's EPROMs that can hold 9-bit flat
> files).

In theory, the standard library (and other wrapper libraries around the
WinAPI, including Filesystem) could start doing more sane things by
using the C++20 'char8_t'/'u8string' types to disambiguate between UTF-8
encoded paths and legacy idkwtf-'char'-encoded paths. But this will
take a very long time to percolate through the ecosystem, especially as
there are a bunch of people who hate the very idea of it. And it
doesn't solve the conversion performance angle.

(Hopefully, Windows will eventually provide char8_t entrypoints and
APIs, which will make it easier to interoperate with not-Windows.)

Although, as Emil has already pointed out, it's valid in not-Windows to
have arbitrary not-UTF-8 byte sequences in paths, so you can get into
trouble in that direction as well.

That's another reason for using wchar_t in Windows and char in
not-Windows: no conversions happen at all (at least where values are
accepted natively from the OS), which has maximal compatibility for
otherwise-invalid byte sequences that nevertheless exist.
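
A rough sketch of what "no conversions" looks like with
std::filesystem, assuming C++17. round_trips_natively is a hypothetical
helper name; it only checks that a string already in the native
character type comes back out of a path unchanged.

```cpp
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;

// The path's character type follows the OS: wchar_t on Windows,
// char everywhere else.
#ifdef _WIN32
static_assert(std::is_same_v<fs::path::value_type, wchar_t>,
              "expected wide native paths on Windows");
#else
static_assert(std::is_same_v<fs::path::value_type, char>,
              "expected narrow native paths off Windows");
#endif

// Hypothetical helper: a string already in the native type is stored
// and returned by native() without any encoding conversion.
bool round_trips_natively(const fs::path::string_type& raw) {
    return fs::path(raw).native() == raw;
}
```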

