
Boost :

From: Gavin Lambert (boost_at_[hidden])
Date: 2019-12-03 22:19:11


On 4/12/2019 05:25, Yakov Galka wrote:
> On Mon, Nov 11, 2019 at 4:19 AM Alexander Grund wrote:
> I raised this issue many years ago. In fact boost filesystem v2 was better
> in this respect, because it followed the established convention of having a
> templated basic_path<char>, thus not committing to a specific char type.
> Alas, v2 was deprecated and v3 was lobbied into WG21 for standardization.
> It was an unprecedented case of introducing a "char on some platforms,
> wchar_t on others" interface into the standard, which is a bad decision
> from a portability standpoint.
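For reference, the platform split described above survives in std::filesystem::path, which was standardized from Boost.Filesystem v3. A minimal compile-time sketch, assuming a C++17 toolchain and that `_WIN32` is the macro identifying Windows targets:

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <type_traits>

namespace fs = std::filesystem;

// path::value_type is the platform's native character type:
// wchar_t (holding UTF-16) on Windows, char on POSIX systems such as Linux.
#ifdef _WIN32
static_assert(std::is_same_v<fs::path::value_type, wchar_t>,
              "Windows paths are stored as wchar_t");
#else
static_assert(std::is_same_v<fs::path::value_type, char>,
              "POSIX paths are stored as char");
#endif

// native() exposes the stored string without any conversion, so its type
// (path::string_type) differs between platforms as well.
static_assert(std::is_same_v<fs::path::string_type,
                             std::basic_string<fs::path::value_type>>,
              "string_type is a basic_string of the native char type");
```

Code that needs one fixed character type has to ask for it explicitly (string(), wstring(), u8string()), which converts on platforms whose native type differs.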

While I agree in principle, the simple fact is that performing string
transcoding on filesystem paths is a Very Bad Idea™, since both Windows
and Linux treat them as opaque sequences of code units (16-bit units on
Windows, bytes on Linux) -- but Windows' native encoding is UTF-16 and
Linux's is (mostly) UTF-8.

So, while unfortunate, v3 made the correct choice in storing paths in
the platform's native encoding. Paths have to be kept in their original
encoding all the way from their source (command line, file, or UI) to
the file API call; otherwise you can get weird errors when transcoding
produces a different byte sequence that looks identical when rendered
but doesn't match what is on the filesystem. Transcoding is only safe
when the result will be used for something other than a file API call.
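One way a conversion step can yield a different byte sequence that renders identically is Unicode normalization. The sketch below (the variable names and the `hex_bytes` helper are illustrative only) spells the same visible file name "café.txt" in NFC and in NFD form; a byte-oriented filesystem, such as a typical Linux ext4 mount, would treat these as two different names.

```cpp
#include <cassert>
#include <string>

// Two UTF-8 spellings of "café.txt" that usually render identically:
//   NFC: 'é' as the single code point U+00E9         -> bytes C3 A9
//   NFD: 'e' followed by combining acute U+0301     -> bytes 65 CC 81
const std::string name_nfc = "caf\xC3\xA9.txt";
const std::string name_nfd = "cafe\xCC\x81.txt";

// Formats each byte as two uppercase hex digits separated by spaces,
// to make the difference between the two spellings visible.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}
```

If a path read as `name_nfc` is passed through any step that normalizes to NFD (or the reverse), the string still looks the same on screen but no longer names the file that exists on disk.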

> While we are at it, I would like to say that boost filesystem should
> never have introduced a path class in the first place. filesystem::path is just
> a glorified string with no extra invariants. Any string -> path conversion
> copies the data, even if it's already in the right encoding, even on
> operating systems that don't need any conversions at all. There goes your
> "don't pay for what you don't use" principle. Most can agree that C++'s
> spirit is to separate containers from algorithms. A proper design would
> introduce path manipulation functions that work on any string types, and
> let users use std::string or even char[] for storage.
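The design proposed in the quoted paragraph could look roughly like the following hypothetical free functions (`filename_of`, `parent_of` and `extension_of` are made-up names, not a Boost or standard API). They take `std::string_view`, so they accept `std::string`, `char[]` or any other contiguous char storage and return views into it without copying. Assumptions: POSIX-style paths with `/` as the only separator; root names, `.`/`..` and trailing separators are not handled.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical string-based path algorithms: no path class, no copies.
// Each returns a view into the caller's storage.

// Everything after the last '/', or the whole string if there is none.
constexpr std::string_view filename_of(std::string_view p) {
    const auto pos = p.rfind('/');
    return pos == std::string_view::npos ? p : p.substr(pos + 1);
}

// Everything before the last '/', or an empty view if there is none.
constexpr std::string_view parent_of(std::string_view p) {
    const auto pos = p.rfind('/');
    return pos == std::string_view::npos ? std::string_view{} : p.substr(0, pos);
}

// The file name's suffix starting at its last '.', or empty if there is
// no dot or the only dot is leading (so ".bashrc" has no extension).
constexpr std::string_view extension_of(std::string_view p) {
    const std::string_view name = filename_of(p);
    const auto dot = name.rfind('.');
    if (dot == std::string_view::npos || dot == 0) return {};
    return name.substr(dot);
}
```

The returned views dangle if the underlying storage goes away, and the functions say nothing about encoding: a Windows build would need a `wchar_t` variant, which is the native-encoding concern raised in the reply.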

While copying is unfortunate, these things are rarely on a
performance-critical path, and the benefits of having consistent
compose/decompose operations on paths vastly outweigh that cost, in my
opinion. Given the need to keep paths in their native encoding,
separate string-based algorithms don't seem particularly useful -- just
less convenient to use.
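As an illustration of the compose/decompose operations meant here, a short sketch using std::filesystem::path (C++17), the standardized descendant of Boost.Filesystem v3; `make_output_path` and `with_extension` are hypothetical helper names:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Composes "<dir>/<stem>.log". operator/ joins with a directory separator
// (the preferred one, '\' on Windows); operator+= appends with no separator.
inline fs::path make_output_path(const fs::path& dir, const std::string& stem) {
    fs::path p = dir / stem;  // copies into the path's own storage
    p += ".log";
    return p;
}

// Swaps only the extension, leaving directory and stem untouched.
inline fs::path with_extension(fs::path p, const fs::path& ext) {
    p.replace_extension(ext);
    return p;
}
```

Decomposition goes through the same class: `filename()`, `stem()`, `extension()` and `parent_path()` all return paths in the native encoding. The checks use `generic_string()` so the expected strings contain `/` on every platform.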


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk