From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2020-01-07 01:58:50


On Tue, Dec 3, 2019 at 2:19 PM Gavin Lambert via Boost <
boost_at_[hidden]> wrote:

> While I agree in principle, the simple fact is that performing string
> transcoding on filesystem paths is a Very Bad Idea™, since both Windows
> and Linux treat them as opaque byte sequences -- but Windows' native
> encoding is UTF-16 and Linux' is (mostly) UTF-8.
>

Unix paths can be stored in a narrow string already, where fopen() always
magically worked for any text. Windows paths can be transcoded losslessly
into WTF-8 and back.

> So, while unfortunate, v3 made the correct choice. Paths have to be
> kept in their original encoding between original source (command line,
> file, or UI) and file API usage, otherwise you can get weird errors when
> transcoding produces a different byte sequence that appears identical
> when actually rendered, but doesn't match the filesystem. Transcoding
> is only safe when you're going to do something with the string other
> than using it in a file API.
>

See above: malformed UTF-16 can be converted to WTF-8 (a UTF-8 superset)
and back losslessly. The unprecedented introduction of a platform specific
interface into the standard was, still is, and will always be, a horrible
mistake.
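
To make the round trip concrete, here is a minimal sketch of the
conversion. It is not taken from Boost or any other library: the names
to_wtf8 and from_wtf8 are made up for this example, and the decoder assumes
its input was produced by the encoder (it does no validation beyond
stopping at a truncated sequence).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode possibly ill-formed UTF-16 as WTF-8.
// A proper surrogate pair becomes one 4-byte sequence; a lone surrogate
// is kept as a 3-byte sequence instead of being rejected or replaced.
std::string to_wtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine them.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: decode WTF-8 back to UTF-16.
// Assumes well-formed WTF-8, e.g. the output of to_wtf8 above.
std::u16string from_wtf8(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len;
        std::uint32_t cp;
        if (b0 < 0x80)      { len = 1; cp = b0; }
        else if (b0 < 0xE0) { len = 2; cp = b0 & 0x1F; }
        else if (b0 < 0xF0) { len = 3; cp = b0 & 0x0F; }
        else                { len = 4; cp = b0 & 0x07; }
        if (i + len > in.size()) break;  // truncated input: stop (sketch only)
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {
            // Supplementary code point: split into a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            // BMP code unit, including a lone surrogate, is emitted as-is.
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}
```

The point of the sketch is that from_wtf8(to_wtf8(s)) gives back s for any
sequence of 16-bit units, including ones with unpaired surrogates, so a
narrow-string interface would not lose any Windows file name.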

> While copying is unfortunate, these things are rarely on a
> performance-critical path, and the benefits of having consistent
> compose/decompose operations on paths vastly outweighs that, in my
> opinion. Combined with the need to maintain native encoding for paths,
> separated algorithms don't seem particularly useful -- just less
> convenient to use.
>

The path parsing and modification functions could be storage agnostic. Some
prefer the x.join(y) syntax over join(x,y), but that's just a preference
originating from the OOP crowd.
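
As a rough illustration of what I mean by storage agnostic, here is a
sketch of free functions that only look at a std::string_view, so the
caller can keep the path in whatever container they like. The names join
and filename, the '/' separator default, and the "absolute right-hand side
wins" rule are assumptions made for this example, not the Boost.Filesystem
or std::filesystem interface.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical free function: join two path fragments.
// If rhs is empty, lhs is returned; if lhs is empty or rhs starts with
// the separator (treated here as "absolute"), rhs is returned unchanged.
std::string join(std::string_view lhs, std::string_view rhs, char sep = '/') {
    if (rhs.empty()) return std::string(lhs);
    if (lhs.empty() || rhs.front() == sep) return std::string(rhs);
    std::string out(lhs);
    if (out.back() != sep) out += sep;  // avoid doubling the separator
    out.append(rhs.data(), rhs.size());
    return out;
}

// Hypothetical free function: the part after the last separator.
// Returns a view into the caller's storage, so nothing is copied.
std::string_view filename(std::string_view path, char sep = '/') {
    const std::size_t pos = path.rfind(sep);
    return pos == std::string_view::npos ? path : path.substr(pos + 1);
}
```

Nothing here depends on a dedicated path class: join("usr", "lib") and
filename(some_std_string) work on string literals, std::string, or any
other contiguous char buffer that converts to a std::string_view.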

-- 
Yakov Galka
http://stannum.co.il/
