
Boost :

From: Peter Dimov (pdimov_at_[hidden])
Date: 2020-01-08 03:37:13


Gavin Lambert wrote:
> On 8/01/2020 14:43, Peter Dimov wrote:
> > Yes, concatenating two character sequences can result in technically
> > invalid WTF-8. But that's not an issue unique to Windows. You can do the
> > same on any non-Windows platform. It's still not clear how this prevents
> > a `path` class from storing ~WTF-8 on Windows, or exposing a char-based
> > API that ~WTF-8 decodes when passing to Windows, and encodes on the
> > reverse trip.
>
> It could. And if you're only round-tripping it to file APIs and doing
> nothing else, then you can probably get away with that.
>
> But there's probably other code that wants to do manipulation on the path
> (swapping extensions, passing to some UI, truncating the filename to 10
> characters, etc). Now there are more parts of the system that need to know
> you have data in a not-legal-WTF-8 format, and how to deal with that.
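A minimal sketch of the ~WTF-8 round trip being discussed, assuming the Windows side is modelled as a `std::u16string` of UTF-16 code units. `wtf8_encode` and `wtf8_decode` are hypothetical helpers written for illustration; they are not part of Boost.Filesystem or `std::filesystem`. The decoder is deliberately lenient ("~WTF-8"): it accepts a separately encoded lead surrogate followed by a trail surrogate, which is what concatenating two strings can produce.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: UTF-16 code units -> WTF-8. A well-formed surrogate
// pair becomes one 4-byte sequence; an unpaired surrogate is kept as a
// 3-byte sequence (strict UTF-8 would reject it).
std::string wtf8_encode(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Merge a lead surrogate with an immediately following trail surrogate.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: lenient (~WTF-8) -> UTF-16 code units. Input is
// assumed to come from wtf8_encode, possibly concatenated; this sketch does
// no real validation. Separately encoded surrogates decode to the same code
// units that a merged 4-byte sequence would produce.
std::u16string wtf8_decode(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        auto b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        if (i + len > in.size()) break;  // truncated sequence: stop (sketch only)
        for (std::size_t k = 1; k < len; ++k) {
            auto c = static_cast<unsigned char>(in[i + k]);
            assert((c & 0xC0) == 0x80);  // debug check: expect a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;
        if (cp >= 0x10000) {  // split a supplementary code point into a pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);  // includes lone surrogates
        }
    }
    return out;
}
```

With these helpers, a name holding a lone surrogate such as `{u'a', 0xD83D, 0xDE00, 0xD800, u'z'}` survives `wtf8_decode(wtf8_encode(name))` unchanged. Encoding the lead `0xD83D` and trail `0xDE00` separately and concatenating the results gives 6 bytes, not the 4-byte form that strict WTF-8 requires, yet the lenient decoder still hands Windows the same valid pair.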

No, there aren't any (new) problems with that. That is, there aren't any
problems you wouldn't already have on other platforms. Vanilla POSIX can
have any NTBS (null-terminated byte string) at all as a path/file name;
macOS has UTF-8 NFD paths/file names. Any code you have that tries to
truncate the filename to 10 characters (for whatever definition of
character) is already broken. This is simply not an operation that can be
done portably on a path or file name.
(And any code that assumes that a file name will roundtrip, or that two
different file names can't refer to the same file/directory entry, is also
broken.)
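To illustrate why "truncate to N characters" and "two names are different" are ill-defined on raw path bytes, here is a small sketch using "café" in NFC (precomposed U+00E9) and NFD (`e` followed by combining U+0301) form. `count_code_points` and `truncate_code_points` are hypothetical helpers, and "character" here means a Unicode code point; grapheme-aware truncation and normalization-aware comparison would need a Unicode library such as ICU, which is not shown.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count UTF-8 code points by counting bytes that are
// not continuation bytes (10xxxxxx). Assumes well-formed UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}

// Hypothetical helper: keep at most n code points without splitting a
// multi-byte sequence. It can still cut a base letter off from a following
// combining mark, changing what the user sees.
std::string truncate_code_points(const std::string& s, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned char c = s[i];
        if ((c & 0xC0) != 0x80) {  // start of a new code point
            if (count == n) return s.substr(0, i);
            ++count;
        }
    }
    return s;
}
```

With `nfc = "caf\xC3\xA9"` (5 bytes, 4 code points) and `nfd = "cafe\xCC\x81"` (6 bytes, 5 code points), both display as "café" but compare unequal byte-wise. Truncating to 4 code points leaves the NFC name intact but turns the NFD name into "cafe", while a byte-based `nfc.substr(0, 4)` splits the two-byte sequence and leaves invalid UTF-8. The same name gives three different answers depending on encoding and on the definition of "character".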


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk