Subject: Re: [boost] [review] Review of Nowide (Unicode)
From: Frédéric Bron (frederic.bron_at_[hidden])
Date: 2017-06-23 10:46:03


>> It looks like std filesystem uses global locale... (Bad idea. )
> Does it? They finally succeeded in breaking it after so many tries? Oh dear.

Apparently, with std::filesystem you can create a path from a UTF-8
string with u8path, and from a path you can get its UTF-8
representation with u8string. You do not need a locale for this.
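
To make this concrete, here is a minimal sketch of the round trip. The
helper name utf8_round_trip is mine and purely illustrative; only u8path
and u8string come from std::filesystem. It assumes a C++17 compiler and is
written so that it should also build under C++20, where u8path is
deprecated and u8string() returns std::u8string.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: build a path from UTF-8 bytes and read it back
// as UTF-8. Neither call takes a std::locale argument.
std::string utf8_round_trip(const std::string& utf8)
{
    // u8path is the C++17 spelling; C++20 deprecates it in favour of
    // constructing a path from a char8_t string.
    fs::path p = fs::u8path(utf8);

    // u8string() returns std::string in C++17 and std::u8string in C++20;
    // copying through iterators works with both.
    auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}
```

For well-formed input such as "caf\xC3\xA9.txt" the function should hand
back the same bytes, whether the native format is char or wchar_t.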

I looked at what the current draft standard says about ill-formed UTF-16.

The draft standard says in 30.10.8.4.6 path native format observers
[fs.path.native.obs]:
"Remarks: Conversion, if any, is performed as specified by 30.10.8.2.
The encoding of the string returned
by u8string() is always UTF-8."

So u8string() must return a valid UTF-8 string (as its name already
implies).

And in 30.10.8.2.1 path argument format conversions [fs.path.fmt.cvt], I read:
"Pathnames are converted as needed between the generic and native
formats in an operating-system-dependent
manner. Let G(n) and N(g) in a mathematical sense be the
implementation’s functions that convert native-
to-generic and generic-to-native formats respectively. If g=G(n) for
some n, then G(N(g))=g; if n=N(g) for
some g, then N(G(n))=n. [ Note: Neither G nor N need be invertible. —
end note ]"

The wording is curious: it does not say that the round trip must work
for every path, only for paths that are themselves the result of a
conversion. Boost.Nowide satisfies this: once a string has been
converted to UTF-8, the round trip narrow->wide->narrow works.

I also read in 30.10.8.2.2 path type and encoding conversions
[fs.path.type.cvt]
"If the encoding being converted to has no representation for source
characters, the resulting converted characters, if any, are
unspecified. Implementations should not modify member function
arguments if already of type path::value_type."

This means that the result of converting an ill-formed path is
unspecified. Using the replacement character U+FFFD would therefore be
consistent with the std::filesystem policy.
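
As an illustration of the replacement-character policy, here is a small
sketch of a UTF-16 to UTF-8 conversion that substitutes U+FFFD for every
unpaired surrogate. This is not Boost.Nowide's code nor any standard
library's; the function names append_utf8 and utf16_to_utf8_lossy are
hypothetical, chosen only for this example.

```cpp
#include <cstddef>
#include <string>

// Append one code point to `out` as UTF-8. The caller is assumed to pass
// a valid Unicode scalar value (no surrogates, at most U+10FFFF).
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 to UTF-8, replacing each unpaired surrogate by U+FFFD
// so that the output is always well-formed UTF-8.
std::string utf16_to_utf8_lossy(const std::u16string& in)
{
    const char32_t replacement = 0xFFFD;
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: valid only when a low surrogate follows.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                const char32_t lo = in[i + 1];
                append_utf8(out, 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
                ++i;  // the low surrogate has been consumed
            } else {
                append_utf8(out, replacement);
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            append_utf8(out, replacement);
        } else {
            append_utf8(out, u);
        }
    }
    return out;
}
```

The conversion is lossy for ill-formed input: two different unpaired
surrogates both map to U+FFFD, so the original wide string cannot be
recovered. The output itself is well-formed UTF-8, though, and can be
round-tripped from then on.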

Note also that on POSIX std::filesystem just uses the narrow string as
is, as Boost.Nowide does. For char, the draft says:
"The encoding is the native narrow encoding (30.10.4.9). The
method of conversion, if any, is [...]
For POSIX-based operating systems path::value_type is char
so no conversion from char value type arguments or to char value type
return values is performed."
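
A quick way to check this on a given implementation is sketched below.
The function name stored_as_is is hypothetical, and the _WIN32 test is
only my assumption for telling a Windows build from a POSIX-like one.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical check: returns true when the bytes handed to path come
// back unchanged from native(). Meaningful only where path::value_type
// is char.
bool stored_as_is(const std::string& bytes)
{
#ifdef _WIN32
    // value_type is wchar_t here, so a conversion necessarily happens.
    (void)bytes;
    return false;
#else
    return fs::path(bytes).native() == bytes;
#endif
}
```

On a POSIX build this should return true even for a name that is not
valid UTF-8, such as "caf\xE9.txt", since no conversion is applied to
char arguments.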

Frédéric


