Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Artyom Beilis (artyom.beilis_at_[hidden])
Date: 2017-06-12 17:39:17


On Mon, Jun 12, 2017 at 6:05 PM, Vadim Zeitlin via Boost
<boost_at_[hidden]> wrote:
> On Mon, 12 Jun 2017 17:58:32 +0300 Artyom Beilis via Boost <boost_at_[hidden]> wrote:
>
> AB> By definition: you can't handle file names that can't be represented
> AB> in UTF-8, as no valid UTF-8 representation exists for them.
>
> This is a nice principle to have in theory, but very unfortunate in
> practice because at least under Unix systems such file names do occur in
> the wild (maybe less often now than 10 years ago, when UTF-8 was less
> ubiquitous, but it's still hard to believe that the problem has completely
> disappeared). And there are ways to solve it, e.g. I think glib represents
> such file names using special characters from a PUA and there are other
> possible approaches, even if, admittedly, none of them is perfect.
>

Please note: on POSIX platforms no conversions are performed
and no UTF-8 validation is done, as doing either would be incorrect there:

http://cppcms.com/files/nowide/html/index.html#qna

The only problematic case is when the Windows Wide API returns or
creates invalid UTF-16, which can happen only when invalid UTF-16
surrogate pairs are generated, and these have no valid UTF-8
representation.
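
To make the "invalid surrogate pairs" case concrete, here is a minimal sketch
of how such input could be detected. The names is_valid_utf16 and self_check
are hypothetical helpers written for this illustration; they are not part of
Nowide, and this is not necessarily how Nowide itself checks its input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Nowide function): returns true only if every
// surrogate code unit in `s` is part of a well-formed high+low pair.
inline bool is_valid_utf16(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= s.size())
                return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;
            ++i; // skip the low half of the pair
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate without a preceding high surrogate.
            return false;
        }
    }
    return true;
}

// Small usage demonstration (also hypothetical).
inline void self_check()
{
    // Ordinary BMP text is always valid.
    assert(is_valid_utf16(u"plain.txt"));
    // A proper pair (U+1F600) is valid and has a UTF-8 encoding.
    assert(is_valid_utf16(std::u16string{char16_t(0xD83D), char16_t(0xDE00)}));
    // A lone high surrogate between two letters: no valid UTF-8 exists.
    assert(!is_valid_utf16(std::u16string{u'a', char16_t(0xD800), u'b'}));
    // A lone low surrogate: likewise invalid.
    assert(!is_valid_utf16(std::u16string{char16_t(0xDC00)}));
}
```

A name for which this check fails is exactly the kind of input that cannot be
turned into valid UTF-8.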

On the other hand, deliberately creating invalid UTF-8 is a very problematic idea.

Regards,
Artyom


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk