Subject: Re: [boost] [review] Review of Nowide (Unicode) starts today
From: Artyom Beilis (artyom.beilis_at_[hidden])
Date: 2017-06-16 12:20:10


On Fri, Jun 16, 2017 at 12:42 PM, Frédéric Bron <frederic.bron_at_[hidden]> wrote:
>> And you can safely concatenate two different strings and valid file will be
>> created. Even if dir is in ISO 8859-1 and file in UTF-8. The file will be
>> valid even if not representable in any encoding.
>
> but it seems to me that in this case, what I need is UTF-8->ISO 8859-1
> conversion of the file name before concatenation with the directory.
> Otherwise, OK I will get a file because the system just ask for narrow
> string but its name will be wrong in the OS user interface.
>
> Frédéric

You are **assuming** that the encoding of what you receive from the
system (for example from getenv) matches the current locale encoding.

But it is not necessarily the same:

1. The file/directory was created by a user running in a different locale
2. The locale isn't defined properly or was modified
3. You got these files/directories from some other location (for example
an unzipped archive)
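
To illustrate what "current locale encoding" means here, a minimal sketch
(the helper name locale_codeset is mine, not part of Nowide) that asks the
C library which codeset the environment's locale claims. It only reports
what LANG/LC_* say; it tells you nothing about the bytes already stored in
file names:

```cpp
#include <cassert>
#include <clocale>
#include <string>
#include <langinfo.h>   // nl_langinfo, CODESET (POSIX)

// Hypothetical helper: returns the codeset name of the locale selected by
// the environment, e.g. "UTF-8" or "ISO-8859-1".
// Note: setlocale() changes global state for the whole process.
std::string locale_codeset()
{
    // If the environment names an unknown locale, setlocale fails and the
    // process simply stays in its previous locale (the "C" locale at startup).
    std::setlocale(LC_ALL, "");
    const char* name = ::nl_langinfo(CODESET);
    assert(name != nullptr);
    return std::string(name);
}
```

Two users on the same machine can get different answers from this call
while looking at the very same directory.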

In reality the OS does not care about encoding (most of the time).

Unlike Windows, where wchar_t also implies the encoding (UTF-16), on POSIX
platforms a "char *" can hold any encoding, and that encoding can change.
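
The best you can do with such a "char *" is guess. Below is a rough sketch
of a well-formedness check (looks_like_utf8 is a name I made up for this
mail). It is only a heuristic: it can say "these bytes are not UTF-8", but
a "yes" proves nothing, because the ISO 8859-1 text "Ã©" is the same two
bytes as the UTF-8 text "é".

```cpp
#include <cassert>
#include <string>

// Heuristic: true if the bytes form well-formed UTF-8
// (no overlong forms, no surrogates, nothing above U+10FFFF).
bool looks_like_utf8(const std::string& s)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        if (c < 0x80) { ++i; continue; }          // plain ASCII byte

        std::size_t len = 0;                       // sequence length in bytes
        unsigned long cp = 0;                      // decoded code point
        unsigned long min = 0;                     // smallest value allowed for len
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return false;                         // stray continuation or bad lead

        if (i + len > n) return false;             // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return false;        // overlong / too large
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // surrogate range
        i += len;
    }
    return true;
}

// Small self-check with the byte sequences mentioned above.
void looks_like_utf8_examples()
{
    assert(looks_like_utf8("r\xC3\xA9sum\xC3\xA9"));   // UTF-8 bytes
    assert(!looks_like_utf8("caf\xE9"));                // ISO 8859-1 bytes
    assert(looks_like_utf8("\xC3\xA9"));                // ambiguous: could be either
}
```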

Also, UTF-8 is the most common encoding on all modern Unix-like
systems: Linux, BSD, Mac OS X.

So I don't think it is necessary to perform any conversions between UTF-8
and whatever "char *" encoding you get because:

(a) You can't reliably know what encoding you are dealing with.
(b) The same "char *" may contain parts in different encodings and
still be a valid path.
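
A small sketch of point (b), assuming a POSIX system whose file system
treats names as opaque bytes (typical on Linux; as far as I know some
file systems, for example on Mac OS X, insist on valid UTF-8 and may
reject such names). The function name and the sample names are made up
for illustration. The directory name is ISO 8859-1, the file name is
UTF-8, and the concatenated path is passed on with no conversion:

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>
#include <sys/stat.h>   // mkdir, stat (POSIX)
#include <unistd.h>

// Creates <base>/<latin1 dir>/<utf8 file> and reports whether the file
// exists afterwards. `base` must be an existing, writable directory.
bool create_mixed_encoding_file(const std::string& base)
{
    assert(!base.empty());

    const std::string dir  = base + "/caf\xE9";           // "café" in ISO 8859-1
    const std::string file = "r\xC3\xA9sum\xC3\xA9.txt";  // "résumé.txt" in UTF-8
    const std::string path = dir + "/" + file;            // two encodings, one path

    if (::mkdir(dir.c_str(), 0700) != 0)
        return false;

    std::ofstream out(path.c_str());   // bytes are passed to the OS unchanged
    if (!out)
        return false;
    out << "hello\n";
    out.close();

    struct stat st;
    return ::stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode);
}
```

Called with a fresh temporary directory as base, this should return true
on such a system even though the full path is not valid text in any single
encoding. Converting the UTF-8 part to the locale encoding first would not
make it any more valid for the OS.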

Artyom

