Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Alexander Lamaison (awl03_at_[hidden])
Date: 2011-01-18 13:36:12


On Tue, 18 Jan 2011 19:46:41 +0200, Peter Dimov wrote:

> Dave Abrahams wrote:
>> At Tue, 18 Jan 2011 13:27:29 +0200,
>> Peter Dimov wrote:

> There's also the additional consideration of utf8_t's invariant. Does it
> require valid UTF-8? One possible specification of fopen might be:
>
> FILE* fopen( char const* name, char const* mode );
>
> The 'name' argument must be UTF-8 on Unicode-aware platforms and file
> systems such as Windows/NTFS and Mac OS X/HFS+. It can be an arbitrary byte
> sequence on encoding-agnostic platforms and file systems such as Linux and
> Solaris, but UTF-8 is recommended.
>
> On Windows, the UTF-8 sequence may be invalid due to the presence of UTF-16
> surrogates encoded as single code points, but such use is discouraged.

Are you saying this is how it should be or this is how it is? Because, on
Windows, 'name' certainly can't be UTF-8! The implementation takes 'name'
to be in the default locale code page, uses mbstowcs to up-convert it to a
UCS-2 wchar_t string and delegates to _wfopen (or similar - I'm doing
this from memory).
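
A minimal sketch of the delegation described above, not the actual CRT source. The names widen_with_locale and narrow_fopen_sketch are hypothetical, and std::mbstowcs is assumed here as the locale-based conversion step. On non-Windows builds the sketch falls back to plain fopen so it still compiles.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: widen a narrow string using the current C locale.
// Returns false if the bytes are not valid in that locale's encoding.
bool widen_with_locale(const char* narrow, std::wstring& out) {
    assert(narrow != nullptr);
    const std::size_t n = std::strlen(narrow);
    if (n == 0) { out.clear(); return true; }
    // A multibyte string never yields more wide characters than it has bytes.
    std::wstring buf(n, L'\0');
    const std::size_t written = std::mbstowcs(&buf[0], narrow, n);
    if (written == static_cast<std::size_t>(-1)) return false;
    buf.resize(written);
    out.swap(buf);
    return true;
}

// Hypothetical sketch of a narrow fopen that delegates to the wide API.
// The name is interpreted in the locale encoding, NOT as UTF-8.
std::FILE* narrow_fopen_sketch(const char* name, const char* mode) {
    std::wstring wname, wmode;
    if (!widen_with_locale(name, wname) || !widen_with_locale(mode, wmode))
        return nullptr;
#ifdef _WIN32
    return _wfopen(wname.c_str(), wmode.c_str());
#else
    return std::fopen(name, mode);  // no wide API to delegate to here
#endif
}
```

The point of the sketch is that the conversion step depends on the locale, so a UTF-8 'name' only survives if the locale's encoding happens to be UTF-8.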

The up-conversion will turn multi-byte UTF-8 chars into gibberish. For
example fopen with 'name' being "שלום-سلام-pease-Мир.txt" creates a file
called "שלום-سلام-pease-Мир"
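
To illustrate why the result is gibberish, here is a small sketch that misreads UTF-8 bytes one byte at a time. It uses ISO-8859-1 as a stand-in for a single-byte code page because every byte maps directly to the code point of the same value. That choice is an assumption for illustration only: the real default code page on a given Windows machine (Windows-1252, for example) maps some bytes differently. The helper name misread_as_latin1 is hypothetical.

```cpp
#include <string>

// Hypothetical helper: interpret each byte as one ISO-8859-1 character,
// which is what a single-byte code page conversion does in spirit.
std::u32string misread_as_latin1(const std::string& bytes) {
    std::u32string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        // One input byte becomes one output code point, so every
        // multi-byte UTF-8 sequence is split into unrelated characters.
        out.push_back(static_cast<char32_t>(b));
    }
    return out;
}
```

For instance, the three Cyrillic letters of "Мир" occupy six bytes in UTF-8 (D0 9C D0 B8 D1 80), so the misreading produces six unrelated code points instead of three letters.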

Alex

-- 
Easy SFTP for Windows Explorer (http://www.swish-sftp.org)
