Boost :

Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Alexander Lamaison (awl03_at_[hidden])
Date: 2011-01-18 13:36:12

Next message: Dave Abrahams: "Re: [boost] [General] Always treat std::strings as UTF-8"
Previous message: Artyom: "Re: [boost] [General] Always treat std::strings as UTF-8"
In reply to: Peter Dimov: "Re: [boost] [General] Always treat std::strings as UTF-8"
Next in thread: Peter Dimov: "Re: [boost] [General] Always treat std::strings as UTF-8"
Reply: Peter Dimov: "Re: [boost] [General] Always treat std::strings as UTF-8"
Reply: Peter Dimov: "Re: [boost] [General] Always treat std::strings as UTF-8"

On Tue, 18 Jan 2011 19:46:41 +0200, Peter Dimov wrote:

> Dave Abrahams wrote:
>> At Tue, 18 Jan 2011 13:27:29 +0200,
>> Peter Dimov wrote:

> There's also the additional consideration of utf8_t's invariant. Does it
> require valid UTF-8? One possible specification of fopen might be:
>
> FILE* fopen( char const* name, char const* mode );
>
> The 'name' argument must be UTF-8 on Unicode-aware platforms and file
> systems such as Windows/NTFS and Mac OS X/HFS+. It can be an arbitrary byte
> sequence on encoding-agnostic platforms and file systems such as Linux and
> Solaris, but UTF-8 is recommended.
>
> On Windows, the UTF-8 sequence may be invalid due to the presence of UTF-16
> surrogates encoded as single code points, but such use is discouraged.

Are you saying this is how it should be or this is how it is? Because, on
Windows, 'name' certainly can't be UTF-8! The implementation takes 'name'
to be in the default local code page, uses mbstowcs to up-convert it to a
UCS-2 wchar_t string and delegates to _wfopen (or similar - I'm doing
this from memory).

The up-conversion will turn multi-byte UTF-8 chars into gibberish. For
example fopen with 'name' being "שלום-سلام-pease-Мир.txt" creates a file
called "×©×œ×•×-Ø³Ù„Ø§Ù…-pease-ÐœÐ¸Ñ€".
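
The effect can be reproduced portably. The following is only a rough
sketch, not the CRT's actual code: misread_as_single_byte is a hypothetical
helper, and it uses Latin-1 as a stand-in for the Windows code page (an
assumption; a real code page such as 1252 maps 0x80-0x9F differently). It
treats every input byte as one character, as a single-byte up-conversion
would, and re-encodes the result as UTF-8 so the damage is visible.

```cpp
#include <string>

// Hypothetical helper, not a CRT function. Assumption: the "local code page"
// is modelled as Latin-1, where byte value N maps to code point U+00NN.
std::string misread_as_single_byte(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII bytes survive the round trip unchanged.
            out += static_cast<char>(b);
        } else {
            // Each UTF-8 lead or continuation byte becomes its own character
            // in U+0080..U+00FF, which takes two bytes when encoded as UTF-8.
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
        }
    }
    return out;
}
```

Passing the six UTF-8 bytes of the Cyrillic part of the name,
"\xD0\x9C\xD0\xB8\xD1\x80", yields six separate characters (twelve bytes of
UTF-8) instead of three letters, while the ASCII part "pease" comes through
untouched. That is the same pattern as in the file name above.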

Alex

-- 
Easy SFTP for Windows Explorer (http://www.swish-sftp.org)

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk