Subject: Re: [boost] RE [filesystem][cygwin] Standard conformance for wide characters
From: David Abrahams (dave_at_[hidden])
Date: 2009-02-07 16:22:59


on Fri Feb 06 2009, frederic.bron-AT-alcan.com wrote:

>> > Any idea how to help newlib get the missing functions?
>>
>> Implement them?
>>
>> Maybe it's too facile an answer, but I can't think of anything else...
>
> Thanks for the advice but I have absolutely no idea of how to use wide
> characters. I have never used them. C++ standard does not say what can
> be done with wide characters. For example how can I read properly a
> file written in UTF8?

UTF-8 isn't wide characters; it's Unicode stored in 8-bit code units
(see http://unicode.org/faq/utf_bom.html#UTF8), so a wstring would
really be an inappropriate way to store UTF-8.
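
To make that concrete, here is a minimal sketch of reading a UTF-8 file into a plain std::string and counting its code points. The helper names read_bytes and count_code_points are made up for illustration, and the counting assumes the input is well-formed UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read a whole file as raw bytes. No decoding is
// attempted; each char in the result is one 8-bit UTF-8 code unit.
std::string read_bytes(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: count code points in well-formed UTF-8 by
// skipping continuation bytes, which all have the bit pattern 10xxxxxx.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

The point is only that size() reports code units while the second function reports code points; the two differ as soon as the text leaves ASCII.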

Wide characters are actually not characters either; they're "code
units." See http://unicode.org/faq/utf_bom.html#utf16-1 for example.

In general, AFAIK, a wstring is just a container of wchar_ts and C++
doesn't say anything about how those wchar_ts map onto characters. The
size of wchar_t is implementation-defined (16 bits on Windows and
Cygwin, 32 bits on most Unix platforms), so where it is 16 bits the most
likely encoding to store in a wstring is UTF-16, but it could be
anything. So I don't think any of the wchar_t* c-string functions can
possibly be much more than textual copies of the regular char* c-string
functions, but operating on a different datatype.
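
As a rough illustration of that last point, the sketch below writes the length function once as a template over the code unit type. unit_length is a hypothetical name, not anything from newlib or the standard library; it only shows that the same loop serves both char and wchar_t.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: count code units up to, but not including, the
// terminating zero. Nothing here depends on the encoding or on how wide
// CharT is, which is why strlen and wcslen can share one algorithm.
template <typename CharT>
std::size_t unit_length(const CharT* s) {
    std::size_t n = 0;
    while (s[n] != CharT()) ++n;
    return n;
}
```

For any string, unit_length("abc") should agree with std::strlen("abc") and unit_length(L"abc") with std::wcslen(L"abc"). The result counts code units, not characters.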

HTH,

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk