From: Aaron W. LaFramboise (aaronrabiddog51_at_[hidden])
Date: 2004-10-19 05:37:06


Rogier van Dalen wrote:

> An assumption I think is wrong is that wchar_t would be suitable for
> Unicode. Correct me if I'm wrong, but IIRC wchar_t has 16 bits on
> Microsoft compilers, for example. The utf8_codecvt_facet
> implementation will on these compilers cut off any codepoints over
> 0xFFFF. (U+1D12C will come out as U+D12C.)

This is because the Windows NT ABI is hardwired for 16-bit wide
characters. I believe that means the wide characters are actually
UTF-16 code units, with code points above U+FFFF represented as
"surrogate pairs." Regardless of whether this is a good thing or not,
Windows compilers need to follow suit, as the underlying implementation
of their wide characters is in Windows, not in the compiler.
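
To make the arithmetic concrete, here is a minimal sketch of the
difference between cutting a code point off at 16 bits and encoding it
as a UTF-16 surrogate pair. It is not taken from utf8_codecvt_facet;
the function names are hypothetical and only standard headers are
assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Naive narrowing into a 16-bit wchar_t: the high bits are simply lost.
std::uint16_t truncate_to_16(std::uint32_t cp) {
    return static_cast<std::uint16_t>(cp & 0xFFFF);
}

// UTF-16 encoding of a code point outside the Basic Multilingual Plane.
// Returns (high surrogate, low surrogate).
std::pair<std::uint16_t, std::uint16_t> to_surrogate_pair(std::uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const std::uint32_t v = cp - 0x10000;  // 20 significant bits remain
    return std::make_pair(
        static_cast<std::uint16_t>(0xD800 + (v >> 10)),     // top 10 bits
        static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));  // bottom 10 bits
}

// Inverse of to_surrogate_pair: recombine the two 16-bit units.
std::uint32_t from_surrogate_pair(std::uint16_t hi, std::uint16_t lo) {
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000
         + ((static_cast<std::uint32_t>(hi) - 0xD800) << 10)
         + (static_cast<std::uint32_t>(lo) - 0xDC00);
}
```

For U+1D12C, truncate_to_16 gives 0xD12C, which matches the cut-off
value described in the quoted text, while to_surrogate_pair gives the
two units 0xD834 0xDD2C, and from_surrogate_pair turns those back into
0x1D12C.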

It might be possible for a compiler to provide its own Unicode
implementation and map that to Windows' wide characters, but in the
user-visible situations where the two implementations disagree, there
might be surprising results that make the compiler-provided
implementation unusable.

Aaron W. LaFramboise

