From: Matt Austern (austern_at_[hidden])
Date: 2001-09-26 13:21:12
Daryle Walker wrote:
> on 9/26/01 12:04 AM, Jeremy Siek at jsiek_at_[hidden] wrote:
> > On Tue, 25 Sep 2001 williamkempf_at_[hidden] wrote:
> > willia> >
> > willia> > Unfortunately, the fundamental problem with any approach that uses
> > willia> > the standard library wide character facilities is that wchar_t
> > willia> > isn't guaranteed to be large enough to hold a Unicode character.
> > Right, it takes 21 bits to represent Unicode characters. wchar_t under
> > Windows is 16 bits. Under linux it is 32 bits.
> Didn't it use to be 16 bits? When did it change?
A very rough answer is that it changed with Unicode 3.1.
A more complete answer (still incomplete) is that the number of
bits in a Unicode character, as opposed to a particular encoding
of Unicode characters, was never defined. What happened relatively
recently is that characters outside the basic multilingual plane, or
plane 0, have been defined. You can find a description of the three
newish planes (plane 1, plane 2, and plane 14) at
http://www.unicode.org/unicode/reports/tr27/. Each plane contains
2^16 character positions. Not all of the available positions in these
four planes (plane 0 plus the three new ones) have been allocated.
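
To make the plane arithmetic concrete, here is a minimal C++ sketch. The helper names (plane_of, fits_in_16_bits, bits_needed) are made up for illustration and are not part of any library. The upper bound U+10FFFF is not stated in this message; it is assumed here as the highest code position, which is where the 21-bit figure quoted above comes from. The sample code points are ones I believe sit in planes 1, 2, and 14.

```cpp
#include <cstdint>

// Hypothetical helper: each plane spans 2^16 positions, so the plane
// number is simply the code point with its low 16 bits shifted away.
constexpr std::uint32_t plane_of(std::uint32_t cp) { return cp >> 16; }

// Hypothetical helper: only plane 0 (the BMP) fits in a single 16-bit unit.
constexpr bool fits_in_16_bits(std::uint32_t cp) { return cp <= 0xFFFFu; }

// Hypothetical helper: number of bits needed to represent a value.
constexpr unsigned bits_needed(std::uint32_t v) {
    return v == 0 ? 0u : 1u + bits_needed(v >> 1);
}

// Assumption: U+10FFFF is the highest code position.
constexpr std::uint32_t max_code_point = 0x10FFFFu;

static_assert(plane_of(0x0041u) == 0, "U+0041 is in plane 0");
static_assert(plane_of(0x1D11Eu) == 1, "U+1D11E is in plane 1");
static_assert(plane_of(0x20000u) == 2, "U+20000 is in plane 2");
static_assert(plane_of(0xE0001u) == 14, "U+E0001 is in plane 14");
static_assert(!fits_in_16_bits(0x1D11Eu), "plane 1 needs more than 16 bits");
static_assert(bits_needed(max_code_point) == 21, "21 bits cover the full range");
```

On a platform where wchar_t is 16 bits, anything for which fits_in_16_bits returns false cannot be stored in a single wchar_t; it needs either an encoding that uses more than one unit per character or a wider type.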
Oh, and just to make life slightly more confusing, there are two
related standards here, Unicode and ISO 10646. The two standards
bodies make sure that the two standards are consistent with each
other.