From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2007-09-27 06:54:38


James Porter wrote:
> Actually, UTF-32 (equivalently UCS-4) *is* fixed-width (as of the
> Unicode 5.0.0 standard). Page 31 of the standard (chapter 2) says:
>
> "UTF-32 is the simplest Unicode encoding form. Each Unicode code point
> is represented directly by a single 32-bit code unit. Because of this,
> UTF-32 has a one-to-one relationship between encoded character and code
> unit; it is a fixed-width character encoding form."
>
UTF-32 is a fixed-width encoding of Unicode code points, but Unicode
itself is a "variable-width character set": because of combining
characters, a single user-perceived character can span several code
points.

Whether this is the business of a core string layer in C++ is a
different question.
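
For reference, the combining-character point can be sketched in a few
lines. This is only an illustration, and it assumes a C++11 compiler for
char32_t and U"" literals; the names code_unit_count, precomposed and
decomposed are made up for this example. Both strings below render as
the same accented letter, yet they differ in length even in UTF-32:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: in UTF-32 one code unit is one code point,
// so the string length is the code point count.
std::size_t code_unit_count(const std::u32string& s) {
    return s.size();
}

// U+00E9 LATIN SMALL LETTER E WITH ACUTE, one code point.
const std::u32string precomposed = U"\u00E9";

// U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT,
// two code points for the same user-perceived character.
const std::u32string decomposed = U"e\u0301";
```

Here code_unit_count(precomposed) is 1 and code_unit_count(decomposed)
is 2, and a plain == comparison treats the two strings as different.
Making them compare equal would take normalization, which is the kind
of thing the question above is about.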

Sebastian Redl

