
From: Jeremy Maitin-Shepard (jbms_at_[hidden])
Date: 2004-04-13 16:45:41


Miro Jurisic <macdev_at_[hidden]> writes:

> In article <87isg3skr6.fsf_at_[hidden]>, Jeremy Maitin-Shepard <jbms_at_[hidden]>
> wrote:

> [snip]

>> - For the purpose of string construction, the Unicode specification
>> explicitly states that any sequence of code points is well formed, and so
>> this provides the smallest unit by which guaranteed-well-formed strings
>> can be formed.

> Can you refer me to a specific point in the spec where this is stated?

In Unicode 4.0.1, Chapter 3, Section 3.9:

 D30a Well-formed: A Unicode code unit sequence that purports to be in a
      Unicode encoding form is called well-formed if and only if it does
      follow the specification of that Unicode encoding form.

    - A Unicode code unit sequence that consists entirely of a sequence
      of well-formed Unicode code unit sequences (all of the same
      Unicode encoding form) is itself a well-formed Unicode code unit
      sequence.

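Thus, since any code unit sequence representing a single Unicode scalar
value is itself well-formed, any concatenation of such sequences is
well-formed. In other words, any sequence of encoded code points
(surrogate code points excepted, since they are not scalar values and
have no well-formed encoding) is well-formed.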
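
As a rough illustration of the concatenation argument, here is a minimal
Python sketch. The helper names (encode_scalar, concat_encoded,
is_well_formed) are made up for this example and are not part of any
proposed Boost interface. It also assumes that Python's strict decoders
are an adequate stand-in for the well-formedness rules of each encoding
form.

```python
# Hypothetical helpers for illustration only.

def encode_scalar(cp: int, encoding: str = "utf-8") -> bytes:
    """Encode a single Unicode scalar value as a code unit sequence."""
    # Scalar values are all code points except the surrogates D800..DFFF.
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    return chr(cp).encode(encoding)


def concat_encoded(scalars, encoding: str = "utf-8") -> bytes:
    """Concatenate the individually encoded scalar values."""
    return b"".join(encode_scalar(cp, encoding) for cp in scalars)


def is_well_formed(data: bytes, encoding: str = "utf-8") -> bool:
    """Assumption: a strict decode succeeding means 'well-formed'."""
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Includes a lone combining mark (U+0301) and a non-BMP character.
    scalars = [0x41, 0x0301, 0x20AC, 0x1F600]
    # The -le variants are used so that no byte order mark is emitted.
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(enc, is_well_formed(concat_encoded(scalars, enc), enc))
```

Note that the guarantee holds only at scalar value boundaries: cutting
the byte string in the middle of one encoded scalar value (for example,
the first two bytes of the three-byte UTF-8 form of U+20AC) gives an
ill-formed sequence.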

> [snip]

-- 
Jeremy Maitin-Shepard
