From: Jeremy Maitin-Shepard (jbms_at_[hidden])
Date: 2004-04-13 16:45:41
Miro Jurisic <macdev_at_[hidden]> writes:
> In article <87isg3skr6.fsf_at_[hidden]>, Jeremy Maitin-Shepard <jbms_at_[hidden]>
>> - For the purpose of string construction, the Unicode specification
>> explicitly states that any sequence of code points is well formed, and so
>> this provides the smallest unit by which guaranteed-well-formed strings
>> can be formed.
> Can you refer me to a specific point in the spec where this is stated?
In Unicode 4.0.1, Section 3.9:
D30a Well-formed: A Unicode code unit sequence that purports to be in a
Unicode encoding form is called well-formed if and only if it does
follow the specification of that Unicode encoding form.
- A Unicode code unit sequence that consists entirely of a sequence
of well-formed Unicode code unit sequences (all of the same
Unicode encoding form) is itself a well-formed Unicode code unit
sequence.
Thus, since any code unit sequence representing a single Unicode scalar
value is itself well-formed, any sequence of encoded code points (all in
the same encoding form) is also well-formed.
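To make the consequence of D30a concrete, here is a small Python sketch. It is an illustration only, not code from the Unicode spec or from any Boost library; the helper names are hypothetical, and it assumes Python's strict UTF-8/UTF-16/UTF-32 decoders are an adequate well-formedness check. It builds strings one encoded scalar value at a time and confirms the result is always well-formed, then shows that cutting below the code point level (at an arbitrary code unit) can produce an ill-formed sequence.

```python
import random

def is_scalar_value(cp: int) -> bool:
    """A Unicode scalar value is any code point except the surrogates."""
    return 0 <= cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF

def encode_scalar(cp: int, encoding: str = "utf-8") -> bytes:
    """Encode one scalar value; this piece is well-formed on its own."""
    if not is_scalar_value(cp):
        raise ValueError(f"U+{cp:04X} is not a Unicode scalar value")
    return chr(cp).encode(encoding)

def is_well_formed(data: bytes, encoding: str = "utf-8") -> bool:
    """Assumption: the strict decoder serves as the well-formedness test."""
    try:
        data.decode(encoding)  # errors="strict" is the default
        return True
    except UnicodeDecodeError:
        return False

def concat_encoded(scalars, encoding: str = "utf-8") -> bytes:
    """Build a string by concatenating individually encoded scalar values."""
    return b"".join(encode_scalar(cp, encoding) for cp in scalars)

def random_scalar(rng: random.Random) -> int:
    """Pick a random code point, rejecting surrogates."""
    while True:
        cp = rng.randrange(0x110000)
        if is_scalar_value(cp):
            return cp

if __name__ == "__main__":
    rng = random.Random(2004)
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        for _ in range(1000):
            seq = [random_scalar(rng) for _ in range(rng.randrange(20))]
            # Concatenation of well-formed pieces (same encoding form) stays well-formed.
            assert is_well_formed(concat_encoded(seq, enc), enc)
    # Splitting at a code unit inside a code point can break well-formedness.
    print(is_well_formed("\u00e9".encode("utf-8")[:1]))  # False
```

The final line prints False: the lone lead byte 0xC3 from the UTF-8 encoding of U+00E9 is not a well-formed sequence, which is why the encoded code point (strictly, the scalar value) rather than the code unit is the smallest unit that guarantees well-formed results.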
-- Jeremy Maitin-Shepard
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk