From: Jonathan Biggar (jon_at_[hidden])
Date: 2005-03-18 17:56:53
Rogier van Dalen wrote:
> On Thu, 17 Mar 2005 17:52:25 +0100, Erik Wien <wien_at_[hidden]> wrote:
>>What exactly do you mean by the term "character"? Abstract characters?
> I really need to remember the correct terminology - what I mean is the
> thing "a user thinks of as a character", a "grapheme cluster", of
> which the Unicode standard says:
> "[T]here is a core concept of "characters that should be kept
> together" that can be defined for the Unicode Standard in a
> language-independent way. This core concept is known as a grapheme
> cluster, and it consists of any combining character sequence that
> contains only nonspacing combining marks, or any sequence of
> characters that constitutes a Hangul syllable (possibly followed by
> one or more nonspacing marks)."
> I believe this is what a Unicode library should use as its basic unit.
Be careful about making such a global assertion. Different users of a Unicode
library will need to access the data at different levels. Some will
need the raw encoding bytes or words, some will need code points, and
some will need 'grapheme clusters'.
The library should support working at the level that each particular
user needs, and different parts of an application or library may need to
work at multiple levels.
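As a rough illustration of the three levels, here is a minimal sketch in Python. It is not a proposed interface: the function names are made up for this example, UTF-8 is assumed as the encoding, and the cluster rule covers only the "base character plus nonspacing marks" case from the definition quoted above, not Hangul syllables.

```python
import unicodedata


def encoding_units(text):
    """Level 1: the raw encoding, here the UTF-8 bytes (an assumed encoding)."""
    return list(text.encode("utf-8"))


def code_points(text):
    """Level 2: Unicode code points (a Python str iterates by code point)."""
    return [ord(ch) for ch in text]


def grapheme_clusters(text):
    """Level 3, simplified: attach each nonspacing mark (general category Mn)
    to the character before it. Hangul syllable sequences are NOT handled."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            clusters[-1] += ch  # extend the current cluster with the mark
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'x'
sample = "e\u0301x"
print(len(encoding_units(sample)))     # 4 bytes
print(len(code_points(sample)))        # 3 code points
print(len(grapheme_clusters(sample)))  # 2 clusters
```

The same string has a different length at each level, which is why no single level can serve every user as the one basic unit.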
-- Jon Biggar Levanta jon_at_[hidden]