Boost :
From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-04-13 16:16:43
In article <87isg3skr6.fsf_at_[hidden]>, Jeremy Maitin-Shepard <jbms_at_[hidden]>
wrote:
> Right, it will certainly be necessary to provide a grapheme_cluster_iterator
> (with value_type = the Unicode string type). ICU should help with this.
You are conflating abstract characters (which exist in the absence of a graphical
representation) and graphemes (whose existence depends on the graphical
representation), but I believe we are talking about the same thing.
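As a rough illustration of the iteration under discussion, the sketch below groups code points into clusters and returns each cluster as a string, matching the proposed value_type = the Unicode string type. It returns a vector rather than an iterator to stay short, and it is a deliberate simplification, not the UAX #29 algorithm: the hypothetical grapheme_clusters() only treats the Combining Diacritical Marks block (U+0300..U+036F) as extending the previous character. Real code would delegate to something like ICU's character BreakIterator.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for UAX #29: only U+0300..U+036F (Combining
// Diacritical Marks) is treated as extending the preceding character.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical helper: split code points into user-perceived characters,
// each returned as its own string.
std::vector<std::u32string> grapheme_clusters(const std::u32string& text) {
    std::vector<std::u32string> clusters;
    for (char32_t cp : text) {
        if (!clusters.empty() && is_combining_mark(cp)) {
            clusters.back().push_back(cp);               // attach mark to previous cluster
        } else {
            clusters.push_back(std::u32string(1, cp));   // start a new cluster
        }
    }
    return clusters;
}
```

For U"e\u0301a" (e, combining acute, a) this yields two clusters: U"e\u0301" and U"a".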
> Nonetheless, it is useful to represent a single code point, for several
> reasons:
I agree; as I mentioned elsewhere, I believe that the Unicode string abstraction
needs to support at least iteration by abstract characters, encoded characters,
and encoding units.
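To make the lower two levels concrete, here is a minimal sketch assuming UTF-8 storage: the std::string holds encoding units (bytes), and the hypothetical decode_utf8() turns them into encoded characters (code points). It assumes well-formed input; a real library would also have to reject ill-formed byte sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode well-formed UTF-8 into code points.
// Validation of ill-formed input is intentionally omitted.
std::u32string decode_utf8(const std::string& units) {
    std::u32string out;
    std::size_t i = 0;
    while (i < units.size()) {
        unsigned char b = static_cast<unsigned char>(units[i]);
        char32_t cp;
        int len;
        // The lead byte determines the sequence length and its payload bits.
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        // Each continuation byte contributes 6 more bits.
        for (int k = 1; k < len; ++k) {
            assert(i + k < units.size());
            cp = (cp << 6) | (static_cast<unsigned char>(units[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

A decomposed "é" written as "e\xCC\x81" is then three encoding units, two encoded characters (U+0065, U+0301), and one abstract character.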
> - For the purpose of string construction, the Unicode specification
> explicitly states that any sequence of code points is well formed, and so
> this provides the smallest unit by which guaranteed-well-formed strings
> can be formed.
Can you refer me to a specific point in the spec where this is stated?
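A sketch of that construction path, with one assumption stated up front: well-formed UTF-8 cannot contain surrogate code points (U+D800..U+DFFF) or values above U+10FFFF, so the hypothetical code_point type below accepts only Unicode scalar values. Under that restriction, appending the encoding of any sequence of code_point values yields well-formed UTF-8.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical code_point type: accepts only Unicode scalar values,
// i.e. U+0000..U+10FFFF excluding the surrogates U+D800..U+DFFF.
class code_point {
public:
    explicit code_point(char32_t v) : value_(v) {
        if (v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
            throw std::invalid_argument("not a Unicode scalar value");
    }
    char32_t value() const { return value_; }
private:
    char32_t value_;
};

// Append the UTF-8 encoding (1 to 4 bytes) of one code point.
void append_utf8(std::string& out, code_point cp) {
    char32_t v = cp.value();
    if (v < 0x80) {
        out += static_cast<char>(v);
    } else if (v < 0x800) {
        out += static_cast<char>(0xC0 | (v >> 6));
        out += static_cast<char>(0x80 | (v & 0x3F));
    } else if (v < 0x10000) {
        out += static_cast<char>(0xE0 | (v >> 12));
        out += static_cast<char>(0x80 | ((v >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (v & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (v >> 18));
        out += static_cast<char>(0x80 | ((v >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((v >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (v & 0x3F));
    }
}
```

Putting the validity check in the constructor means no string built from code_point values can contain an unencodable unit.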
> - It would be useful to provide functions for querying the Unicode
> properties of individual code points, and this code_point type would be
> the only suitable parameter type.
Absolutely.
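A small sketch of property queries keyed on a code point. The code_point struct and both functions are hypothetical; a real implementation would consult the Unicode Character Database (ICU's u_charType(), for example, also takes a single code point). The two properties shown are ones that can be computed arithmetically: the plane number, and the Noncharacter_Code_Point property (U+FDD0..U+FDEF plus the last two code points of every plane).

```cpp
#include <cassert>

// Hypothetical minimal code_point wrapper (validation omitted for brevity;
// values are assumed to be <= U+10FFFF).
struct code_point {
    char32_t value;
};

// Unicode plane number: 0 = BMP, 1 = SMP, ..., 16.
inline unsigned plane(code_point cp) {
    return static_cast<unsigned>(cp.value >> 16);
}

// Noncharacter_Code_Point: U+FDD0..U+FDEF, and U+xxFFFE / U+xxFFFF
// in every plane.
inline bool is_noncharacter(code_point cp) {
    char32_t v = cp.value;
    return (v >= 0xFDD0 && v <= 0xFDEF) || (v & 0xFFFE) == 0xFFFE;
}
```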
> I do agree, however, that for almost any output formatting, the
> locale-specific or user-specified fill text/symbols should be specified as
> strings, rather than as individual characters.
Yes.
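A sketch of why a string-typed fill matters, using a hypothetical pad_left() helper: a fill such as U"e\u0301" (e followed by a combining acute) is two code points, so it cannot be passed as a single char32_t. For simplicity, padding here is counted in fill repetitions, not display columns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: prepend `count` copies of a fill string to `text`.
std::u32string pad_left(const std::u32string& text,
                        const std::u32string& fill,
                        std::size_t count) {
    std::u32string out;
    out.reserve(fill.size() * count + text.size());
    for (std::size_t i = 0; i < count; ++i)
        out += fill;   // the fill is copied whole, combining marks included
    out += text;
    return out;
}
```

For example, pad_left(U"42", U"*", 3) produces U"***42".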
meeroh