From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-04-13 16:16:43


In article <87isg3skr6.fsf_at_[hidden]>, Jeremy Maitin-Shepard <jbms_at_[hidden]>
wrote:

> Right, it will certainly be necessary to provide a grapheme_cluster_iterator
> (with value_type = the Unicode string type). ICU should help with this.

You are conflating abstract characters (which exist in the absence of a graphical
representation) and graphemes (whose existence depends on the graphical
representation), but I believe we are talking about the same thing.

> Nonetheless, it is useful to represent a single code point, for several
> reasons:

I agree; as I mentioned elsewhere, I believe that the Unicode string abstraction
needs to support at least iteration by abstract characters, encoded characters,
and encoding units.
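
To make the three levels concrete, here is a rough sketch over a UTF-8
std::string. Every name in it is hypothetical and nothing here is a proposed
interface. It assumes well-formed input, and it approximates abstract-character
boundaries by attaching only the combining marks in U+0300..U+036F to the
preceding code point; real boundary rules are considerably more involved and
would come from something like ICU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Level 1: encoding units. For UTF-8 these are simply the bytes.
std::vector<std::uint8_t> code_units(const std::string& utf8) {
    return std::vector<std::uint8_t>(utf8.begin(), utf8.end());
}

// Level 2: encoded characters (code points).
// Assumption: the input is well-formed UTF-8; nothing is validated here
// beyond a debug check that a sequence is not truncated.
std::vector<char32_t> code_points(const std::string& utf8) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const std::uint8_t lead = static_cast<std::uint8_t>(utf8[i]);
        // Sequence length is determined by the lead byte.
        const std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        assert(i + len <= utf8.size());
        // Keep only the payload bits of the lead byte.
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<std::uint8_t>(utf8[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Simplification: only U+0300..U+036F is treated as combining.
bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Level 3: a crude stand-in for abstract characters. Each cluster is a
// code point followed by the combining marks that attach to it.
std::vector<std::vector<char32_t>> clusters(const std::vector<char32_t>& cps) {
    std::vector<std::vector<char32_t>> out;
    for (char32_t cp : cps) {
        if (out.empty() || !is_combining_mark(cp))
            out.emplace_back();
        out.back().push_back(cp);
    }
    return out;
}
```

For "e" followed by U+0301 COMBINING ACUTE ACCENT and then "a", the three
views disagree about the length: four encoding units, three encoded
characters, and two clusters. That disagreement is why I think the string
abstraction has to expose all three.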

> - For the purpose of string construction, the Unicode specification
> explicitly states that any sequence of code points is well formed, and so
> this provides the smallest unit by which guaranteed-well-formed strings
> can be formed.

Can you refer me to a specific point in the spec where this is stated?

> - It would be useful to provide functions for querying the Unicode
> properties of individual code points, and this code_point type would be
> the only suitable parameter type.

Absolutely.
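
As a sketch of what such queries might look like, here are a few properties
that can be computed from the code point value alone, without any character
database. The code_point struct and the function names are my own
placeholders, not an agreed interface; most properties (general category,
combining class, and so on) would need table lookups, which I leave out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical strong type so that a code point is not confused with a
// plain integer or with an encoding unit.
struct code_point {
    std::uint32_t value;
};

// The Unicode code space runs from U+0000 to U+10FFFF.
bool is_valid(code_point cp) {
    return cp.value <= 0x10FFFF;
}

// Surrogate code points, U+D800..U+DFFF, are reserved for UTF-16.
bool is_surrogate(code_point cp) {
    return cp.value >= 0xD800 && cp.value <= 0xDFFF;
}

// Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane.
bool is_noncharacter(code_point cp) {
    return (cp.value >= 0xFDD0 && cp.value <= 0xFDEF)
        || (cp.value & 0xFFFE) == 0xFFFE;
}

// Plane number: 0 is the Basic Multilingual Plane, 1..16 are supplementary.
unsigned plane(code_point cp) {
    assert(is_valid(cp));
    return cp.value >> 16;
}
```

Taking a dedicated type as the parameter means these functions cannot be
called by accident with a single UTF-8 or UTF-16 encoding unit.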

> I do agree, however, that for almost any output formatting, the
> locale-specific or user-specified fill text/symbols should be specified as
> strings, rather than as individual characters.

Yes.

meeroh

-- 
If this message helped you, consider buying an item
from my wish list: <http://web.meeroh.org/wishlist>
