Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-18 20:24:06


On Wed, 19 Jan 2011 00:00:59 +0100
Robert Kawulak <robert.kawulak_at_[hidden]> wrote:

>> From: Artyom
>> Ok, let's think: what do you need iterators for? Accessing "characters"?
>> If so, you are most likely doing something terribly wrong, as you
>> ignore the fact that codepoint != character.
>>
>> I would say such an iterator is wrong by design unless you develop
>> a Unicode algorithm that operates on code points.
>
> Now wouldn't it be nice if ascii_t (or whatever it's called) and
> utf*_t string classes had 3 kinds of iterators:
> - storage iterator (char, wchar_t etc.),
> - codepoint iterator,
> - character iterator.

The current iterators fall under the storage iterator category, but
code-point iterators are easily possible. Character iterators may
require help from a full-fledged Unicode library (I don't yet know
whether there's a simple way to determine which code points are
combining ones, though I doubt there is), but they should be doable too.
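
To make the three levels concrete, here is a minimal sketch, not code
from the string classes under discussion. All function names are
hypothetical, the input is assumed to be well-formed UTF-8 (nothing is
validated), and the "character" count only recognizes the Combining
Diacritical Marks block (U+0300..U+036F), so it is deliberately
incomplete; proper character boundaries need data from a real Unicode
library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the UTF-8 sequence starting at s[i],
// advances i past it, and returns the code point.
// Assumes well-formed UTF-8; no validation is attempted.
char32_t next_code_point(const std::string& s, std::size_t& i)
{
    assert(i < s.size());
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    if (lead < 0x80)
        return lead;                                  // single byte (ASCII)
    int extra = (lead >= 0xF0) ? 3 : (lead >= 0xE0) ? 2 : 1;
    char32_t cp = lead & (0x3F >> extra);             // payload bits of the lead byte
    while (extra-- > 0 && i < s.size())               // 6 payload bits per trailing byte
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Code-point level: one step per decoded code point.
std::size_t count_code_points(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n)
        next_code_point(s, i);
    return n;
}

// Deliberately incomplete: covers only U+0300..U+036F.
bool is_basic_combining_mark(char32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

// "Character" level, naive: a combining mark does not start a new character.
std::size_t count_characters_naive(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); )
        if (!is_basic_combining_mark(next_code_point(s, i)))
            ++n;
    return n;
}
```

For the string "e\xCC\x81" (an 'e' followed by U+0301, combining acute
accent), size() reports 3 storage units, count_code_points() reports 2,
and count_characters_naive() reports 1.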

> You could then reuse many existing algorithms to perform operations on
> a level that is sufficient in a given situation [...] I don't know
> Unicode quirks enough to tell how useful this interface would be, but
> it seems interesting.

And intriguing. When I get back to the Unicode string classes, I'll
look into adding such iterators.

-- 
Chad Nelson
Oak Circle Software, Inc.