Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-18 20:24:06

On Wed, 19 Jan 2011 00:00:59 +0100
Robert Kawulak <robert.kawulak_at_[hidden]> wrote:

>> From: Artyom
>> OK, let's think: what do you need iterators for? Accessing "characters"?
>> If so, you are most likely doing something terribly wrong, as you
>> ignore the fact that code point != character.
>> I would say such iterator is wrong by design unless you develop
>> a Unicode algorithm that relates to code point.
> Now wouldn't it be nice if ascii_t (or whatever it's called) and
> utf*_t string classes had 3 kinds of iterators:
> - storage iterator (char, wchar_t etc.),
> - codepoint iterator,
> - character iterator.

The current iterators fall under the storage iterator category, but
code-point iterators are easily possible. Character iterators may
require help from a full-fledged Unicode library (I don't yet know
whether there's a simple way to determine which code points are
combining ones, though I doubt there is), but they should be doable too.
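
To make the code-point idea concrete, here is a rough sketch of what such
an iterator might look like over a std::string. Everything in it is
hypothetical: the names utf8_codepoint_iterator, count_code_points,
is_combining_mark and count_combining_marks are made up for illustration
and are not part of the string classes under discussion. It assumes the
string already holds well-formed UTF-8 and does no validation. The
combining check covers only the U+0300..U+036F block, which is nowhere
near complete, so it mostly illustrates why character iteration needs
real Unicode data.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical sketch: iterates over the code points of a std::string
// that is ASSUMED to contain well-formed UTF-8 (no validation is done).
class utf8_codepoint_iterator {
public:
    // Tagged as an input iterator because operator* returns by value.
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    utf8_codepoint_iterator() = default;
    explicit utf8_codepoint_iterator(std::string::const_iterator pos)
        : pos_(pos) {}

    // Decode the code point that starts at the current storage position.
    char32_t operator*() const {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        const std::size_t len = sequence_length(lead);
        if (len == 1) return lead;
        char32_t cp = lead & (0x7Fu >> len);  // payload bits of the lead byte
        for (std::size_t i = 1; i < len; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(pos_[i]) & 0x3Fu);
        return cp;
    }

    // Advance by one whole code point, not by one storage unit.
    utf8_codepoint_iterator& operator++() {
        pos_ += sequence_length(static_cast<unsigned char>(*pos_));
        return *this;
    }
    utf8_codepoint_iterator operator++(int) {
        utf8_codepoint_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_codepoint_iterator& a,
                           const utf8_codepoint_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_codepoint_iterator& a,
                           const utf8_codepoint_iterator& b) {
        return !(a == b);
    }

private:
    // Length of a UTF-8 sequence, judged from its lead byte alone.
    static std::size_t sequence_length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x06) return 2;
        if ((lead >> 4) == 0x0E) return 3;
        return 4;
    }

    std::string::const_iterator pos_{};
};

// Existing standard algorithms then work at the code-point level.
inline std::ptrdiff_t count_code_points(const std::string& s) {
    return std::distance(utf8_codepoint_iterator(s.begin()),
                         utf8_codepoint_iterator(s.end()));
}

// Deliberately incomplete: only the Combining Diacritical Marks block.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

inline std::ptrdiff_t count_combining_marks(const std::string& s) {
    return std::count_if(utf8_codepoint_iterator(s.begin()),
                         utf8_codepoint_iterator(s.end()),
                         is_combining_mark);
}
```

For a string made of 'e', U+0301 (combining acute), 't' and U+00E9, the
storage iterators see six chars, count_code_points sees four code points,
and count_combining_marks sees one. A character iterator would want to
report three, which is where the full Unicode tables would come in.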

> You could then reuse many existing algorithms to perform operations on
> a level that is sufficient in a given situation [...] I don't know
> Unicode quirks enough to tell how useful this interface would be, but
> it seems interesting.

And intriguing. When I get back to the Unicode string classes, I'll
look into adding such iterators.

Chad Nelson
Oak Circle Software, Inc.