Subject: Re: [boost] [gsoc] unicode tools and an unicode string type
From: Beman Dawes (bdawes_at_[hidden])
Date: 2009-03-30 08:24:31
On Sun, Mar 29, 2009 at 9:40 PM, Mathias Gaunard wrote:
> I plan to submit during the week my proposal for the Summer of Code about
> I plan to provide:
> - iterator adaptors to iterate sequences of code units, code points and
> graphemes, and eventually more, from a sequence in UTF-8, UTF-16, UCS-2 or
What about conversion algorithms to conveniently generate these
sequences in the first place?
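As an illustration of the kind of conversion algorithm meant here, the following is a minimal C++11 sketch that decodes UTF-8 into code points and re-encodes them as UTF-16. The function names `decode_utf8` and `encode_utf16` are hypothetical, not an existing Boost API. For simplicity it builds whole containers and reports malformed input by throwing.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte sequence into code points.
// Throws std::invalid_argument on bad lead bytes, truncated sequences,
// bad continuation bytes, overlong forms, surrogates, or values > U+10FFFF.
std::vector<char32_t> decode_utf8(const std::string& in) {
    // Smallest code point that legitimately needs 1..4 bytes.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: encode code points as UTF-16 code units.
// Assumes valid input (e.g. the output of decode_utf8); code points
// above U+FFFF become a high/low surrogate pair.
std::u16string encode_utf16(const std::vector<char32_t>& cps) {
    std::u16string out;
    for (char32_t cp : cps) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

A library version would more likely expose the same logic through lazy iterator adaptors, as the proposal describes, so that no intermediate container is needed.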
> - miscellaneous utilities, such as categorization of code points
> - normalization functions
> - comparisons but not collations
> - substring search algorithms
> - and finally, a Unicode string type
From prior discussions, it seemed to me that there was actually a need
for several Unicode string types.
* Specific string classes for UTF-8, UTF-16, and other UTF-* encodings,
to be used within an application when a particular Unicode encoding
and internal representation is the optimal choice.
* A single utf_string that varies its internal representation at
run time. This is the choice for communication between third parties,
where not enough is known about the applications to choose a
particular internal representation, or within an application that
must cope with needs that change at run time.
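One way to picture the second option is a class that holds one of several encoded representations and dispatches on it at run time. The following is a minimal C++17 sketch assuming `std::variant` as the storage; the member names (`current_encoding`, `code_point_count`, `size_in_bytes`) are hypothetical, the class assumes its contents are already valid in the stored encoding, and it is not a proposed Boost interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical sketch of a string whose encoding is chosen at run time.
// Assumes the stored data is already valid in that encoding.
class utf_string {
public:
    // Enumerator order must match the variant alternatives below.
    enum class encoding { utf8, utf16, utf32 };

    explicit utf_string(std::string s) : data_(std::move(s)) {}
    explicit utf_string(std::u16string s) : data_(std::move(s)) {}
    explicit utf_string(std::u32string s) : data_(std::move(s)) {}

    // The active variant alternative is the current representation.
    encoding current_encoding() const {
        return static_cast<encoding>(data_.index());
    }

    // Count code points by dispatching on the representation at run time.
    std::size_t code_point_count() const {
        return std::visit([](const auto& s) -> std::size_t {
            using T = std::decay_t<decltype(s)>;
            std::size_t n = 0;
            if constexpr (std::is_same_v<T, std::string>) {
                // Each code point has exactly one non-continuation byte.
                for (char c : s)
                    if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) ++n;
            } else if constexpr (std::is_same_v<T, std::u16string>) {
                // Low (trailing) surrogates do not start a code point.
                for (char16_t u : s)
                    if (u < 0xDC00 || u > 0xDFFF) ++n;
            } else {
                n = s.size();  // UTF-32: one code unit per code point
            }
            return n;
        }, data_);
    }

    // Storage used by the current representation, in bytes.
    std::size_t size_in_bytes() const {
        return std::visit([](const auto& s) -> std::size_t {
            using T = std::decay_t<decltype(s)>;
            return s.size() * sizeof(typename T::value_type);
        }, data_);
    }

private:
    std::variant<std::string, std::u16string, std::u32string> data_;
};
```

The fixed-encoding classes of the first option would correspond to using one alternative directly, which avoids the per-operation dispatch that the run-time version pays for its flexibility.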
> I am well aware defining yet another new string type is quite controversial,
> but I believe this is quite useful. A dedicated type would be able to
> maintain certain invariants, such as maintaining a special normalization
> Also, I believe it can be possible to come up with a string design that
> allows easy integration with any other existing string type, such as the
> ones from the standard or Qt
While this is an interesting proposal, it appears to me to be several
years' worth of work. How would you structure the first summer's work?
Would you aim at breadth (a prototype covering the whole) or depth
(production quality work that concentrates on one aspect)?