

From: loufoque (mathias.gaunard_at_[hidden])
Date: 2006-09-15 20:22:20

Since no one has old code for reuse, I will start to write a few usable
tools from scratch.
Note that I am neither a Unicode expert nor a C++ guru.
I am just willing to work in that area and hope my code could be useful
to some.

Feel free to comment and give ideas. I think getting the design right is
the most important first step, especially for use with Boost, even though
this topic has already been discussed a few times.

string/wstring are not really suited to holding Unicode data, because of
the limitations of char_traits and of the basic_string interface, and
because the string and wstring types depend on locales.
I think it is better to treat the string, char[], wstring and
wchar_t[] types as being in the system locale and to use a separate type
for Unicode strings.

The aim would then be to provide an abstract Unicode string type that is
independent of C++ locales and works at the grapheme cluster level, while
also giving access to the lower levels (code points and code units).
At the beginning it would only handle Unicode in a generic way (no
locales or tailoring).
This string could keep its characters in a normalized form (which
means potentially losing information about singleton characters) in order
to allow more efficient comparison and searching.
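
To make the different levels concrete, here is a minimal sketch (not part
of any existing library) that looks at the same text as UTF-8 code units
and as code points. The names count_code_units, count_code_points and
demo_levels are made up for this example, and the code assumes well-formed
UTF-8 input. Grapheme cluster segmentation needs Unicode property data and
is not attempted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the number of UTF-8 code units is the byte count.
std::size_t count_code_units(const std::string& utf8) {
    return utf8.size();
}

// Hypothetical helper: counts code points, assuming well-formed UTF-8.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Two spellings of the same user-perceived character (e with acute accent).
void demo_levels() {
    const std::string decomposed = "e\xCC\x81";  // U+0065 followed by U+0301
    const std::string precomposed = "\xC3\xA9";  // U+00E9

    assert(count_code_units(decomposed) == 3);
    assert(count_code_points(decomposed) == 2);
    assert(count_code_units(precomposed) == 2);
    assert(count_code_points(precomposed) == 1);

    // Without normalization the two byte sequences compare unequal,
    // although both are a single grapheme cluster.
    assert(decomposed != precomposed);
}
```

Both spellings are one grapheme cluster, yet they differ at the code unit
and code point levels. Keeping the string in one normalized form would let
comparison work directly on the stored data.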

It would use a policy-based design in order to be as generic as possible
and therefore customizable on many levels, allowing you to choose the data
structure and encoding you need for interfacing with other libraries.
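
A rough sketch of what such a policy-based design might look like follows.
Everything in it is hypothetical: the names unicode_string, utf8_encoding
and utf16_encoding, and the choice of passing the container type as the
storage policy, are assumptions made for illustration only. The encoders
assume valid Unicode scalar values and use C++11's char16_t and char32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical encoding policy: UTF-8 code units.
struct utf8_encoding {
    typedef unsigned char code_unit;

    // Appends the UTF-8 encoding of cp to out (cp must be a scalar value).
    template <typename Container>
    static void encode(char32_t cp, Container& out) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {
            out.push_back(static_cast<code_unit>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<code_unit>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<code_unit>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        }
    }
};

// Hypothetical encoding policy: UTF-16 code units.
struct utf16_encoding {
    typedef char16_t code_unit;

    // Appends the UTF-16 encoding of cp to out, using a surrogate pair
    // for code points outside the Basic Multilingual Plane.
    template <typename Container>
    static void encode(char32_t cp, Container& out) {
        assert(cp <= 0x10FFFF);
        if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<code_unit>(0xD800 | (cp >> 10)));
            out.push_back(static_cast<code_unit>(0xDC00 | (cp & 0x3FF)));
        }
    }
};

// Hypothetical string type: the encoding and the underlying container
// are both chosen by the user through template parameters.
template <typename Encoding,
          typename Storage = std::vector<typename Encoding::code_unit> >
class unicode_string {
public:
    typedef typename Encoding::code_unit code_unit;

    // Appends one code point, encoded according to the Encoding policy.
    void push_back(char32_t cp) { Encoding::encode(cp, units_); }

    // Lowest-level view: the stored code units.
    const Storage& code_units() const { return units_; }

private:
    Storage units_;
};
```

With this sketch, unicode_string<utf8_encoding> stores U+1F600 as four
code units while unicode_string<utf16_encoding> stores it as two, and
something like unicode_string<utf8_encoding, std::deque<unsigned char> >
swaps the data structure without touching the rest of the code.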

The policy-based design would also provide functionality similar to
flex_string, to explicitly choose whether to use COW or other
optimizations depending on the situation.
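
As an illustration of what a COW storage policy could do, here is a
minimal single-threaded sketch. The class cow_buffer and its members are
invented for this example and are not taken from flex_string. It uses
C++11's std::shared_ptr, and it makes no attempt to be thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write buffer: copies share one vector until
// one of them asks for write access.
template <typename T>
class cow_buffer {
public:
    cow_buffer() : data_(std::make_shared<std::vector<T> >()) {}

    // Read access never copies.
    const std::vector<T>& read() const { return *data_; }

    // Write access detaches from other owners first, if there are any.
    std::vector<T>& write() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<T> >(*data_);
        }
        return *data_;
    }

    // True when both buffers still point at the same underlying vector.
    bool shares_with(const cow_buffer& other) const {
        return data_ == other.data_;
    }

private:
    std::shared_ptr<std::vector<T> > data_;
};
```

Copying a cow_buffer only copies the pointer. The data itself is
duplicated the first time a shared buffer is modified through write().
A policy that always copies eagerly would have the same interface, which
is what makes the choice a matter of picking a template argument.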

There would also be const versions, following the const_string design.

Just like super_string, the class would bundle algorithms from
string_algo, since it can probably implement them in a more efficient
way than iterating over the grapheme clusters.
