From: John Maddock (john_at_[hidden])
Date: 2004-10-20 05:19:17


> Erik Wien wrote:
>> Ultimately I feel that the operation of normalization (which involves
>> canonical decomposition) of unicode strings should be hidden from the
>> user completely and be performed automatically by the library where
>> that is needed. (Like on a call to the == operator.)
>
> It appears that there are two schools of thought when it comes to string
> design. One approach treats a string purely as a sequential container of
> values. The other tries to represent "string values" as a coherent whole.
> It doesn't help that in the simple case where the value_type is char the
> two approaches result in mostly identical semantics.
>
> My opinion is that the std::char_traits<> experiment failed and
> conclusively demonstrated that the "string as a value" approach is a dead
> end, and that practical string libraries must treat a string as a
> sequential container, vector<char>, vector<char16_t> and vector<char32_t>
> in our case.
>
> The interpretation of that sequence of integers as a concrete string value
> representation needs to be done by algorithms.
>
> In other words, I believe that string::operator== should always perform
> the per-element comparison std::equal( lhs.begin(), lhs.end(),
> rhs.begin() ) that is specified in the Container requirements table.
>
> If I want to test whether two sequences of char16_t's, interpreted as
> UTF16 Unicode strings, would represent the same string in a printed form,
> I should be given a dedicated function that does just that - or an
> equivalent. Similarly, if I want to normalize a sequence of chars that are
> actually UTF8, I'd call the appropriate 'normalize' function/algorithm.

Right, and there are several different normalisation forms (NFC, NFD, NFKC
and NFKD), so we have to be able to choose the algorithm that does the right
thing for the job at hand.
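
To make the distinction concrete, here is a rough sketch of "container
equality" versus "equality under a chosen normalisation". Everything in it
is hypothetical: toy_decompose, same_code_points and equal_under are names
made up for illustration, and the decomposition table covers only two
characters. A real normaliser would need the full Unicode data tables and
canonical reordering of combining marks, which this toy ignores.

```cpp
#include <algorithm>
#include <string>

// Code points used by the toy table below.
constexpr char32_t E_ACUTE      = 0x00E9; // precomposed e with acute
constexpr char32_t A_RING       = 0x00C5; // precomposed A with ring above
constexpr char32_t COMB_ACUTE   = 0x0301; // combining acute accent
constexpr char32_t COMB_RING    = 0x030A; // combining ring above

// Per-element comparison, as in the Container requirements.
inline bool same_code_points(const std::u32string& a, const std::u32string& b)
{
    return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
}

// TOY canonical decomposition: handles exactly two precomposed characters
// and does no reordering of combining marks. Illustration only.
inline std::u32string toy_decompose(const std::u32string& in)
{
    std::u32string out;
    for (char32_t c : in) {
        switch (c) {
        case E_ACUTE: out.push_back(U'e'); out.push_back(COMB_ACUTE); break;
        case A_RING:  out.push_back(U'A'); out.push_back(COMB_RING);  break;
        default:      out.push_back(c);                               break;
        }
    }
    return out;
}

// The caller picks the normalisation algorithm; the comparison itself
// stays a plain per-element one on the normalised sequences.
template <class Normalize>
bool equal_under(const std::u32string& a, const std::u32string& b,
                 Normalize normalize)
{
    return same_code_points(normalize(a), normalize(b));
}
```

With this, a string holding the single code point U+00E9 and a string
holding U+0065 U+0301 compare unequal with same_code_points, but equal with
equal_under(a, b, toy_decompose). Swapping in a different normaliser changes
the notion of equality without touching the string type.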

Can I make one other plea here: *please* let's not get too stuck on string
class representations. We can have iterator sequences as well (these may
well be part of a string, or part of a memory mapped file, or come from
some other smart iterator, like the Unicode encoding transformation
iterators I've just been writing), and operations / algorithms on iterators
are more important to me than YASC (Yet Another String Class) :-)
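
As a small illustration of what an algorithm on iterators buys us, here is
a sketch that counts the code points in a UTF-8 sequence given only an
iterator pair. The name count_utf8_code_points is hypothetical, and the
function assumes the input is already well-formed UTF-8: it does no
validation at all, it just counts every byte that is not a continuation
byte (10xxxxxx).

```cpp
#include <cstddef>

// Counts code points in a UTF-8 byte sequence [first, last).
// Assumes well-formed input; malformed sequences are not detected.
template <class InputIt>
std::size_t count_utf8_code_points(InputIt first, InputIt last)
{
    std::size_t count = 0;
    for (; first != last; ++first) {
        const unsigned char byte = static_cast<unsigned char>(*first);
        // Continuation bytes have the bit pattern 10xxxxxx; every other
        // byte starts a new code point.
        if ((byte & 0xC0u) != 0x80u) {
            ++count;
        }
    }
    return count;
}
```

The same function works unchanged on std::string::begin()/end(), on a pair
of pointers into a memory mapped file, or on any other input iterator whose
value converts to a byte, so no particular string class is required.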

John.

