From: John Maddock (john_at_[hidden])
Date: 2004-10-20 05:19:17

Next message: John Maddock: "Re: [boost] Re: Any interest in adding unicode support to boost?"
Previous message: John Maddock: "Re: [boost] Re: Any interest in adding unicode support to boost?"
In reply to: Peter Dimov: "Re: [boost] Re: Any interest in adding unicode support to boost?"
Next in thread: Eric Niebler: "[boost] Re: Any interest in adding unicode support to boost?"

> Erik Wien wrote:
>> Ultimately I feel that the operation of normalization (which involves
>> canonical decomposition) of unicode strings should be hidden from the
>> user completely and be performed automatically by the library where
>> that is needed. (Like on a call to the == operator.)
>
> It appears that there are two schools of thought when it comes to string
> design. One approach treats a string purely as a sequential container of
> values. The other tries to represent "string values" as a coherent whole.
> It doesn't help that in the simple case where the value_type is char the
> two approaches result in mostly identical semantics.
>
> My opinion is that the std::char_traits<> experiment failed and
> conclusively demonstrated that the "string as a value" approach is a dead
> end, and that practical string libraries must treat a string as a
> sequential container, vector<char>, vector<char16_t> and vector<char32_t>
> in our case.
>
> The interpretation of that sequence of integers as a concrete string value
> representation needs to be done by algorithms.
>
> In other words, I believe that string::operator== should always perform
> the per-element comparison std::equal( lhs.begin(), lhs.end(),
> rhs.begin() ) that is specified in the Container requirements table.
>
> If I want to test whether two sequences of char16_t's, interpreted as
> UTF16 Unicode strings, would represent the same string in a printed form,
> I should be given a dedicated function that does just that - or an
> equivalent. Similarly, if I want to normalize a sequence of chars that are
> actually UTF8, I'd call the appropriate 'normalize' function/algorithm.

Right, and there are several different normalisation forms (NFC, NFD, NFKC
and NFKD), so we have to be able to choose the algorithm that does the right
thing for what we want here.
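
To make the distinction concrete, here is a minimal sketch of the split
described above: operator== stays a plain per-element comparison, and a
separate function compares under a caller-chosen normalisation form. All the
names (norm_form, toy_normalize, canonically_equivalent) are hypothetical,
and the "normaliser" is a toy that knows a single mapping (U+00E9 <-> U+0065
U+0301) and supports only two of the forms; a real one would need the full
Unicode data tables.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names throughout; this is an illustration, not a library API.
enum class norm_form { nfc, nfd };

constexpr char32_t e_acute         = 0x00E9; // precomposed e with acute
constexpr char32_t letter_e        = 0x0065; // plain 'e'
constexpr char32_t combining_acute = 0x0301; // combining acute accent

// Toy normaliser: handles only the single mapping above (an assumption made
// to keep the sketch short). Everything else is copied through unchanged.
std::u32string toy_normalize(const std::u32string& in, norm_form form) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (form == norm_form::nfd && in[i] == e_acute) {
            out += letter_e;            // decompose into base + combining mark
            out += combining_acute;
        } else if (form == norm_form::nfc && in[i] == letter_e &&
                   i + 1 < in.size() && in[i + 1] == combining_acute) {
            out += e_acute;             // compose base + combining mark
            ++i;                        // consume the combining mark as well
        } else {
            out += in[i];
        }
    }
    return out;
}

// The dedicated comparison: the caller picks the form explicitly.
bool canonically_equivalent(const std::u32string& a, const std::u32string& b,
                            norm_form form) {
    return toy_normalize(a, form) == toy_normalize(b, form);
}

// Usage: the container comparison and the "string value" comparison disagree.
void normalisation_example() {
    const std::u32string composed{e_acute};
    const std::u32string decomposed{letter_e, combining_acute};
    assert(!(composed == decomposed));  // per-element: different sequences
    assert(canonically_equivalent(composed, decomposed, norm_form::nfd));
    assert(canonically_equivalent(composed, decomposed, norm_form::nfc));
}
```

The point of the sketch is only the shape of the interface: the sequence
comparison is cheap and unsurprising, and the expensive, form-dependent
comparison is something you ask for by name.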

Can I make one other plea here: *please* let's not get too stuck on string
class representations; we can have iterator sequences as well (these may
well be part of a string, or they may be part of a memory-mapped file, or
come from some other smart iterator, like the Unicode encoding transformation
iterators I've just been writing), and operations / algorithms on iterators
are more important to me than YASC (Yet Another String Class) :-)
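
As a rough illustration of an algorithm written against iterators rather
than a string class, the sketch below decodes UTF-16 code units from any
input range into UTF-32 code points written to any output iterator. The name
decode_utf16 is hypothetical and this is not the transformation-iterator
code mentioned above; it assumes mostly well-formed input and simply passes
unpaired surrogates through unchanged rather than reporting an error.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical algorithm: works on any input range of UTF-16 code units,
// whether it comes from a string, a vector, or a raw buffer.
template <class InputIt, class OutputIt>
OutputIt decode_utf16(InputIt first, InputIt last, OutputIt out) {
    while (first != last) {
        char32_t cp = static_cast<std::uint16_t>(*first++);
        // A high surrogate followed by a low surrogate encodes one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && first != last) {
            const char32_t low = static_cast<std::uint16_t>(*first);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++first;
            }
        }
        *out++ = cp; // unpaired surrogates are copied as-is (no validation)
    }
    return out;
}

// Usage: the same algorithm applied to a container and to a raw buffer.
void iterator_example() {
    const std::vector<char16_t> units{0x0041, 0xD83D, 0xDE00};
    std::u32string from_vector;
    decode_utf16(units.begin(), units.end(), std::back_inserter(from_vector));
    assert(from_vector.size() == 2);
    assert(from_vector[1] == 0x1F600);

    const std::uint16_t buffer[] = {0x0048, 0x0069}; // e.g. mapped memory
    std::vector<char32_t> from_buffer;
    decode_utf16(buffer, buffer + 2, std::back_inserter(from_buffer));
    assert(from_buffer.size() == 2);
}
```

Nothing here cares what owns the code units, which is the attraction: the
same function serves every container or buffer that can hand out a pair of
iterators.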

John.

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk