Boost :

Date view	Thread view	Subject view	Author view

From: Peter Dimov (pdimov_at_[hidden])
Date: 2004-10-19 18:45:38

Next message: Robert Ramey: "[boost] Re: Any interest in adding unicode support to boost?"
Previous message: Erik Wien: "[boost] Re: Any interest in adding unicode support to boost?"
In reply to: Erik Wien: "[boost] Re: Any interest in adding unicode support to boost?"
Next in thread: Erik Wien: "[boost] Re: Re: Any interest in adding unicode support to boost?"
Reply: Erik Wien: "[boost] Re: Re: Any interest in adding unicode support to boost?"
Reply: Vladimir Prus: "[boost] Re: Re: Any interest in adding unicode support to boost?"
Reply: Miro Jurisic: "[boost] Re: Any interest in adding unicode support to boost?"
Reply: John Maddock: "Re: [boost] Re: Any interest in adding unicode support to boost?"

Erik Wien wrote:
> Ultimately I feel that the operation of normalization (which involves
> canonical decomposition) of unicode strings should be hidden from the
> user completely and be performed automatically by the library where
> that is needed. (Like on a call to the == operator.)

It appears that there are two schools of thought when it comes to string
design. One approach treats a string purely as a sequential container of
values. The other tries to represent "string values" as a coherent whole. It
doesn't help that, in the simple case where the value_type is char, the two
approaches result in mostly identical semantics.

My opinion is that the std::char_traits<> experiment failed and conclusively
demonstrated that the "string as a value" approach is a dead end, and that
practical string libraries must treat a string as a sequential container:
vector<char>, vector<char16_t> and vector<char32_t> in our case.
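For readers who haven't used it: std::char_traits<> is the hook through which std::basic_string lets a string type redefine what equality and ordering mean for its values. The sketch below is the well-known case-insensitive traits idiom (popularized by Herb Sutter's GotW #29). The names ci_char_traits and ci_string are illustrative only, not part of any library, and the example is an instance of the "string as a value" approach argued against above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative traits: characters compare equal ignoring ASCII/locale case.
struct ci_char_traits : std::char_traits<char> {
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }

    // Three-way comparison of n characters, used by basic_string::compare and ==.
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }

    // Used by basic_string::find and friends.
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

// A string whose == is no longer plain per-element equality.
using ci_string = std::basic_string<char, ci_char_traits>;
```

With this, ci_string("Hello") == ci_string("HELLO") holds, which is exactly the kind of "value" semantics that no longer matches the Container requirements' element-by-element equality.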

The interpretation of that sequence of integers as a concrete string value
representation needs to be done by algorithms.
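As a minimal sketch of what "interpretation by algorithms" might look like, the function below reads a vector<char16_t> as UTF-16 and yields Unicode code points, combining surrogate pairs; the container itself knows nothing about this. The name decode_utf16 is hypothetical, and throwing on malformed input is an assumption (a real library might substitute U+FFFD instead).

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: interpret UTF-16 code units as Unicode code points.
std::vector<char32_t> decode_utf16(const std::vector<char16_t>& in) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        if (u >= 0xD800 && u <= 0xDBFF) {  // high (leading) surrogate
            if (i + 1 >= in.size())
                throw std::runtime_error("truncated surrogate pair");
            char32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            // Combine 10 high bits and 10 low bits, offset into the supplementary planes.
            out.push_back(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
            ++i;  // consumed two code units
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        } else {
            out.push_back(u);  // BMP code point, one code unit
        }
    }
    return out;
}
```

For example, the units {0x0041, 0xD83D, 0xDE00} decode to the two code points U+0041 and U+1F600.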

In other words, I believe that string::operator== should always perform the
comparison specified in the Container requirements table, i.e. equal sizes
plus the per-element check std::equal( lhs.begin(), lhs.end(), rhs.begin() ).
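To make the consequence concrete: under container equality, a precomposed "é" (U+00E9) and the decomposed sequence "e" + COMBINING ACUTE ACCENT (U+0065 U+0301) compare unequal, even though Unicode treats them as canonically equivalent. The helper container_equal is illustrative only; it spells out the Container-requirements semantics that std::vector and std::u16string already give their own ==.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Container-requirements equality: equal length and element-wise equal.
// (Illustrative helper; std::vector and std::u16string already behave this way.)
bool container_equal(const std::vector<char16_t>& a, const std::vector<char16_t>& b) {
    return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
}

// "é" written two ways: precomposed, and base letter plus combining accent.
const std::vector<char16_t> precomposed = {0x00E9};
const std::vector<char16_t> decomposed  = {0x0065, 0x0301};
```

Under this view, container_equal(precomposed, decomposed) is simply false, and answering "do these print the same?" is left to a separate algorithm.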

If I want to test whether two sequences of char16_t's, interpreted as UTF-16
Unicode strings, would represent the same string in printed form, I should
be given a dedicated function that does just that, or an equivalent.
Similarly, if I want to normalize a sequence of chars that are actually
UTF-8, I'd call the appropriate 'normalize' function/algorithm.
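A dedicated algorithm of that kind could look roughly like the toy below. It is a sketch under strong assumptions: it works on already-decoded code points, its decomposition and combining-class tables are a hypothetical handful of entries (a real library would generate them from the Unicode Character Database), and it ignores Hangul syllables. It does follow the shape of canonical decomposition (NFD): recursively decompose, then stable-sort each run of combining marks by combining class. Two sequences are canonically equivalent when their NFD forms are identical. The names toy_nfd and canonically_equivalent are hypothetical.

```cpp
#include <algorithm>
#include <map>
#include <vector>

using u32seq = std::vector<char32_t>;

// Hypothetical, tiny excerpt of the canonical decomposition mappings.
const std::map<char32_t, u32seq>& decomposition_table() {
    static const std::map<char32_t, u32seq> table = {
        {0x00C5, {0x0041, 0x030A}},  // A WITH RING ABOVE -> A + COMBINING RING ABOVE
        {0x00E9, {0x0065, 0x0301}},  // e WITH ACUTE -> e + COMBINING ACUTE ACCENT
        {0x212B, {0x00C5}},          // ANGSTROM SIGN -> A WITH RING ABOVE
        {0x1E0D, {0x0064, 0x0323}},  // d WITH DOT BELOW -> d + COMBINING DOT BELOW
    };
    return table;
}

// Hypothetical excerpt of canonical combining classes (0 = starter).
int combining_class(char32_t c) {
    switch (c) {
        case 0x0301: case 0x030A: return 230;  // marks above
        case 0x0323: return 220;               // mark below
        default: return 0;
    }
}

// Append the full canonical decomposition of c (mappings may be recursive).
void decompose_into(char32_t c, u32seq& out) {
    const auto& t = decomposition_table();
    auto it = t.find(c);
    if (it == t.end()) { out.push_back(c); return; }
    for (char32_t d : it->second) decompose_into(d, out);
}

// Toy NFD: full canonical decomposition, then canonical ordering of marks.
u32seq toy_nfd(const u32seq& in) {
    u32seq out;
    for (char32_t c : in) decompose_into(c, out);
    auto it = out.begin();
    while (it != out.end()) {
        if (combining_class(*it) == 0) { ++it; continue; }
        // Stable-sort this run of non-starters by combining class.
        auto run_end = std::find_if(it, out.end(),
            [](char32_t c) { return combining_class(c) == 0; });
        std::stable_sort(it, run_end, [](char32_t a, char32_t b) {
            return combining_class(a) < combining_class(b);
        });
        it = run_end;
    }
    return out;
}

bool canonically_equivalent(const u32seq& a, const u32seq& b) {
    return toy_nfd(a) == toy_nfd(b);
}
```

Note that the final comparison is plain container equality on the normalized sequences; the "value" semantics live entirely in the algorithm, not in the string type.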

But I may be wrong. :-)


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk