Boost :

From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-10-20 02:56:41

In article <cl4bs4$prn$1_at_[hidden]>, "Erik Wien" <wien_at_[hidden]> wrote:

> Peter Dimov wrote:

> > In other words, I believe that string::operator== should always perform
> > the per-element comparison std::equal( lhs.begin(), lhs.end(),
> > rhs.begin() ) that is specified in the Container requirements table.
> >
> > If I want to test whether two sequences of char16_t's, interpreted as
> > UTF-16 Unicode strings, would represent the same string in a printed form,
> > I should be given a dedicated function that does just that, or an
> > equivalent. Similarly, if I want to normalize a sequence of chars that are
> > actually UTF-8, I'd call the appropriate 'normalize' function/algorithm.
>
> Though I see where you are coming from, I don't agree with you on that. In
> my opinion a good Unicode library should hide as much as possible of the
> complexity of the actual character representation from the user. If we were
> to require the user to know that a direct binary comparison of strings is
> not the same as an actual textual comparison, we would lose some of the
> simplicity of the library. Most users of such a library would not know that
> the character ö can be represented both as 'o¨' and as 'ö', and that, as a
> consequence, calling == on two strings could result in the behaviour
> "ö" != "ö". By removing the need for such knowledge, we reduce the learning
> curve considerably, which is one of the main reasons for abstracting this
> functionality anyway.

I completely agree with Erik on this. For anything other than US English, the
basic_string interface grafted on top of a sequence of UTF code points
produces wrong results for most people most of the time. Unicode is hard enough
as it is; we don't need to expose it through an interface whose default
behavior violates the principle of least surprise.


Boost list run by bdawes at, gregod at, cpdaniel at, john at