From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-10-20 02:53:15


In article <001301c4b635$bda16750$6501a8c0_at_pdimov2>,
 "Peter Dimov" <pdimov_at_[hidden]> wrote:

> My opinion is that the std::char_traits<> experiment failed and conclusively
> demonstrated that the "string as a value" approach is a dead end, and that
> practical string libraries must treat a string as a sequential container,
> vector<char>, vector<char16_t> and vector<char32_t> in our case.
>
> The interpretation of that sequence of integers as a concrete string value
> representation needs to be done by algorithms.

There is no dispute that the rep of the string needs to be a container. (Though
I do not agree that it's obvious that it should be a vector.) However, the
basic_string interface grafted on top of a container of Unicode code units will
produce bogus Unicode strings. This is why I strongly believe that basic_string
is not a suitable container for Unicode strings. A separate container which does
not provide convenient and completely incorrect member functions (such as find
and assign) should be used.

Consider this: pretend that

 - c and d are characters
 - C and D are c and d, respectively, with an umlaut
 - C and D do not have precomposed code points in Unicode, so each is stored
   as a base character followed by a combining umlaut

basic_string<char16_t> s("Cc");
// pretend assign and find use iterator ranges, for simplicity
s.assign(s.find("c"), "d");

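This will result in "Dc", because find matches the "c" code unit at the start of
the decomposed C, and the combining umlaut left behind then attaches to the "d".
That is completely wrong IMNSHO, and there should not be a simple interface that
allows you to shoot yourself in the foot so thoroughly.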
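
For anyone who wants to try this with a real compiler, here is a minimal sketch
of the same problem using std::u16string and U+0308 (combining diaeresis) as the
umlaut. The function names naive_replace_first and find_whole_character are
hypothetical, made up for this illustration, and is_combining_mark only covers
the Combining Diacritical Marks block, which is a deliberate simplification. It
is not a proposal for what a Unicode string interface should look like, only a
way to see the difference between a code unit search and a character-aware one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplifying assumption: only U+0300..U+036F (Combining Diacritical Marks)
// counts as "combining". Real code would consult the Unicode character data.
bool is_combining_mark(char16_t u) {
    return u >= 0x0300 && u <= 0x036F;
}

// The code-unit-level edit from the example above: replace the first
// occurrence of `from` with `to`, with no regard for combining sequences.
std::u16string naive_replace_first(std::u16string s,
                                   const std::u16string& from,
                                   const std::u16string& to) {
    const std::size_t pos = s.find(from);
    if (pos != std::u16string::npos) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Hypothetical helper: like find, but rejects a match that is immediately
// followed by a combining mark, since such a match is only the front part
// of a longer character.
std::size_t find_whole_character(const std::u16string& s,
                                 const std::u16string& needle) {
    assert(!needle.empty());
    std::size_t pos = s.find(needle);
    while (pos != std::u16string::npos) {
        const std::size_t end = pos + needle.size();
        if (end == s.size() || !is_combining_mark(s[end])) {
            return pos;  // the match ends on a character boundary
        }
        pos = s.find(needle, pos + 1);  // partial character, keep looking
    }
    return std::u16string::npos;
}
```

With s = u"c\u0308c" (that is, "Cc"), naive_replace_first(s, u"c", u"d") yields
d, U+0308, c, which is "Dc", while find_whole_character(s, u"c") returns index 2,
the plain "c" at the end.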

It is not strings-as-containers that I am opposed to, but the deceptive
simplicity of basic_string member functions.

meeroh

