Subject: Re: [boost] [string] proposal
From: Anders Dalvander (boost_at_[hidden])
Date: 2011-01-27 17:27:42


On Thu, 27 Jan 2011 12:51, Nevin Liber <nevin_at_[hidden]> wrote:
> I'd like to see this broken up into three discussions:
>
> 1. Immutable strings.

Immutable or not, I don't see a direct use for modification of
individual code-units (e.g. char, wchar_t) in a string. Too many things
can go wrong. Some kind of manipulation of code-points, yes, but not
code-units.

Anyway, code-points are not the end of the story either. Multiple
code-points may be needed to represent a single grapheme, for example
when combining characters are used. And sometimes a single code-point
can represent several graphemes, as with ligatures.
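
To make the distinction concrete, here is a minimal sketch that counts
code-points in a UTF-8 string by skipping continuation bytes. The
function name count_code_points is made up for illustration and is not
part of any proposal in this thread; the sketch assumes well-formed
UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code-points in a UTF-8 encoded string.
// Every byte that is NOT a continuation byte (bit pattern 10xxxxxx)
// starts a new code-point. Assumes the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

With this, "e\xCC\x81" (e followed by U+0301 COMBINING ACUTE ACCENT) is
three code-units and two code-points, yet a reader sees one grapheme.
"\xEF\xAC\x81" (U+FB01, the fi ligature) is three code-units and one
code-point, yet a reader sees two letters.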

> 2. utf8 strings.

Although I personally prefer UTF-8 encoded strings, the internal
encoding is more or less irrelevant for an implementation based on a
rope or a similar non-contiguous data structure. I believe this is what
Dean Michael Berris is suggesting. I think this is especially true if
direct access to individual code-units is prevented.

For an implementation using a contiguous data structure and providing a
constant-time c_str member function, I'd really want to see some option
to set the internal encoding of strings. Performance-wise it may be
preferable to use UTF-16 internally when calling, for example, the Win32
API, if an extra copy can be avoided.
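
One way such an option could look is a compile-time encoding parameter.
The following is only a sketch under my own assumptions: the names
utf8_encoding, utf16_encoding and basic_text are hypothetical, char16_t
requires a C++0x compiler, and on Windows one would more likely use
wchar_t as the UTF-16 code-unit type so the pointer can be passed to the
wide Win32 functions directly.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags; each one only names its code-unit type.
struct utf8_encoding  { typedef char     code_unit; };
struct utf16_encoding { typedef char16_t code_unit; };

// Hypothetical contiguous text class parameterized on its internal encoding.
// No validation or transcoding is done; the input must already be in the
// chosen encoding.
template <typename Encoding>
class basic_text
{
public:
    typedef typename Encoding::code_unit code_unit;

    explicit basic_text(const code_unit* s) : storage_(s) {}

    // Constant time: the storage is contiguous and null-terminated,
    // so no conversion or copy is needed to hand it to a C API.
    const code_unit* c_str() const { return storage_.c_str(); }

    std::size_t code_unit_count() const { return storage_.size(); }

private:
    std::basic_string<code_unit> storage_;
};
```

Code that mostly talks to a UTF-16 API would pick
basic_text<utf16_encoding> and pay no conversion cost at the call site,
while code that mostly does I/O in UTF-8 would pick
basic_text<utf8_encoding>.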

> 3. Unrealistic pipe dream about replacing std::string.

Replacing std::string will never happen. Deprecating std::string in
favor of std::text/std::unicode/std::xstring may happen in the long run.

Regards,
Anders Dalvander

-- 
WWFSMD?
