
Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Yakov Galka (ybungalobill_at_[hidden])
Date: 2011-08-11 08:12:10


On Thu, Aug 11, 2011 at 14:41, Daniel James <dnljms_at_[hidden]> wrote:

> On 11 August 2011 12:03, Artyom Beilis <artyomtnk_at_[hidden]> wrote:
> >
> > The problem is policy; the problem is that Boost just can't decide
> > once and for all that std::string is UTF-8...
>
> Even if there was a consensus within boost, that isn't feasible. We
> don't own std::string, so we don't have a say in what it represents.
>

Of course it's feasible. We have the right to say what it represents in the
interfaces of *our* libraries. If Boost.ProgramOptions, Boost.Locale and
SQLite did it, surely we can extend this policy to the rest of the libraries.

> There's a lot of existing code which is not based on that assumption -
> we can't just wish it out of existence and boost should be compatible
> with it.
>

Most existing code working with plain chars is either encoding-agnostic
or already wrong.

As for the design of the proposed library:

It mixes two orthogonal concepts, namely encoding and storage. The two
should be kept separate.

I don't like reference-counted strings. Passing strings by reference is not
that hard. Moreover, the atomic reference-count updates they require lock the
memory bus, and lots of those on a multiprocessor system degrade performance.

The 'unicode' support (codepoint iteration, etc.) is purely algorithmic and
thus should be independent of the way the data is stored. I would like to see
something like `codepoints(any_char_iterator_range)` returning a range of
codepoints.
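
To make the `codepoints(any_char_iterator_range)` idea concrete, here is a
rough sketch of what I have in mind, assuming C++11 and UTF-8 input. The names
`codepoint_iterator` and `codepoint_range` are made up for illustration and
are not part of the proposed library. The decoder is deliberately minimal: it
substitutes U+FFFD for truncated or malformed sequences but does not reject
overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical adapter: lazily decodes UTF-8 code units read through any
// forward iterator. It knows nothing about how the bytes are stored.
template <class It>
class codepoint_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    codepoint_iterator(It cur, It end) : cur_(cur), end_(end) {}

    // Decode the code point at the current position without advancing.
    char32_t operator*() const {
        assert(cur_ != end_);
        It it = cur_;
        return decode(it, end_);
    }
    // Advance past one encoded code point.
    codepoint_iterator& operator++() { decode(cur_, end_); return *this; }
    codepoint_iterator operator++(int) { auto t = *this; ++*this; return t; }

    friend bool operator==(const codepoint_iterator& a,
                           const codepoint_iterator& b) { return a.cur_ == b.cur_; }
    friend bool operator!=(const codepoint_iterator& a,
                           const codepoint_iterator& b) { return !(a == b); }

private:
    // Reads one sequence starting at `it`, leaving `it` after the last byte
    // consumed. Returns U+FFFD for a bad lead byte, a bad continuation byte,
    // or a sequence cut off by `end`. Overlong forms are NOT checked.
    static char32_t decode(It& it, It end) {
        unsigned char lead = static_cast<unsigned char>(*it);
        ++it;
        if (lead < 0x80) return lead;
        int extra;
        char32_t cp;
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return 0xFFFD;
        for (; extra > 0; --extra) {
            if (it == end) return 0xFFFD;
            unsigned char c = static_cast<unsigned char>(*it);
            if ((c & 0xC0) != 0x80) return 0xFFFD;
            cp = (cp << 6) | (c & 0x3F);
            ++it;
        }
        return cp;
    }

    It cur_, end_;
};

// Hypothetical range type so the result works with range-based for.
template <class It>
struct codepoint_range {
    codepoint_iterator<It> first, last;
    codepoint_iterator<It> begin() const { return first; }
    codepoint_iterator<It> end() const { return last; }
};

// Works for std::string, std::vector<char>, or any other container of
// char-like values. The container must outlive the returned range.
template <class Range>
auto codepoints(const Range& r) -> codepoint_range<decltype(std::begin(r))> {
    using It = decltype(std::begin(r));
    return codepoint_range<It>{codepoint_iterator<It>(std::begin(r), std::end(r)),
                               codepoint_iterator<It>(std::end(r), std::end(r))};
}
```

With this, `for (char32_t cp : codepoints(s))` reads the same whether `s` is a
std::string, a std::vector<char>, or something else entirely, which is the
point: the algorithm never needs to know about the storage.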

-- 
Yakov

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk