Subject: Re: [boost] [string] proposal
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2011-01-28 16:31:02

On Sat, Jan 29, 2011 at 5:13 AM, Matus Chochlik <chochlik_at_[hidden]> wrote:
> On Fri, Jan 28, 2011 at 9:46 PM, Dean Michael Berris
> <mikhailberis_at_[hidden]> wrote:
>> On Sat, Jan 29, 2011 at 4:26 AM, Artyom <artyomtnk_at_[hidden]> wrote:
> [snip/]
>>> 6. Encoding is extrinsic to strings
>>>   ?!?!?!
>>>   All the discussion started because we need UTF-8
>>>   in strings; now we are back to the beginning?
>> No, the discussion started because we need a UTF-8 view of data. You
>> missed the point I was making. And you didn't understand the document
>> I wrote.
> Sorry, but no. The discussion started by the proposal that we should
> by default treat std::strings as if they were UTF-8 encoded.
> Artyom should know because he was the one who did the original
> proposal. The whole 'view' idea was brought up only much later.

And the point I was making was that doing precisely this is the
"wrong" way of doing it. Assuming a default encoding is "unnecessary",
as an encoding is largely a matter of how the data is interpreted.

I was attempting to solve the problem that is std::string. In the
process I'm moving the issue away from the underlying data and making
it a matter of interpretation. The way I see it, the sensible way to do
that is to move the encoding into a view of the data that is held in a
string. The string would be the data structure, the view an
interpretation of it.

I never precluded that the string can hold UTF-8 encoded data, but
saying that is the default achieves nothing and is ultimately
unnecessary. The point of the design I've been proposing is that
interpreting data in a given encoding is separate from how the data is
actually stored. Now let's say you have a UTF-8 string builder: what
else would that write in memory aside from UTF-8 encoded data? It will
still yield a string, though, which could be interpreted many different
ways -- I just don't see the encoding as something intrinsic to the
string. That means a string can hold UTF-8 encoded data, and I can wrap
that in a view for UTF-16 and see that it will not validate correctly --
unless I wrap the string with a view for UTF-8 first and then pass that
into a view for UTF-16, in which case transcoding can happen on the fly.
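
To make the string/view split concrete, here is a minimal sketch. The
names utf8_view, utf16le_view and transcode_to_utf16 are hypothetical
and are not taken from any existing Boost or standard library API. The
transcoding is done eagerly here for brevity; a real design would
presumably do it lazily through iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: the std::string is only a byte container, and each
// view is one interpretation of those bytes. A view keeps a reference, so
// the string must outlive it.
class utf8_view {
public:
    explicit utf8_view(const std::string& bytes) : bytes_(bytes) {}

    // Decodes the bytes into code points; returns false on malformed UTF-8.
    bool decode(std::u32string& out) const {
        static const char32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
        out.clear();
        const std::size_t n = bytes_.size();
        for (std::size_t i = 0; i < n;) {
            const unsigned char b0 = static_cast<unsigned char>(bytes_[i]);
            char32_t cp;
            std::size_t len;
            if (b0 < 0x80) { cp = b0; len = 1; }
            else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
            else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
            else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
            else return false;                       // stray continuation byte
            if (i + len > n) return false;           // truncated sequence
            for (std::size_t k = 1; k < len; ++k) {
                const unsigned char b = static_cast<unsigned char>(bytes_[i + k]);
                if ((b & 0xC0) != 0x80) return false;
                cp = (cp << 6) | (b & 0x3F);
            }
            if (cp < min_cp[len]) return false;      // overlong encoding
            if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
            out.push_back(cp);
            i += len;
        }
        return true;
    }

    bool valid() const { std::u32string tmp; return decode(tmp); }

private:
    const std::string& bytes_;
};

// Interprets the same kind of byte container as little-endian UTF-16.
class utf16le_view {
public:
    explicit utf16le_view(const std::string& bytes) : bytes_(bytes) {}

    bool valid() const {
        if (bytes_.size() % 2 != 0) return false;    // half a code unit
        bool expect_low = false;
        for (std::size_t i = 0; i < bytes_.size(); i += 2) {
            const unsigned lo = static_cast<unsigned char>(bytes_[i]);
            const unsigned hi = static_cast<unsigned char>(bytes_[i + 1]);
            const unsigned unit = (hi << 8) | lo;
            const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
            const bool is_low = unit >= 0xDC00 && unit <= 0xDFFF;
            if (expect_low != is_low) return false;  // unpaired surrogate
            expect_low = is_high;
        }
        return !expect_low;
    }

private:
    const std::string& bytes_;
};

// Goes through the UTF-8 interpretation first, then emits UTF-16 code units.
bool transcode_to_utf16(const utf8_view& source, std::u16string& out) {
    std::u32string cps;
    if (!source.decode(cps)) return false;
    out.clear();
    for (char32_t cp : cps) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return true;
}

inline void usage_example() {
    const std::string euro = "\xE2\x82\xAC";         // U+20AC stored as UTF-8
    assert(utf8_view(euro).valid());
    assert(!utf16le_view(euro).valid());             // 3 bytes: not UTF-16
    std::u16string units;
    assert(transcode_to_utf16(utf8_view(euro), units));
    assert(units.size() == 1 && units[0] == 0x20AC);
}
```

One caveat: reinterpreting the raw bytes as UTF-16 is not guaranteed to
fail validation. An even-length UTF-8 sequence can happen to form
well-formed UTF-16 code units that mean something else entirely, which
is one more reason to keep the interpretation explicit in the view.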

Writing algorithms that deal with strings is different from writing
algorithms that deal with encoded text. Those are two different levels.
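
A small illustration of the two levels, as a sketch with hypothetical
function names: the first function works on the string as bytes, the
second works on the text and has to assume an encoding (here UTF-8,
which it does not verify).

```cpp
#include <cstddef>
#include <string>

// String-level algorithm: knows nothing about encodings, it counts bytes.
std::size_t byte_length(const std::string& s) {
    return s.size();
}

// Text-level algorithm: assumes (without checking) that s holds valid UTF-8
// and counts code points by skipping continuation bytes (10xxxxxx).
std::size_t utf8_code_point_count(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the UTF-8 bytes of "naive" spelled with a diaeresis on the i
("na\xC3\xAFve"), byte_length gives 6 while utf8_code_point_count gives
5, so the two levels already disagree on something as basic as length.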

Explaining the whole point of the matter over and over makes me sound
like a broken record. If you still don't get what I'm saying then I
guess I'm going to have to try a different route and just show what I
mean in terms of code at some point in

Dean Michael Berris
