Subject: Re: [boost] [string] proposal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-28 04:39:32

On Thu, Jan 27, 2011 at 10:57 PM, Patrick Horgan <phorgan1_at_[hidden]> wrote:
> On 01/27/2011 04:45 AM, Matus Chochlik wrote:
>> ... elision by patrick ...
>> In general? Nothing. I do not have (nor did I have in the past)
>> anything against a general efficient encoding-agnostic string
>> if it is called general_string. But std::string IMO is and always
>> has been primarily about handling text. I certainly do not know
>> anyone who would store an MPEG inside std::string.
> You may think it strange, but there's a lot of code out there that uses
> std::string as a binary buffer.

You're right. Just because I don't use it that way does not mean that
it cannot be done, which is why I said in one of my previous posts that
I'm OK with calling it 'text' instead of 'string'.

>> Usability. It is usually more difficult to use the super-generic
>> everything-solving things. I repeat, for probably the 10th time, that
>> I'm not against such a string in general, but this is not std::string.
> And neither would a string that enforced utf-8 encoding be std::string.  We
> already have one in the spec, and it's not that.

Yes, also see above. But the main reason why I strongly oppose
any mention of 'utf8' in the name of the general-text-handling class
is basically the same reason I would oppose naming the
general floating-point-handling classes in C++
'IEEE_754_float' and 'IEEE_754_double' instead of just plain
'float' and 'double'.

I (and many others around here) have dealt with various text
encodings, and all the problems they cause in "non-ASCII"
environments, so many times that my blood pressure skyrockets :)
every time I hear that term.
And I do not want to be reminded about it every time I deal
with text. Let us mention the encoding only when necessary.

> No.  You're not trying to solve the same problem at all!  (And neither of
> you are trying to deal with std::string.)
> You, Dean, are trying to solve an efficiency problem caused by mutable
> strings, noting that an external view can interpret the data as any
> desired encoding.  You correctly point out that this is more general and flexible,
> that it has a power that can be applied to many things while giving you all
> the efficiency advantages of immutable data types.  (Although why a general
> buffer for immutable data would be called string which is normally
> associated with text _is_ a bit confusing.  I suspect you've gone down a
> road you never intended trying to make this point.)
> You, Matus, are trying to solve a problem caused by a plethora of possible
> encodings and the extra work that has to be done every time you have to deal
> with them, by specifying that a string will have an encoding type associated
> with it, (and in particular utf-8 as the natural default), and that the
> specialized string itself will enforce the encoding as well as provide ways
> to convert other encodings to it.  (And I think the natural way to do this
> is with code conversion facets.)  You correctly point out that this
> specificity allows a power in solving this one particular problem that a
> more general solution wouldn't be able to match.  A general string with a
> view into it would allow you to get invalidly encoded data into it (N.B. for
> an immutable string _into it_ would have a different meaning) and you would
> only know about this after the fact.
> These are both great things.  Kudos to you both.  You're both right.  You
> guys keep arguing apples and orangutans and it makes it hard for others to
> talk about either one of your ideas because you're so busy going back and
> forth telling each other that the other doesn't get what they're trying to
> say.

Believe me, Patrick, I have had exactly the same feeling (about the
apples and orangutans) the whole time I've participated in the immutable
vs. unicode string discussion. I know that Dean is focused
on performance and does not care about encodings, and I do care
about performance, just not as much as Dean does.

The reason I kept participating in this 'bike-shed quarrel' is that
I would hate to see the outcome be one just-another-super-efficient-string
and one just-another-unicode-string. There are plenty of those already.

I would like to see *text* handling in C++ addressed
*in the standard*, not only at the byte-sequence level but at
the code-point/character/word/etc. level.

> I wish you'd split into threads like [immutable string] and [unicode
> string].

I am starting to like the idea of immutability, and if it indeed has
so many advantages, I don't see why the text class could
not be built on the immutable_string class.
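To make that layering concrete, here is a minimal C++ sketch. It assumes a hypothetical immutable_string that holds arbitrary bytes in one read-only buffer shared between copies (the general buffer side of the discussion), and a hypothetical text class on top of it that checks UTF-8 well-formedness at construction. Because of that check, invalid data is rejected up front rather than discovered after the fact. Neither name refers to an existing Boost or standard component, and the validator is a plain hand-written check rather than any library's API.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical immutable byte buffer: contents are fixed at construction and
// shared between copies, so copying only bumps a reference count.
// It places no constraint on the bytes (binary data is fine).
class immutable_string {
public:
    explicit immutable_string(std::string bytes)
        : data_(std::make_shared<const std::string>(std::move(bytes))) {}
    const char* data() const { return data_->data(); }
    std::size_t size() const { return data_->size(); }
    bool shares_buffer_with(const immutable_string& other) const {
        return data_ == other.data_;
    }
private:
    std::shared_ptr<const std::string> data_;
};

// Returns true if [p, p + n) is well-formed UTF-8: rejects bad lead bytes,
// truncated sequences, overlong forms, surrogates and values above U+10FFFF.
bool is_valid_utf8(const char* p, std::size_t n) {
    const auto* s = reinterpret_cast<const unsigned char*>(p);
    static const unsigned long min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80) { ++i; continue; }                 // ASCII byte
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                               // invalid lead byte
        if (i + len > n) return false;                   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            if ((s[i + k] & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}

// Hypothetical text class built on immutable_string: the encoding is checked
// once, here, so every text object is known to hold valid UTF-8.
class text {
public:
    explicit text(immutable_string bytes) : bytes_(std::move(bytes)) {
        if (!is_valid_utf8(bytes_.data(), bytes_.size()))
            throw std::invalid_argument("text: input is not valid UTF-8");
    }
    // Raw bytes stay reachable when the encoding actually matters.
    const immutable_string& bytes() const { return bytes_; }
    // Code points = bytes that are not continuation bytes (valid UTF-8 only).
    std::size_t code_point_count() const {
        std::size_t count = 0;
        for (std::size_t i = 0; i < bytes_.size(); ++i)
            if ((static_cast<unsigned char>(bytes_.data()[i]) & 0xC0) != 0x80) ++count;
        return count;
    }
private:
    immutable_string bytes_;
};
```

With this split, copying a text shares the underlying buffer instead of duplicating it, and the encoding appears only inside text's constructor and bytes(), not in the type's name.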



Boost list run by bdawes at, gregod at, cpdaniel at, john at