Subject: Re: [boost] [string] proposal
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-21 16:32:09


On Fri, Jan 21, 2011 at 5:48 PM, Robert Ramey <ramey_at_[hidden]> wrote:
> Matus Chochlik wrote:
>> Using a name like utf8_t or u8string, string_utf8, etc.
>> at least to me (and I've consulted this off the list,
>> with several people) suggests, that UTF-8 is still
>> something special and IMO also sends the message
>> that it is OK to remain forever with the various encodings
>> and std::string as it is today.
>
> rather than viewing std::string as a sequence of character
> encodings, view it as a sequence of bytes along with
> a few extra functions compared to std::vector.  Lot's
> of programs use std::string in this way without depending
> upon any behavior related to character encoding.

Of course, this is what has been referred to during the
discussion as encoding-agnostic usage. But if I use a
string to refer to the same thing on different platforms
(path, URL, proper name, etc.), then I would like the
byte sequence to be the same, for the following
reason:

Today data is commonly sent over the network between
computers running different platforms. Even if on one
machine you don't care which byte sequence represents
a string of logical characters, you have to worry about
it when you send it to another machine, because that
machine might interpret the sequence differently.
To avoid data corruption during this process there
has to be an agreement on a common representation
at some point during the transfer.
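
To make the point above concrete, here is a minimal sketch of what
"agreeing on a common representation" can look like. It assumes the
sender holds ISO-8859-1 (Latin-1) bytes and that both sides have agreed
on UTF-8 for the transfer. The names latin1_to_utf8 and demo_transfer
are hypothetical and only illustrate the idea; they are not part of
Boost or of any proposal in this thread.

```cpp
#include <cassert>
#include <string>

// Transcode ISO-8859-1 (Latin-1) bytes to UTF-8.
// Each Latin-1 byte has the same value as its Unicode code point
// (U+0000..U+00FF), so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size());
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            // ASCII range: identical in both encodings.
            utf8.push_back(static_cast<char>(c));
        } else {
            // 110xxxxx 10xxxxxx
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}

// Hypothetical usage: the same logical text has different bytes
// before and after the agreed-upon transcoding step.
void demo_transfer()
{
    const std::string sender_bytes = "caf\xE9";          // "café" in Latin-1
    const std::string wire_bytes = latin1_to_utf8(sender_bytes);

    assert(sender_bytes.size() == 4);
    assert(wire_bytes == "caf\xC3\xA9");                 // "café" in UTF-8
}
```

If the receiver assumed UTF-8 but got the untranscoded Latin-1 bytes,
the lone 0xE9 byte would be an invalid sequence, which is the kind of
corruption described above.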

In the past this was not such a big deal because
computers were standalone and the transcoding
could be handled manually. But today moving data
around is so prevalent that it becomes unfeasible
to do it explicitly.

>
> now, consider utf8_string as a sequence of character
> encodings which might be implemented in terms of
> std::string.  It's a different thing and should have a different
> thing.

This would mean that if someone wanted to use, for
example, a class member variable that you intended to
be just a byte sequence as a character sequence, he
would have to make a copy.
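
A rough sketch of the copying concern, assuming a separate wrapper type
along the lines of the utf8_string mentioned in the quoted text. The
class below is hypothetical (it performs no validation and is not an
existing Boost class); it only shows that viewing an existing
std::string member as text through a distinct owning type duplicates
the data.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical distinct string type that owns its bytes.
class utf8_string {
public:
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

struct record {
    std::string name;  // originally intended as "just a byte sequence"
};

void demo_copy()
{
    record r{"Matus"};

    // Treating the member as a character sequence requires
    // constructing the other type, which copies r.name.
    utf8_string text(r.name);

    // The two objects are now independent: changing one
    // does not affect the other.
    r.name += "!";
    assert(text.bytes() == "Matus");
    assert(r.name == "Matus!");
}
```

A non-owning view type could avoid the copy, at the price of lifetime
management; that is a different design from the one sketched here.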

>
>>We should *IMO* endorse the opposite.
>
> It is not our proper role to endorse or deprecate
> programming practices.  It's a fools errand in any case.
> The best anyone can do is provide alternatives and
> explain why he thinks they are superior.

OK, by "endorsing" I meant here not just talking about
it and convincing people that it is superior without proving it
(as became clear to me in the other thread of the debate),
but actually implementing something better than the current
std::string, with the properties described above, and letting the
"market" decide. But in the end you have to believe in what
you are doing.

>
>> My suggestion is the following:
>>
>> Let us create a class called boost::string that will have
>> all the properties that a string handling class in 2011+ A.D.
>
> What happens in 2021 A.D. when it is discovered that
> "they did it wrong".

Then the people who find that out will do a lot of complaining
about it and eventually they will create something even better.
I'm not so naive as to think that we will create a string class that
will be used for the next 500 years :) But if we create something
that will make life easier for the next 10-20 years, then it will
be worth the effort.

>
>> should have, basically what std::string should have been.
>
> what you (or we, or someone else) thinks string should have been.

Of course I don't think that I alone can come up with
the "uber_string", but this is Boost with all its gurus :)
so if there is a place where a good string class can
be born then it is IMO here.

>
> This idea depends upon a few presumptions which are not true.
> a) that std::string is used only for character encodings.

No, I imagine it to be (partially) backward compatible with
std::string, but also to have Unicode-aware features, so it can
be used as both the byte sequence and the logical-character
sequence.
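
As an illustration of the "both views" idea, here is a minimal sketch
that looks at one std::string as a byte sequence and as a sequence of
Unicode code points. It assumes the string holds well-formed UTF-8 and
does no validation; count_code_points and demo_two_views are
hypothetical names. Note that a code point is only an approximation of
a "logical character" (combining marks are not handled).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count Unicode code points in a string assumed to be valid UTF-8.
// Continuation bytes have the form 10xxxxxx; every other byte
// starts a new code point.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

void demo_two_views()
{
    const std::string s = "na\xC3\xAFve";   // "naïve" encoded as UTF-8

    assert(s.size() == 6);                  // byte-sequence view
    assert(count_code_points(s) == 5);      // code-point view
}
```

Code that only stores, compares or forwards the string keeps using the
byte view unchanged; only code that cares about characters needs the
second view.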

> b) that someone can know all the things that std::string might be used for
> as it is

I think we can make reasonable assumptions.

> c) that someone now has the knowledge to design a new version of
> std::string which will never need be changed.

I never said anything like this, see above.

>
> Basically, if you're going to make a "new" thing - fine - just
> make sure you give it a new name.

I'm not thinking about it as a completely new thing, more like
a future std::string 2.0, an upgrade, not a replacement.

BR,

Matus

