Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-20 03:59:51


On Wed, Jan 19, 2011 at 8:50 PM, Chad Nelson
<chad.thecomfychair_at_[hidden]> wrote:
>
> Do you see another way to provide those conversions, and automatic
> verification of proper UTF coding? (Automatic verification is a very
> good thing, without it someone won't use it or will forget to, and open
> up their programs to exploitation.)

Yes: by implementing it in std::string in some future standard.
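
To make "automatic verification" concrete, here is a minimal sketch of the kind of check a string class could run on construction. The name is_valid_utf8 is hypothetical, not an existing Boost or standard function; it rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Boost or the standard library):
// returns true if `s` is a well-formed UTF-8 byte sequence.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;             // stray continuation or invalid lead byte
        if (n - i < len) return false; // sequence is cut off at the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        i += len;
    }
    return true;
}
```

The overlong and surrogate checks are the ones that matter for the exploitation concern quoted above: a check that only looks at byte patterns would let "\xC0\xAF" through as an alternative spelling of '/'.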

>
> If Boost comes out with a version that breaks existing programs,
> companies just won't upgrade to it. I can keep one of the companies
> that mine works with upgrading, because the group that I work with is
> the only one there using C++ and they listen to me, but most companies
> have a lot more invested in the existing system. Believe me, any
> breaking changes have to be eased in over many versions -- the "boiling
> a frog" approach. :-)

Of course, this is a valid point, and what we should do is evaluate
the potential damage. There have been breaking changes in Boost
before, and the end-users eventually accepted them (even if complaining
loudly). Boost is a cutting-edge library; such changes should
be avoided if possible, but they should not be ruled out completely.
This would require a lot of PR and announcing the changes well
in advance.

>
> If they're already using UTF-8 strings, then we provide something like
> BOOST_ALL_STD_STRINGS_ARE_UTF8 that they can define. The utf*_t classes
> configure themselves to accept std::strings as UTF-8-encoded, and any
> changes are completely transparent to those people. No punishment
> involved.

OK, this could work.
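
As a rough sketch of how such an opt-in could behave (the class body below is my own guess, not Chad's actual utf*_t code): with the macro defined, a std::string converts implicitly and is taken to be UTF-8; without it, the caller has to name the source encoding. The factory names from_utf8 and from_latin1 are hypothetical, and validation is left out to keep the sketch short.

```cpp
#include <cstddef>
#include <string>

// Opt-in switch named in the proposal; normally set by the build system.
#define BOOST_ALL_STD_STRINGS_ARE_UTF8

// Hypothetical stand-in for the proposed utf8_t class.
class utf8_t {
public:
#ifdef BOOST_ALL_STD_STRINGS_ARE_UTF8
    // Implicit on purpose: every std::string is assumed to hold UTF-8.
    utf8_t(const std::string& utf8) : bytes_(utf8) {}
#endif

    // Always available: the caller states the encoding explicitly.
    static utf8_t from_utf8(const std::string& utf8) {
        utf8_t r;
        r.bytes_ = utf8;
        return r;
    }

    // ISO-8859-1 maps 1:1 onto U+0000..U+00FF, so each byte becomes
    // either one ASCII byte or a two-byte UTF-8 sequence.
    static utf8_t from_latin1(const std::string& latin1) {
        utf8_t r;
        for (std::size_t i = 0; i < latin1.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(latin1[i]);
            if (c < 0x80) {
                r.bytes_ += static_cast<char>(c);
            } else {
                r.bytes_ += static_cast<char>(0xC0 | (c >> 6));
                r.bytes_ += static_cast<char>(0x80 | (c & 0x3F));
            }
        }
        return r;
    }

    const std::string& bytes() const { return bytes_; }

private:
    utf8_t() {}
    std::string bytes_;
};

// A library function that is not encoding-agnostic takes utf8_t.
std::size_t byte_length(const utf8_t& s) { return s.bytes().size(); }
```

With the macro defined, existing calls such as byte_length(some_std_string) keep compiling unchanged; without it, the same call fails to compile and the caller has to choose from_utf8 or from_latin1.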

>
> For everyone else, we introduce the utf*_t API alongside the
> std::string one, for those classes and functions that are not
> encoding-agnostic. The std::string one can be deprecated in future
> versions if the library author desires. Again, no punishment involved.
>
>
> I don't expect that the utf*_t classes will make it into the standard.
> They definitely won't make it into the now-misnamed C++0x standard, and
> it'll likely be another ten years before another one is hashed out --
> by then, the UTF-8 conversion should be complete, so there will be no
> need for it, except possibly to confirm that a string isn't malformed.
>
>>
>> Besides the ugly name and that is a new class ? No :)
>
> If you can think of a more-acceptable-but-still-descriptive name for
> it, I'm all ears. :-)

I have an idea: what about boost::string, which could possibly become
the next std::string in the future?

>> And the solution is long overdue. And creating utf8_t is just putting
>> the problem away, not solving it really.
>
> I see it as merely easing the transition.

OK, if the long term plan is:

1) design and implement boost::string on top of UTF-8, covering
code-point iteration, character iteration, and convenience functions
such as starts-with, ends-with, replace, trim, etc., with as much backward
compatibility with std::string as possible without hindering progress

2) try really hard to push it to the standard

then I'm on board with that.
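
A very rough sketch of what point 1 might look like, just to illustrate the idea. Everything here is hypothetical: the class lives in a made-up namespace sketch, it assumes its input is already valid UTF-8, and it only covers code-point decoding plus starts_with/ends_with. Character (grapheme) iteration, replace and trim are not shown.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical UTF-8 string wrapper; assumes the bytes are valid UTF-8.
class string {
public:
    string(const std::string& utf8) : bytes_(utf8) {}
    string(const char* utf8) : bytes_(utf8) {}

    // Decode the whole string into Unicode code points.
    std::vector<char32_t> code_points() const {
        std::vector<char32_t> out;
        std::size_t i = 0;
        while (i < bytes_.size()) {
            const unsigned char b0 = static_cast<unsigned char>(bytes_[i]);
            std::size_t len = 1;
            char32_t cp = b0;
            if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; }
            else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
            else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
            for (std::size_t k = 1; k < len && i + k < bytes_.size(); ++k) {
                const unsigned char b =
                    static_cast<unsigned char>(bytes_[i + k]);
                cp = (cp << 6) | (b & 0x3F);
            }
            out.push_back(cp);
            i += len;
        }
        return out;
    }

    // For valid UTF-8 a byte-wise prefix/suffix match always falls on
    // code-point boundaries, so plain byte comparison is enough.
    bool starts_with(const string& p) const {
        return bytes_.size() >= p.bytes_.size() &&
               bytes_.compare(0, p.bytes_.size(), p.bytes_) == 0;
    }

    bool ends_with(const string& p) const {
        return bytes_.size() >= p.bytes_.size() &&
               bytes_.compare(bytes_.size() - p.bytes_.size(),
                              p.bytes_.size(), p.bytes_) == 0;
    }

    // Backward-compatible access to the underlying std::string.
    const std::string& str() const { return bytes_; }

private:
    std::string bytes_;
};

} // namespace sketch
```

Keeping a std::string inside and exposing it through str() is one way to get the backward compatibility mentioned in point 1: old interfaces still receive a std::string, while new code works in terms of code points.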

BR,

Matus

