On Mon, Oct 28, 2019 at 3:35 AM David Demelier via Boost-users <boost-users@lists.boost.org> wrote:

> On 26/10/2019 at 03:11, Zach Laine via Boost-users wrote:
> > About 14 months ago I posted the same thing. There was significant work
> > that needed to be done to Boost.Text (the proposed library), and I was a
> > bit burned out.
> >
> > Now I've managed to make the necessary changes, and I feel the library
> > is ready for review, if there is interest.
> >
> > This library, in part, is something I want to standardize.
> >
> > It started as a better string library for namespace "std2", with minimal
> > Unicode support. Though "std2" will almost certainly never happen now,
> > those string types are still in there, and the library has grown to also
> > include all the Unicode features most users will ever need.
> >
> > Github: https://github.com/tzlaine/text
> > Online docs: https://tzlaine.github.io/text

> I've read the intro on why std::string is so bad, and I have to disagree
> with many points.
>
> 1. The Fat Interface
>
> In what way is std::string bloated? Of course some functions are probably
> there as synonyms, but calling it bloated is kind of wrong. Just look at
> the numerous functions of Java's String instead [0].

Comparing std::string to Java's string class is not doing std::string any favors.


> 2. The Missing Unicode Support
>
> Yes, many newcomers may be surprised to see that the string "é" has a size
> of 2 bytes (assuming UTF-8). But the same is true of UTF-16 strings,
> which may contain surrogate pairs...

> UTF-8 is the way to go and is efficiently stored. One could argue that we
> should have some UTF-8 iterators or things like that. But std::string is
> still a good candidate for string manipulation.

I agree that UTF-8 is the way to go (and as I think you've seen, the library reflects that). However, UTF-8 encoding is only part of the story. There is also normalization. If you store UTF-8 in std::strings, normalization will not be enforced. (Neither will UTF-8 encoding, but that's less of a problem if you always intend to produce replacement characters for broken UTF-8.) Most users will want a type that enforces normalization as a class invariant. Those who do not still have the tools (the algorithms and iterators in the Unicode layer) to normalize a std::string themselves if they want.

> 3. Miscellaneous Limitations
>
> Not being thread-safe is an issue? Thank God it isn't thread-safe. Imagine
> the overhead of a thread-safe version of a string. The purpose of a library
> is not to make every object thread-safe. That has to be on the user side.

I don't think all string types should be thread-safe, but having a thread-safe option is nice. That was not an explicit goal of adding ropes, but it is a nice side effect of the choice I made for how to implement the ropes in Boost.Text.

> That said, I really hope for better Unicode support in std:: in the
> near future. Your library is well designed and its API is clean; I hope it
> can be added to Boost :-).

Thanks, me too. :)

Zach