Boost Users :

From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2019-10-28 17:00:13


On Mon, Oct 28, 2019 at 3:35 AM David Demelier via Boost-users <
boost-users_at_[hidden]> wrote:

> On 26/10/2019 at 03:11, Zach Laine via Boost-users wrote:
> > About 14 months ago I posted the same thing. There was significant work
> > that needed to be done to Boost.Text (the proposed library), and I was a
> > bit burned out.
> >
> > Now I've managed to make the necessary changes, and I feel the library
> > is ready for review, if there is interest.
> >
> > This library, in part, is something I want to standardize.
> >
> > It started as a better string library for namespace "std2", with minimal
> > Unicode support. Though "std2" will almost certainly never happen now,
> > those string types are still in there, and the library has grown to also
> > include all the Unicode features most users will ever need.
> >
> > Github: https://github.com/tzlaine/text
> > Online docs: https://tzlaine.github.io/text
>
> I've read the intro on why std::string is so bad, and I have to disagree
> with many points.
>
> 1. The Fat Interface
>
> In what way is std::string bloated? Of course some functions are probably
> there as synonyms, but calling it bloated is kind of false. Just look at
> the numerous functions of Java's String instead [0].
>

Comparing std::string to Java's string class is not doing std::string any
favors.

> And I
>
> 2. The Missing Unicode Support
>
> Yes, many newcomers may be surprised to see that a string "é" has a size
> of 2 bytes (assuming UTF-8). But that is also the case for UTF-16 strings,
> which may have surrogate pairs...
>
> UTF-8 is the way to go and is stored efficiently. One could argue that we
> should have some UTF-8 iterators or things like that. But std::string is
> still a good candidate for string manipulation.
>

I agree that UTF-8 is the way to go (and as I think you've seen, the
library reflects that). However, UTF-8 encoding is only part of the
story. There is also normalization. If you use UTF-8-in-std::strings,
normalization will not be enforced. (Neither will UTF-8 encoding, but
that's less of a problem if you always intend to produce replacement
characters for broken UTF-8.) Most users will want a type that enforces
normalization as a class invariant. Those who do not want that still have
the tools -- the algorithms and iterators in the Unicode layer -- to
normalize text in a std::string themselves if they want.

> 3. Miscellaneous Limitations
>
> Not being thread-safe is an issue? Thank god it is not. Imagine the
> overhead of a thread-safe version of a string. The purpose of a library
> is not to be thread-safe for every object. That has to be on the user's side.
>

I don't think all string types should be threadsafe, but having a
threadsafe option is nice. That was not an explicit goal of adding ropes,
but it is a nice side-effect of the choice I made for how to implement the
ropes in Boost.Text.

> That said, I really hope for better Unicode support in std:: in the
> near future. Your library is well designed and the API is clean; I hope it
> can be added to Boost :-).
>

Thanks, me too. :)

Zach


