From: Zach Laine (whatwasthataddress_at_[hidden])
Date: 2020-06-12 18:12:54


On Fri, Jun 12, 2020 at 8:40 AM Niall Douglas via Boost
<boost_at_[hidden]> wrote:
>
> On 11/06/2020 19:37, Glen Fernandes via Boost wrote:
>
> > The library provides three layers:
> > - The string layer, a set of types that constitute "a better std::string"
> > - The Unicode layer, consisting of the Unicode algorithms and data
> > - The text layer, a set of types like the string layer types, but
> > providing transparent Unicode support
>
> Firstly, I'd like to say that proposing a new string implementation is
> probably one of the most masochistic things that you can do in C++. Even
> more than proposing a result<T, E> type. So, I take a bow to you Mr.
> Laine, and I salute your bravery.
>
> I'll put aside the Unicode and Text layers for now, and just consider
> the String layer. I have to admit, I'm not keen on the string layer.
> Firstly I hate the naming. Everything there ought to get more
> descriptive naming. But more importantly, how the design of the string
> implementation has been broken up and designed, there's just something
> about it which doesn't sit right with me. You seem to me to have
> prioritised certain use cases over others which would not be my own
> choices i.e. I don't think the balance of tradeoffs is right in there.
> For example, I wouldn't have chosen an atomically reference counted rope
> design the way you have at all: I'd have gone for a fusion of your
> static string builder with constexpr-possible (i.e. non-atomic)
> reference counted fragments, using expression templates to lazily
> canonicalise the string depending on sink (e.g. if the sink is cout, use
> gather i/o sequence instead of creating a new string). That sort of thing.
>
> Zach, could you take this opportunity to compare your choice of string
> design with the string designs implemented by each of the following
> libraries please?
>
> - LLVM strings, string refs, twines etc.
>
> - Abseil's strings, string pieces.
>
> - Folly's strings, string pieces and ranges.
>
> - CopperSpice's CsString.
>
> I feel like I am forgetting at least another two. But, point is, I'd
> like to know why you chose a different design to each of the above, in
> those situations where you did so.

No, because I don't honestly care about the string layer that much any
more. It was originally a major reason -- the reason, really -- for
the library at the outset. Now it's mostly cruft. If people object
to it enough (and it seems they will), I can certainly remove it
entirely, except for unencoded_rope, which is needed in rope.
Replacing boost::text::string with std::string within
boost::text::text is straightforward, and will have no visible effect
on uses of text::text, except for extract() and replace(). The only
reason I left the string bits of the library in place when I changed
the focus to be Unicode-centric is that it was less work to do so.

> I'll nail my own colours to the mast on this topic: I've thought about
> this long and hard over many many years, and I've personally arrived on
> the opinion that C needs to gain an integral string object built into
> the language, which builds on top of an integral variably sized array
> object (NOT current C VLAs). Said same built-in string object would also
> be available to C++, by definition.
>
> I have arrived at this opinion because I don't think that ANY library
> solution can have the right balance of tradeoffs between all the
> competing factors. I think that only a built-in object to the language
> itself can deliver the perfect string object, because only the compiler
> can deliver a balance of optimisability with developer convenience.
>
> I won't go into any more detail, as this is a review of the Text C++
> library. And I know I've already discussed my opinion on SG16 where you
> Zach were present, so you've heard all my thoughts on this already.
> However, if you were feeling keen, I'd like to know if you could think
> of any areas where language changes would aid implementing better
> strings in C++?

I think the big thing for me would be to have language-level support
for discriminating between char * strings and string literals. String
literals are special in certain ways that are useful to take advantage
of: 1) they do not need to be copied, since they're in ROM; 2) they
are encoded by the compiler into the execution encoding used in phase
5 of translation. This second one is pretty important to detect in
some cases, like making a printf-like upgrade to std::format() "just
work".
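
To illustrate the gap, here is a minimal sketch of the closest
library-level approximation I am aware of: overloading on "array of
char with known bound" versus "pointer to char". The names source_kind,
classify, and check_examples are hypothetical and exist only for this
example; they are not part of Boost.Text or the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical names for illustration only.
enum class source_kind { char_array, char_pointer };

// Selected for any array of char with a known bound. String literals
// land here, but so does every other char array.
template <std::size_t N>
constexpr source_kind classify(const char (&)[N]) noexcept
{
    return source_kind::char_array;
}

// Selected only when the argument's type is exactly a pointer to char.
// Taking const T& prevents array-to-pointer decay during deduction, so
// arrays are rejected by the enable_if and never reach this overload.
template <typename T,
          typename = std::enable_if_t<std::is_same<T, const char*>::value ||
                                      std::is_same<T, char*>::value>>
constexpr source_kind classify(const T&) noexcept
{
    return source_kind::char_pointer;
}

// Exercises both overloads; returns true if every check passes.
inline bool check_examples()
{
    const char* p = "abc";
    char buf[4] = {'a', 'b', 'c', '\0'};

    assert(classify("abc") == source_kind::char_array);  // a real literal
    assert(classify(p) == source_kind::char_pointer);    // a plain pointer
    // The limitation: a mutable stack buffer is reported the same way
    // as a literal, because the type system sees only "array of char".
    assert(classify(buf) == source_kind::char_array);
    return true;
}
```

The last check is the point: a library can see that the argument is an
array, but it cannot tell whether that array is a literal with static
storage and the compiler's chosen encoding, or a buffer filled in at
run time. Distinguishing those two would need help from the language.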

Zach

