
From: Rainer Deyke (rdeyke_at_[hidden])
Date: 2020-06-14 06:40:12


On 14.06.20 01:25, Zach Laine via Boost wrote:
> On Fri, Jun 12, 2020 at 4:15 PM Rainer Deyke via Boost
> <boost_at_[hidden]> wrote:
>> A memmapped string_view /is/ a contiguous sequence of char. I don't see
>> the difference.
>
> The difference is mutability. There's no perf concern with erasing
> the first element of a string_view, if that's not even a supported
> operation.

A /lot/ of strings, probably the vast majority, will never be mutated.
And for the rest, the majority will only be mutated by appending.

Erasing the first element is a nice-to-have, but it is expensive and
rarely used. If you find yourself doing that a lot, then you probably
do want a rope.

>> Somewhere in the implementation of operator[] and operator(), there has
>> to be a branch on index < 0 (or >= 0) in order for that negative index
>> trick to work, which the compiler can't always optimize away. Branches
>> are often affordable but they're not free.
>
> Ah, I see, thanks. Would it make you feel better if negative indexing
> were only used when getting substrings?

That does address the performance problem, so yes.
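
To illustrate what "negative indexing only when getting substrings" could look like, here is a minimal sketch. The name substr_signed and its clamping behavior are my assumptions for illustration, not the proposed library's interface. The sign branch is paid once per substring call instead of once per element access.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of any proposed API.
// Negative bounds count from the end of the view; out-of-range bounds are clamped.
inline std::string_view substr_signed(std::string_view s,
                                      std::ptrdiff_t lo,
                                      std::ptrdiff_t hi) {
    const auto n = static_cast<std::ptrdiff_t>(s.size());
    if (lo < 0) lo += n;  // the only sign branches live here,
    if (hi < 0) hi += n;  // not in operator[]
    lo = std::clamp(lo, std::ptrdiff_t{0}, n);
    hi = std::clamp(hi, lo, n);
    return s.substr(static_cast<std::size_t>(lo),
                    static_cast<std::size_t>(hi - lo));
}
```

With this shape, operator[] can stay a plain unsigned index with no branch.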

>> I hadn't thought through the interface in detail. I just saw that this
>> was a feature of the text layer, and thought it would be nice to have in
>> the unicode layer, because I don't want to use the text layer (in its
>> current form).
>
> I don't need a detailed interface. Pseudocode would be fine too.

insert_nfd(string, position, thing_to_insert)
// Insert 'thing_to_insert' into 'string' at 'position'. Both 'string'
// and 'thing_to_insert' are required to be in NFD. The area around the
// insertion is renormalized to NFD.
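
A rough sketch of how that could work, under some loud assumptions: the string is held as code points in a std::u32string, 'position' is a code point index, and toy_ccc is a hypothetical stand-in for a real canonical combining class lookup that knows only three marks. Because both inputs are already fully decomposed, I believe the only thing left to repair is the canonical ordering of combining marks at the two seams.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Toy combining class table (assumption: a real one covers all of Unicode).
inline int toy_ccc(char32_t cp) {
    switch (cp) {
        case 0x0327: return 202;  // COMBINING CEDILLA
        case 0x0323: return 220;  // COMBINING DOT BELOW
        case 0x0301: return 230;  // COMBINING ACUTE ACCENT
        default:     return 0;    // treated as a starter
    }
}

// Stable-sort the run of non-starters that touches index 'seam'.
inline void reorder_around(std::u32string& s, std::size_t seam) {
    std::size_t first = seam;
    while (first > 0 && toy_ccc(s[first - 1]) != 0) --first;
    std::size_t last = seam;
    while (last < s.size() && toy_ccc(s[last]) != 0) ++last;
    std::stable_sort(s.begin() + first, s.begin() + last,
                     [](char32_t a, char32_t b) {
                         return toy_ccc(a) < toy_ccc(b);
                     });
}

// Both 's' and 'ins' must already be in NFD; 'pos' must be <= s.size().
inline void insert_nfd(std::u32string& s, std::size_t pos,
                       const std::u32string& ins) {
    s.insert(pos, ins);
    reorder_around(s, pos);               // left seam
    reorder_around(s, pos + ins.size());  // right seam
}
```

Everything outside the two runs of combining marks is left untouched, which is the point: the cost is proportional to the insertion and its immediate surroundings, not to the whole string.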

>> Having to renormalize at API boundaries can be prohibitively expensive.
>
> Sure. Anything can be prohibitively expensive in some context. If
> that's the case in a particular program, I think it is likely to be
> unacceptable to use text::operator+(string_view) as well, since that
> also does on-the-fly normalization.

Hopefully only on the string_view and the area immediately surrounding
the insertion.

> Someone, somewhere, has to pay
> that cost if you want to use two chunks of text in
> encoding/normalization A and B. You might be able to keep working in
> A for some text and keep working in B separately for other text, but I
> think code that works like that is going to be hard to reason about,
> and will be as common as code that freely mixes wstring and string
> (and I mean not only at program boundaries). That is, not very
> common.

Which is why I want to avoid just that.

Your suggestion:

   void f() {
     // renormalizes to fcc
     text::text t = api_function_that_returns_nfd();
     do_something_with(t);
     std::string s;
     text::normalize_to_nfd(t.extract(), std::back_inserter(s));
     api_function_that_accepts_nfd(s);
   }

My suggestion:

   void f() {
     text::text<nfd, std::string> t = api_function_that_returns_nfd();
     do_something_with(t);
     api_function_that_accepts_nfd(t.extract());
   }
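
To make the idea concrete, here is a minimal sketch of a text type parameterized on normalization form and storage. All names (tagged_text, nfd_tag, fcc_tag, from_normalized, and the two api_* functions) are hypothetical and exist only for illustration. The sketch does no normalization itself; it only shows how the form can travel in the type so that nothing is renormalized at the API boundary.

```cpp
#include <cstddef>
#include <string>
#include <utility>

struct nfd_tag {};  // hypothetical form tags
struct fcc_tag {};

template <class Form, class Storage = std::string>
class tagged_text {
public:
    // The caller promises 's' is already in Form; nothing is checked here.
    static tagged_text from_normalized(Storage s) {
        return tagged_text(std::move(s));
    }
    const Storage& storage() const { return data_; }
    // Hands the underlying storage back without copying or renormalizing.
    Storage extract() { return std::move(data_); }

private:
    explicit tagged_text(Storage s) : data_(std::move(s)) {}
    Storage data_;
};

// Hypothetical API boundary that produces and consumes NFD in std::string.
inline tagged_text<nfd_tag> api_returns_nfd() {
    return tagged_text<nfd_tag>::from_normalized("e\xCC\x81");  // e + U+0301
}
inline std::size_t api_accepts_nfd(const std::string& s) { return s.size(); }
```

Since tagged_text<nfd_tag> and tagged_text<fcc_tag> are distinct types, mixing forms by accident becomes a compile error rather than a silent renormalization.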

> That's what I don't get. Could you explain how text<A> and text<B>
> are useful in a specific case?

text<deque<char> >, for fast insertion/removal at both ends?

But it's really text::text<std::string> that I'm after, so if
text::text<std::string> becomes just text::text, then I'm satisfied.

-- 
Rainer Deyke (rainerd_at_[hidden])
