From: Rainer Deyke (rdeyke_at_[hidden])
Date: 2020-06-14 06:40:12
On 14.06.20 01:25, Zach Laine via Boost wrote:
> On Fri, Jun 12, 2020 at 4:15 PM Rainer Deyke via Boost
> <boost_at_[hidden]> wrote:
>> A memmapped string_view /is/ a contiguous sequence of char. I don't see
>> the difference.
>
> The difference is mutability. There's no perf concern with erasing
> the first element of a string_view, if that's not even a supported
> operation.
A /lot/ of strings, probably the vast majority, will never be mutated.
And for the rest, the majority will only be mutated by appending.
Erasing the first element is a nice-to-have but expensive and rarely
used feature. If you find yourself doing that a lot, then you probably
do want a rope.
>> Somewhere in the implementation of operator[] and operator(), there has
>> to be a branch on index < 0 (or >= 0) in order for that negative index
>> trick to work, which the compiler can't always optimize away. Branches
>> are often affordable but they're not free.
>
> Ah, I see, thanks. Would it make you feel better if negative indexing
> were only used when getting substrings?
That does address the performance problem, so yes.
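To make the branch concrete, here is a minimal sketch. The names at_signed and slice are hypothetical and not part of any proposed interface; both assume the caller passes in-range indices. The point is only that the sign check runs on every element access in the first function, but once per call in the second.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical element accessor: negative indices count from the end.
// Precondition (unchecked): -size <= i < size.
inline char at_signed(std::string_view s, std::ptrdiff_t i) {
    // This comparison is the branch under discussion; it is paid per access
    // unless the compiler can prove the sign of i.
    if (i < 0) i += static_cast<std::ptrdiff_t>(s.size());
    return s[static_cast<std::size_t>(i)];
}

// Hypothetical substring helper: negative bounds are resolved once per call,
// so the cost is amortized over the whole substring.
// Precondition (unchecked): after adjustment, 0 <= lo <= hi <= size.
inline std::string_view slice(std::string_view s,
                              std::ptrdiff_t lo, std::ptrdiff_t hi) {
    const auto n = static_cast<std::ptrdiff_t>(s.size());
    if (lo < 0) lo += n;
    if (hi < 0) hi += n;
    return s.substr(static_cast<std::size_t>(lo),
                    static_cast<std::size_t>(hi - lo));
}
```

With this split, plain element access can take an unsigned index and skip the check entirely.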
>> I hadn't thought through the interface in detail. I just saw that this
>> was a feature of the text layer, and thought it would be nice to have in
>> the unicode layer, because I don't want to use the text layer (in its
>> current form).
>
> I don't need a detailed interface. Pseudocode would be fine too.
insert_nfd(string, position, thing_to_insert)
// Insert 'thing_to_insert' into 'string' at 'position'. Both 'string'
// and 'thing_to_insert' are required to be in NFD. The area around the
// insertion is renormalized to NFD.
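A rough sketch of what I have in mind, under loud assumptions: the string is a std::u32string of code points, and toy_ccc is a stand-in table that knows only three combining marks (a real implementation would need the full Unicode canonical combining class data). Because both inputs are already fully decomposed, the only thing that can break at the two seams is the canonical ordering of combining marks, so only the run of marks touching each seam is re-sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Toy canonical combining class lookup (assumption: only these three marks
// exist; every other code point is treated as a starter with class 0).
inline int toy_ccc(char32_t c) {
    switch (c) {
        case 0x0327: return 202;  // COMBINING CEDILLA
        case 0x0323: return 220;  // COMBINING DOT BELOW
        case 0x0301: return 230;  // COMBINING ACUTE ACCENT
        default:     return 0;
    }
}

// Stable-sort the maximal run of non-starters that touches index `seam`.
inline void reorder_marks_around(std::u32string& s, std::size_t seam) {
    std::size_t first = seam, last = seam;
    while (first > 0 && toy_ccc(s[first - 1]) != 0) --first;
    while (last < s.size() && toy_ccc(s[last]) != 0) ++last;
    std::stable_sort(s.begin() + static_cast<std::ptrdiff_t>(first),
                     s.begin() + static_cast<std::ptrdiff_t>(last),
                     [](char32_t a, char32_t b) {
                         return toy_ccc(a) < toy_ccc(b);
                     });
}

// Precondition (unchecked): `s` and `ins` are both in NFD, pos <= s.size().
inline void insert_nfd(std::u32string& s, std::size_t pos,
                       const std::u32string& ins) {
    s.insert(pos, ins);
    reorder_marks_around(s, pos + ins.size());  // seam after the insertion
    reorder_marks_around(s, pos);               // seam before the insertion
}
```

The rest of the string is never touched, which is the whole point of the operation.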
>> Having to renormalize at API boundaries can be prohibitively expensive.
>
> Sure. Anything can be prohibitively expensive in some context. If
> that's the case in a particular program, I think it is likely to be
> unacceptable to use text::operator+(string_view) as well, since that
> also does on-the-fly normalization.
Hopefully only on the string_view and the area immediately surrounding
the insertion.
> Someone, somewhere, has to pay
> that cost if you want to use two chunks of text in
> encoding/normalization A and B. You might be able to keep working in
> A for some text and keep working in B separately for other text, but I
> think code that works like that is going to be hard to reason about,
> and will be as common as code that freely mixes wstring and string
> (and I mean not only at program boundaries). That is, not very
> common.
Which is why I want to avoid just that.
Your suggestion:
void f() {
  // renormalizes to FCC
  text::text t = api_function_that_returns_nfd();
  do_something_with(t);
  std::string s;
  text::normalize_to_nfd(t.extract(), std::back_inserter(s));
  api_function_that_accepts_nfd(s);
}
My suggestion:
void f() {
  text::text<nfd, std::string> t = api_function_that_returns_nfd();
  do_something_with(t);
  api_function_that_accepts_nfd(t.extract());
}
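As a minimal sketch of how that could work (everything here is hypothetical: tagged_text, the nfd tag, and the two stand-in API functions are illustrations, not the library's interface), the normalization form becomes part of the type, so data that is already in NFD passes through without being touched:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

struct nfd {};  // hypothetical tag type naming the normalization form

// Hypothetical wrapper: the normalization form and the underlying storage
// are both template parameters.
template <class Normalization, class Storage = std::string>
class tagged_text {
public:
    // Precondition (unchecked in this sketch): `s` is already normalized
    // to the form named by Normalization.
    explicit tagged_text(Storage s) : storage_(std::move(s)) {}

    const Storage& storage() const { return storage_; }

    // Hands the storage back to the caller without renormalizing.
    Storage extract() { return std::move(storage_); }

private:
    Storage storage_;
};

// Stand-ins for an external API that speaks NFD (assumed, UTF-8 encoded).
inline std::string api_function_that_returns_nfd() {
    return "e\xCC\x81";  // 'e' followed by U+0301 COMBINING ACUTE ACCENT
}
inline std::size_t api_function_that_accepts_nfd(const std::string& s) {
    return s.size();
}

inline std::size_t f() {
    tagged_text<nfd, std::string> t{api_function_that_returns_nfd()};
    return api_function_that_accepts_nfd(t.extract());
}
```

No normalization pass runs at either boundary; the type carries the invariant instead.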
> That's what I don't get. Could you explain how text<A> and text<B>
> are useful in a specific case?
text<deque<char> >, for fast insertion/removal at both ends?
But it's really text::text<std::string> that I'm after, so if
text::text<std::string> becomes just text::text, then I'm satisfied.
-- Rainer Deyke (rainerd_at_[hidden])