Subject: Re: [boost] Heads up - string_ref landing
From: Yanchenko Maxim (maximyanchenko_at_[hidden])
Date: 2012-11-16 11:54:48


16.11.2012, 14:45, "Olaf van der Spek" <ml_at_[hidden]>:
> On Fri, Nov 16, 2012 at 11:31 AM, Yanchenko Maxim
> <maximyanchenko_at_[hidden]> wrote:
>
>> Not only subscripted access. Taking a subrange also requires knowing size. Copying from/to (read memcpy) - same.
>> Filling (read memset) - same.
>> Comparing (read memcmp) - same.
>
> Those are C-style constructs. The C++-style equivalents are iterator-based.

Those are high-performance constructs. We can only pray that a compiler will be smart enough to convert our iterator-based code to memcpy/memcmp/memset, and in my experience compilers are not nearly that smart once the code goes slightly beyond the trivial cases.

(char_range is an optimization technique, so we aim for maximum speed. If you didn't need maximum speed, you'd be happy with simple and safe std::string copies.)
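
To make the size-based operations concrete, here is a minimal sketch. The type char_range_sketch and the helpers ranges_equal and copy_to are hypothetical names made up for this illustration, not the actual char_range interface. The point is only that once the size is stored, comparing and copying each reduce to a single library call:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical minimal range: a pointer plus a size (assumption for illustration).
struct char_range_sketch {
    const char* data;
    std::size_t size;
};

// Equality: compare the sizes first, then do one memcmp over the bytes.
// The size == 0 guard avoids passing a possibly null pointer to memcmp.
inline bool ranges_equal(char_range_sketch a, char_range_sketch b) {
    return a.size == b.size &&
           (a.size == 0 || std::memcmp(a.data, b.data, a.size) == 0);
}

// Copy into a caller-provided buffer that holds at least r.size bytes.
inline void copy_to(char_range_sketch r, char* dst) {
    assert(dst != nullptr || r.size == 0);
    if (r.size != 0) {
        std::memcpy(dst, r.data, r.size);
    }
}
```

An iterator-based std::equal or std::copy may or may not compile down to the same calls; here nothing is left to the optimizer.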

> Suppose you have two pointers, 0xa0 (begin) and 0xb0 (end). The size
> in bytes is 0x10.
> Suppose you have one pointer (0xa0) and one size (0x10). Does this
> point to the same memory?
By "this" you mean 0xa0+0x10? By construction, yes, it does.
We trust the caller to give us the correct size (or a correct pair of begin/end pointers, from which we compute the size in our ctor).
std::string makes the same assumptions.

> Yes if sizeof(value_type) == 1, no
> otherwise. You can't tell to what memory range it points without
> knowing sizeof(value_type)

Ah. The first pointer (0xa0) is typed, so we certainly know value_type. That's why your begin/end pair (0xa0, 0xb0) works. They are not void*, they are value_type*.
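
A small sketch of why typed pointers are enough. The helper names element_count and byte_count are hypothetical and exist only for this illustration. Pointer subtraction is already expressed in elements, and the byte size follows from sizeof(value_type):

```cpp
#include <cassert>
#include <cstddef>

// Number of elements between two typed pointers into the same array.
// Pointer difference is counted in elements, not bytes.
template <class T>
std::size_t element_count(const T* begin, const T* end) {
    assert(begin <= end);
    return static_cast<std::size_t>(end - begin);
}

// Number of bytes covered by the same range.
template <class T>
std::size_t byte_count(const T* begin, const T* end) {
    return element_count(begin, end) * sizeof(T);
}
```

For char the two numbers coincide; for wchar_t or any wider value_type they differ by the factor sizeof(T), which the pointer type already carries.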

>>>>> Shouldn't they be implicit?
>>>> Not from std::string. Same argument as for not having implicit conversion to char*.
>>> What argument would that be?
>> You are giving away a reference to string internals that are subject to change/die anytime.
>
> Isn't that by definition for a reference? It applies to const string&
> too. I don't think that's a good reason.

It's not a reference to std::string, it's a reference to the *internals* of std::string. Those internals are managed exclusively by std::string.
I.e. if you have a reference to std::string and you expand the string, the reference will continue to work with no problem, while a reference to the internals will be invalidated (the simplest example of references to internals being invalidated is iterators).
But when you give away iterators, you do it explicitly via begin/end. In the same way, if you give away a reference to std::string internals, you do it explicitly via data/c_str. This makes potentially dangerous code visible. The same should be done with char_range construction from std::string::data: it should be explicit.

Btw, const references are not that harmless, consider this innocent-looking code:

    struct S {
      const std::string& ref_;
      S(const std::string& ref): ref_(ref) {}
    };
    
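    S s1("foo");               // ref_ binds to a temporary std::string and dangles after this statement
    S s2(std::string("bar"));  // same: the temporary dies at the end of the full expression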

>> Making it explicit and visible in the caller code ensures that the programmer will take special measures to make sure that the string doesn't change/die while there's a char_range looking into it.
>>
>> Consider std::vector<char_range>, for example.
Back to this example:

    // std::vector<std::string> v; - too slow, upgrading to our new char_range!
    std::vector<char_range> v;
    v.push_back( "foo" );
    v.push_back( std::string("bar") ); // BOOM

When pushing stuff to this vector, we want to be 100% sure that the strings that gave away their char_ranges will outlive the vector and stay unchanged. For this we need all the help the compiler can give us, namely forcing us to declare the give-away explicitly and failing to compile otherwise.

char_range is an efficient but dangerous technique. I'm not a particular fan of Python, but when it comes to ownership management in C++, I prefer its maxim "explicit is better than implicit".
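
A minimal sketch of what the explicit construction buys. The class explicit_char_range is a hypothetical stand-in written for this illustration, not the actual char_range; it assumes a C++11 compiler for static_assert and <type_traits>:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical non-owning range (assumption for illustration).
class explicit_char_range {
public:
    explicit_char_range(const char* data, std::size_t size)
        : data_(data), size_(size) {
        assert(data != nullptr || size == 0);
    }

    // explicit: the caller has to spell out that a view into s is given away.
    explicit explicit_char_range(const std::string& s)
        : data_(s.data()), size_(s.size()) {}

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const char* data_;
    std::size_t size_;
};

// There is no implicit conversion, so v.push_back(std::string("bar"))
// on a std::vector<explicit_char_range> does not compile.
static_assert(!std::is_convertible<std::string, explicit_char_range>::value,
              "std::string must not convert implicitly");

// Direct, visible construction is still allowed.
static_assert(std::is_constructible<explicit_char_range, const std::string&>::value,
              "explicit construction from std::string is available");
```

With this, the caller has to write something like v.push_back(explicit_char_range(owner)) with a named owner string, which puts the lifetime question in plain sight at the call site. It does not prevent misuse: explicit_char_range(std::string("bar")) still compiles and still dangles, but now the give-away is visible in the code.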

>> For the same reason we have explicit char_range::literal and char_range::from_array.
> I'd like this to work:
> void f(str_ref);
> f("Olaf");
f( char_range::literal("Olaf") );
Explicit, and with the size known at compile time (so the compiler can make use of this knowledge).
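
One way such a factory could obtain the size at compile time is by taking the literal as a reference to an array. This is only a sketch under that assumption; the class literal_range and its members are hypothetical names for this illustration and do not claim to match the real char_range::literal:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical range type with a named factory for string literals.
class literal_range {
public:
    // N is deduced from the array type and includes the terminating '\0',
    // so the visible length is N - 1 and no strlen call is needed.
    template <std::size_t N>
    static literal_range literal(const char (&s)[N]) {
        assert(s[N - 1] == '\0');  // sanity check: looks like a literal
        return literal_range(s, N - 1);
    }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    literal_range(const char* data, std::size_t size)
        : data_(data), size_(size) {}

    const char* data_;
    std::size_t size_;
};
```

Note that the template cannot tell a literal from any other char array, which is one reason to keep separate, explicitly named entry points for literals and for arrays.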

Thanks,
Maxim

