Subject: Re: [boost] Heads up - string_ref landing
From: Rob Stewart (robertstewart_at_[hidden])
Date: 2012-11-28 05:48:22


On Nov 27, 2012, at 7:01 AM, Andrey Semashev <andrey.semashev_at_[hidden]> wrote:

> On Tue, Nov 27, 2012 at 2:36 PM, Rob Stewart <robertstewart_at_[hidden]> wrote:
>> On Nov 26, 2012, at 6:56 AM, Andrey Semashev <andrey.semashev_at_[hidden]> wrote:
>>
>>> The problem with std::string is the same as with string_ref - it doesn't support implicit construction from an arbitrary range, so my examples with custom string types would still not work.
>>
>> That's right. We have no universal string/range type for that purpose, so you use the standard string type.
>
> My point was that, in my understanding, string_ref aims to solve this issue in a transparent way, but the proposal lacks the necessary interface.

I didn't realize you were arguing WRT the proposed class versus the concept, which is what I've been doing.

> I would have used string_ref to unify string-related interfaces if it transparently supported multiple string types, not limited to those defined in the STL (and Boost, if boost::string_ref is to be implemented). Limiting it to particular types defeats its purpose.

OK, I suspect we're agreeing more than disagreeing. Here's the interface of my string_ref:

- converting ctors from:
  o char const *
  o std::string const &
  o std::vector<char> const &
  o const_substring const & (my substring type)
- other ctors:
  o char const *, size_t
  o char const *, char const *
  o char const (&)[N]
- similar assignment operators
- similar assign() member functions
- bool is_null()
- safe bool or explicit bool conversion operator
- char const * data()
- size_t length()
- char const * begin()/end()
- string_ref substr()
- char operator[](size_t)

(I think that's a complete list. I'm doing it from memory now.)

It is very string-like and convenient. The same behaviors would be messier with a range type and algorithms than with a class, though the class is less general.
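
As a rough illustration of the interface listed above, here is a minimal C++11 sketch. It is not the actual class from this message: the member layout, the meaning of the bool conversion (true when non-null), and the helper count_char() are assumptions made for the example. The const_substring constructor, the assignment operators, and assign() are left out, and so is the char const (&)[N] constructor, because its overload interaction with char const * needs more care than a sketch allows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical sketch: a non-owning reference to a contiguous run of chars.
class string_ref {
public:
    string_ref() : data_(nullptr), size_(0) {}

    // Converting constructors, implicit on purpose so callers need no casts.
    string_ref(char const* s) : data_(s), size_(s ? std::strlen(s) : 0) {}
    string_ref(std::string const& s) : data_(s.data()), size_(s.size()) {}
    string_ref(std::vector<char> const& v) : data_(v.data()), size_(v.size()) {}

    // Other constructors: pointer plus length, and a pointer pair.
    string_ref(char const* s, std::size_t n) : data_(s), size_(n) {}
    string_ref(char const* first, char const* last)
        : data_(first), size_(static_cast<std::size_t>(last - first)) {}

    bool is_null() const { return data_ == nullptr; }
    // Assumption: the bool conversion reports "refers to something".
    explicit operator bool() const { return !is_null(); }

    char const* data() const { return data_; }
    std::size_t length() const { return size_; }
    char const* begin() const { return data_; }
    char const* end() const { return data_ + size_; }

    char operator[](std::size_t i) const {
        assert(i < size_);  // precondition, checked in debug builds only
        return data_[i];
    }

    // Returns a view of at most n chars starting at pos; nothing is copied.
    string_ref substr(std::size_t pos, std::size_t n = std::string::npos) const {
        assert(pos <= size_);
        std::size_t const rest = size_ - pos;
        return string_ref(data_ + pos, n < rest ? n : rest);
    }

private:
    char const* data_;
    std::size_t size_;
};

// Hypothetical helper: one signature serves every supported string type.
inline std::size_t count_char(string_ref s, char c) {
    std::size_t n = 0;
    for (char const* p = s.begin(); p != s.end(); ++p)
        if (*p == c) ++n;
    return n;
}
```

With this in place, count_char("banana", 'a'), count_char(some_std_string, 'a') and count_char(some_vector_of_char, 'a') all compile against the one overload, and none of them copies the characters.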

I have not extended mine to support arbitrary ranges, via Boost.Range, simply because the need hasn't arisen, but it can be done. Likewise for arbitrary iterator pairs.

>>> It is possible, if the third-party strings follow the begin()/end() protocol.
>>
>> Now you're changing the rules. TP strings don't all provide iterators.
>
> Any reasonable string type will have some notion of iterators, be that custom types or pointers or a pointer and a size, whatever. As long as this holds, the third-party string type can be adopted.
>
> I understand that not all (nearly none?) third-party strings support the begin()/end() protocol now, but I expect them to support it eventually. Even if they don't, the necessary overloads can be provided externally.

I think such support is a reasonable addition.

>>> No, this is not needed. iterator_range has implicit constructor from a range, so the conversion will be hidden from both the user and the library developer.
>>
>> That only applies to types recognized as ranges. Not all string types are. The same support should be part of string_ref, but an important distinction is that string_ref requires a contiguous range.
>
> iterator_range doesn't detect whether its constructor argument is a range. If applying begin()/end() to it is a valid operation, the conversion will succeed. I'd like string_ref to behave the same way.

OK
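
The behavior discussed here (conversion succeeds whenever begin()/end() can be applied) might look roughly like the C++11 sketch below. Everything in it is an assumption for illustration: the reduced string_ref holds only the range constructor, tp::legacy_string stands in for a third-party string without iterators, and the external begin()/end() overloads are the kind that could be "provided externally". Contiguity is not verified; the constructor simply trusts that the range is contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

namespace tp {
// Hypothetical third-party string: a pointer and a size, no iterators.
struct legacy_string {
    char const* ptr;
    std::size_t len;
};
// Externally supplied overloads, found by argument-dependent lookup.
inline char const* begin(legacy_string const& s) { return s.ptr; }
inline char const* end(legacy_string const& s) { return s.ptr + s.len; }
}  // namespace tp

// Reduced sketch: only the implicit "anything with begin()/end()" constructor.
class string_ref {
public:
    template <typename Range>
    string_ref(Range const& r) : data_(nullptr), size_(0) {
        using std::begin;  // fall back to std:: when ADL finds nothing
        using std::end;
        auto first = begin(r);
        auto last = end(r);
        size_ = static_cast<std::size_t>(std::distance(first, last));
        // Assumption: the range is contiguous, so the address of the first
        // element locates the whole run. An empty range is never dereferenced.
        if (size_ != 0) data_ = &*first;
    }

    char const* data() const { return data_; }
    std::size_t length() const { return size_; }

private:
    char const* data_;
    std::size_t size_;
};

// Hypothetical library function: callers pass their own string type directly.
inline std::size_t length_of(string_ref s) { return s.length(); }
```

A call such as length_of(my_legacy_string) then converts silently, which is the transparency asked for above. The price is that a non-contiguous range (std::list<char>, say) would also compile and then misbehave, so a real implementation would need some way to reject it.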

> I see only one corner case: C strings. But I believe the solution is possible. Either begin()/end() can be defined for const char* or the string_ref can have the corresponding constructor. The latter is the one (and only, AFAICS) reason to have a string_ref type in addition to contiguous_range.

(char const *, size_t) is also common and convenient.

>>> Extracting termination policy to a template parameter is a possibility but it has drawbacks of its own. It makes it harder to provide a stable API/ABI for compiled libraries.
>>
>> You'd only use the terminated one in APIs in rare cases, so a separate class is simpler.
>
> So I would not introduce it at all for that reason. Just use std::string in such cases.

Using std::string loses the option of passing the string_ref through unchanged when it already references a null-terminated range. Thus, you'd always allocate and copy.
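
To make the cost concrete, here is a small C++11 sketch contrasting the two choices. The names cstring_ref, c_api_length, measure and measure_copying are all hypothetical; cstring_ref stands for the separate "terminated" class mentioned above and accepts only sources that are known to be null-terminated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical terminated variant: refers only to null-terminated storage.
class cstring_ref {
public:
    cstring_ref(char const* s) : data_(s), size_(0) {
        assert(s != nullptr);  // a terminated reference cannot be null
        size_ = std::strlen(s);
    }
    cstring_ref(std::string const& s) : data_(s.c_str()), size_(s.size()) {}

    char const* c_str() const { return data_; }
    std::size_t length() const { return size_; }

private:
    char const* data_;
    std::size_t size_;
};

// Stand-in for a C API that requires a terminated string.
inline std::size_t c_api_length(char const* s) { return std::strlen(s); }

// Takes the terminated reference: the caller's buffer is forwarded as is.
inline std::size_t measure(cstring_ref s) { return c_api_length(s.c_str()); }

// Takes std::string: a caller holding a char const* pays for a temporary
// std::string, which copies the characters into its own storage.
inline std::size_t measure_copying(std::string const& s) {
    return c_api_length(s.c_str());
}
```

Both functions return the same result for the same input. The difference is that measure() hands the original pointer to the C API, while measure_copying() called with a char const * first builds a std::string that owns a copy.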

>> There are semantic differences between a contiguous range of characters and a string, but a contiguous range type would be useful in and of itself.
>
> The semantic difference is a matter of content and its interpretation. You can store non-printable elements in std::string (and it is sometimes more convenient and efficient than std::vector< char >) and printable characters in std::vector< char >. The interface of std::vector< char > and std::string is mostly the same when it comes to string processing (not counting std::string members that can be replaced with free algorithms). The same applies to string_ref and contiguous_range< const char* >, the only notable difference being the construction from const char*.

I've never used std::string for non-string character storage. I use std::vector<char>. I realize that precludes any SBO opportunity, but I'd use another, non-string type in that case. Like Daniel, I see string processing as special. Maybe I'm just stuck in my old ways.

___
Rob

