
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2022-08-15 20:41:15


On 8/15/22 21:54, Alan de Freitas via Boost wrote:
>>
>> It's less clear what should happen when the library calls append or assign
>> with iterators whose value_type is char on something whose value_type is
>> wchar_t, which is what you seem to be doing.
>
>> std::filesystem requirements[2] actually restrict Source arguments to
>> std string types, iterators (read: pointers) to C-style strings and
>> arrays of characters (which are interpreted as C-style strings).
>
> Yes. That sounds right. It looks like boost::filesystem is matching the
> behavior of std::filesystem now, even if it is more restrictive
> (https://godbolt.org/z/5WYYce6v8).

Boost.Filesystem v3 is more permissive: it still compiles the
std::list case, albeit with a warning. v4 will fail to compile it.

> Boost.URL shouldn't use it and that's for the best, since std::filesystem
> doesn't support it either.
>
> Non-contiguous ranges that dereference to char work with both std:: and
> boost::filesystem if we use the `append(InputIterator begin, InputIterator
> end)` overload.
> I think that's the source of confusion here. What C++ says about the
> `append(Source const& source)` overload is even more misleading.
>
> (2) and (3) participate in overload resolution only if Source and path are
> not the same type, and either:
>
> - Source is a specialization of std::basic_string or
> std::basic_string_view, or
> - std::iterator_traits<std::decay_t<Source>>::value_type is valid and
> denotes a possibly const-qualified encoding character type (char,
> char8_t (since C++20), char16_t, char32_t, or wchar_t).
>
>
> We assumed both overloads should work because of this second condition.

std::list is not an iterator, so applying std::iterator_traits to it is not
valid (for one, std::list does not have an iterator_category member type).
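The second condition quoted above can be checked mechanically. Below is a minimal sketch, assuming C++17 or later (where std::iterator_traits is SFINAE-friendly); the trait name has_iter_value_type is hypothetical and not part of std or Boost. It shows that a C-style string pointer, a char array and a list iterator satisfy the condition, while std::list<char> itself does not.

```cpp
#include <iterator>
#include <list>
#include <type_traits>

// Hypothetical trait: true if std::iterator_traits<std::decay_t<T>>::value_type
// names a type. Relies on iterator_traits having no members for non-iterators.
template <class T, class = void>
struct has_iter_value_type : std::false_type {};

template <class T>
struct has_iter_value_type<
    T, std::void_t<typename std::iterator_traits<std::decay_t<T>>::value_type>>
    : std::true_type {};

// A pointer to a C-style string qualifies: value_type is char.
static_assert(has_iter_value_type<const char*>::value, "");
// A reference to a char array decays to const char*, so it qualifies too.
static_assert(has_iter_value_type<const char (&)[4]>::value, "");
// An iterator into std::list<char> qualifies.
static_assert(has_iter_value_type<std::list<char>::iterator>::value, "");
// std::list<char> itself is not an iterator: it has no iterator_category,
// so std::iterator_traits<std::list<char>> has no members at all.
static_assert(!has_iter_value_type<std::list<char>>::value, "");
```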

>> If you're assigning a list to a path, most
>> likely you are doing something wrong.
>
>
> Yes. `append(InputIterator begin, InputIterator end)` would still allow the
> user to do this wrong thing, though.
>
> And `append(InputIterator begin, InputIterator end)` doesn't look like it's
> always wrong.

The signature with two iterators is the established practice for
obtaining elements from a foreign sequence. You have it in every std
container, std::string, etc. In particular, it allows obtaining the
elements from exotic sources, such as reading from an
std::istreambuf_iterator. There is no such practice with a
single-argument signature.
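To illustrate the established two-iterator practice, here is a minimal sketch (the function name append_from_stream is hypothetical) that appends characters from a std::istreambuf_iterator to a std::string. The number of characters is not known in advance; the call simply consumes elements until it reaches the end-of-stream iterator.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Appends everything readable from `in` to `out` using the two-iterator
// overload of std::string::append. istreambuf_iterator is a single-pass
// input iterator, so characters are consumed one at a time until the
// default-constructed (end-of-stream) iterator is reached.
void append_from_stream(std::string& out, std::istream& in) {
    out.append(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
}
```

The same pair-of-iterators shape is accepted by the range constructors and insert/append members of the other std containers.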

> Two obvious use cases could be (i) appending paths from resource trees or
> (ii) some std::ranges::views::... adaptor that transforms the input into the
> chars that represent a path segment for that input.
> If `append(InputIterator begin, InputIterator end)` is not wrong, it seems
> that `append(Source const& source)` would be no more wrong when Source is
> just the range holding the iterators for the first overload.
>
> In any case, both are still dangerous. Boost.URL and other libraries
> shouldn't count on it.
> As Peter mentioned, things like wstring and u16string could be appended,
> but the semantics will probably be wrong.
> They will convert char by char, without regard to encoding.

If the user calls a function passing two iterators, he is arguably aware
that he is constructing/assigning/appending elements one-by-one,
performing element-wise conversion, if needed. Again, this is
established practice.
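A small sketch of what element-wise conversion means, using std::string's two-iterator constructor as a stand-in (narrow_elementwise is a hypothetical name): each wchar_t is converted to char on its own, so L"\u00e9" yields a single byte with value 0xE9 rather than the two-byte UTF-8 sequence 0xC3 0xA9.

```cpp
#include <string>

// Builds a narrow string from wide-character iterators. Each wchar_t is
// converted to char individually by an integral conversion; no character
// encoding conversion (e.g. to UTF-8) takes place.
std::string narrow_elementwise(const std::wstring& ws) {
    return std::string(ws.begin(), ws.end());
}
```

For pure ASCII input the result looks right, which is exactly why the missing encoding conversion is easy to overlook.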

If the user passes a single object to constructor/assignment/append, he
provides the call with additional information on the nature of the input
sequence, and the call is expected to behave according to that
knowledge. For example, the call may not copy anything at all and simply
move the contents or increment a reference counter, or use strlen to
discover the end of the string, or use a locale from the source to
perform character code conversion, and so on. As you can see, the
behavior of such a call can be very different depending on the argument type.
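As a concrete example of a single argument carrying extra meaning: a char array passed as one object is interpreted as a C-style string (its end is found by searching for the first NUL), while the same array passed as two iterators is copied element by element. The sketch below assumes C++17; the array name sample_with_nul and the function names are hypothetical.

```cpp
#include <cstddef>
#include <filesystem>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Six elements: 'a', 'b', '\0', 'c', 'd', '\0'.
inline const char sample_with_nul[] = "ab\0cd";

// Single argument: the array decays to const char* and the length is
// determined by the first NUL, strlen-style.
std::size_t size_from_array() { return std::string(sample_with_nul).size(); }

// Two iterators: every element in [begin, end) is copied, NULs included.
std::size_t size_from_iterators() {
    return std::string(std::begin(sample_with_nul),
                       std::end(sample_with_nul)).size();
}

// std::filesystem::path likewise treats a char array Source as a C-style string.
std::string path_from_array() { return fs::path(sample_with_nul).string(); }
```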

Yes, with a range like boost::iterator_range or std::span there's really
nothing fancy going on, and semantically the call would be expected to
behave the same as with a pair of iterators. However, this is still a
special case that has to be supported by the call explicitly, among the
other single-argument signatures. This is a relatively novel practice, and
in some cases like the one that started this discussion, it can be
ambiguous as to what the call actually does. In comparison, the
two-iterator signature is rather explicit and clear wherever you see it.
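When a caller does hold a range object, such as a container or a span-like view, one way to keep the explicit element-wise semantics is to forward to the two-iterator overload at the call site. append_range below is a hypothetical helper, not part of std::filesystem or Boost.Filesystem; the sketch assumes C++17 and a standard library that provides <filesystem>.

```cpp
#include <filesystem>
#include <iterator>
#include <list>

namespace fs = std::filesystem;

// Hypothetical helper: appends the characters of an arbitrary range to a
// path by forwarding to path::append(InputIt first, InputIt last), so the
// range is always treated as a plain sequence of characters.
template <class Range>
fs::path& append_range(fs::path& p, const Range& r) {
    using std::begin;
    using std::end;
    return p.append(begin(r), end(r));
}
```

The helper's name documents the element-wise intent, whereas a single-argument append(r) would leave the interpretation of r up to the overload set.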

