Boost logo

Boost :

Subject: Re: [boost] [histogram] should some_axis::size() return unsigned or int?
From: Jorg Brown (jorg.brown_at_[hidden])
Date: 2018-12-03 02:15:42


[ Gaving cc'd me as the author of
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1227r0.html ]

"distance(begin, end)" isn't O(1) for many types of containers, which is
why P1227 defines ssize() as the static_casted result of size().

Of course, for many containers, size() is implemented as the subtraction of
two pointers, static_casted to size_t. (See
https://github.com/llvm-mirror/libcxx/blob/master/include/vectorfor
example) In those cases, it would make more sense to define ssize() as
simply the subtraction of those two pointers.

"difference_type should be able to express std::distance(begin(), end()) or
some algorithms will break"

A similar point came up at the San Diego standards meeting: If you have a
container large enough that it might take up more than half the address
space, ptrdiff_t will not be able to express the difference between begin()
and end(). The conclusion drawn (not necessarily shared by everyone in the
room) was that containers that take up more than half of the address space
are not really supported by C++. Or at least, not if their iterators have
byte-level granularity.

""better" in that it allows a container intended exclusively for small
sizes to use a difference_type smaller than ptrdiff_t."

I applaud your use of quotation marks around better, since such a
difference_type is often inefficient in that the caller is often forced to
immediately promote it to larger size, before the processor can work with
it.

= - = - = - = - =

Before going down the road of size() and ssize(), the real question is what
should this do:

for (const auto &el: some_axis) {
  // Iterates over the underflow/overflow bins?
}

for (size_t i = 0; i != some_axis.size(); ++i) {
  auto &el = some_axis[i];
  // Iterates over the underflow/overflow bins?
}

It looks like you wanted to use an index of -1 and size() to refer to
underflow and overflow. That's... interesting... and I mean "interesting"
in a good sense of the word.

What this means is that someone who wants to iterate the *-flow bins might
write:

for (int i = -1; i != some_axis.size() + 1; ++i) {
  auto &el = some_axis[i];
  // Iterates over all the bins, including *-flow.
}

And the code would work. However someone else might naively write that
first line differently:

for (int i = -1; i < some_axis.size() + 1; ++i) {

and that loop would never iterate any entries at all, if size() returned an
unsigned value. This code, however, would work:

for (int i = -1; (i == -1) || (i < some_axis.size() + 1); ++i) {

... it just gets really ugly, really fast...

with an ssize() routine, this does work:

for (int i = -1; i < some_axis.ssize() + 1; ++i) {

But this does not:

for (size_t i = -1; i < some_axis.ssize() + 1; ++i) {

= - = - =

If it were me writing the code purely for myself, I would avoid unsigned
integers except for bit manipulation and intended overflow situations. And
this would all work fine. Of course that still leaves the question of
whether you want iteration from begin() to end() to include the
overflow/underflow bins.

Since you're writing this code for others, I'd be wary of breaking user's
ingrained size_t / less-than habits.

I'm curious about whether you've considered using 0 and size() - 1 as the
indices for the special bins? In that case, users would write this to
iterate over all bins:

for (const auto &el: some_axis) {
for (size_t i = 0; i < some_axis.size(); ++i) {

And something like this to iterate over just the non-special bins:

for (size_t i = 1; i <= some_axis.num_bins(); ++i) {

This would support both my "signed everywhere" contingent, as well as the
"size_t everywhere, be sure it never goes negative" contingent.

-- Jorg

On Sun, Dec 2, 2018 at 4:15 PM Gavin Lambert <boost_at_[hidden]> wrote:

> On 30/11/2018 20:43, Olaf van der Spek wrote:
> > Not really:
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1227r0.html
>
> Isn’t that proposed definition of ssize() essentially identical to
>
> // member function
> constexpr difference_type ssize() const
> { return std::distance(begin(), end()); }
> // or moral equivalent, for containers that can implement it
> // more efficiently another way.
>
> // non-member function in std::
> template <class C>
> constexpr auto ssize(const C& c)
> { return distance(begin(c), end(c)); }
> // or calling member ssize() if it exists
>
> The above seems "better" in that it allows a container intended
> exclusively for small sizes to use a difference_type smaller than
> ptrdiff_t.
>
> (In the text it explains why it's a bad idea for a container using
> uint16_t sizes with more than 32767 elements to use int16_t as a
> difference_type, but as I understand it that's already undefined
> behaviour -- size_type is required to be larger than difference_type and
> difference_type should be able to express std::distance(begin(), end())
> or some algorithms will break.)
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk