Boost logo

Boost :

Subject: Re: [boost] lifetime of ranges vs. iterators
From: Giovanni Piero Deretta (gpderetta_at_[hidden])
Date: 2008-09-05 00:51:13


2008/9/4 David Abrahams <dave_at_[hidden]>:
>>>> But iterators like filter_iterator that need to know iteration
>>>> boundaries, can store a range instead of an iterator, so they can
>>>> benefit from the compression.
>>>
>>> Oh, I totally missed that; sorry! Let's see...
>>>
>>> strided_iterator<int*,5> stores range<int*>
>>>
>>> sizeof(range<int*>) == 2*sizeof(int*)
>>>
>>> filter_iterator<strided_iterator<int*,5> > stores
>>> range<strided_iterator<int*,5> >
>>>
>>> range<strided_iterator<int*,5> > is represented as
>>>
>>> unique_data<strided_iterator<int*,5> >::type start;
>>> unique_data<strided_iterator<int*,5> >::type finish;
>>> common_data<strided_iterator<int*,5> >::type common;
>>>
>>> unique_data<strided_iterator<int*,5> >::type == int*
>>> common_data<strided_iterator<int*,5> >::type == int*
>>>
>>> so its size is 3*sizeof(int*)
>>>
>>> ?? Did I get that right?
>>>
>>>
>>> If so, it seems like there's still redundant storage. finish and common
>>> will be representing essentially the same information. If I were to
>>> write a filter_strided_iterator adapter and apply it to int*, it would
>>> have the size of 2 int*s.
>>> a
>>
>> What information is contained in common that is not also contained in
>> finish (i've never implemented a strided iterator, I'm trying to
>> understand) ?
>
> My point is that there is essentially the same information in
> common and finish, but that you can't eliminate it because the
> abstractions don't allow it. A strided_iterator implementation and test
> are attached.
>

Hum, I have been thinking about this for a while, and the more I think
about it, the more it seems to me that that strided_iterator
implementation is not optimal.
Again, I've never needed one, so I'm not sure about the use case. I'll
try to reconstruct the problems from the hints you left in this
thread:

Basically a strided iterator is an adaptor that visit every Nth
element in a range, where N is a compile time constant. From your
implementation I gather that the underlying range must be a random
access range. A straight forward implementation is a thin wrapper
around a random access iterator, you just replace operators ++ and --
with +=N (with similar changes for operators + and -).

The problem is that you want to visit ranges whose size is not an even
multiple of N, the above scheme wouldn't work, because before
comparison with endpoint, you would have already moved the iterator
beyond that end point. If the endpoint is one past the end of the base
range, you have UB.

The solution used in your implementation is to actually store the
unstrided endpoint and position the internal iterator at that end
point if the next increment would go beyond the boundary.
This requires a conditional branch in increment, and makes the
semantics of the iterator a bit surprising: you have to know the
boundaries of the iteration when you construct the begin iterator and
two iterators over the same underlying sequence with same stride and
phase, but constructed with different end positions, will actually be
iterating with diffent ranges (one iterator can't reach the end
position of the other iterator).

Now, the obvious solution is not to store the end position but instead
to use a base+offset representation which are only combined together
at dereference time.
The member functions become:

void incement() { //similar thing for decrement
    offset += N;
}

void advance(distance_type n) {
    offset += n * N;
}

reference dereference() const {
    return base[offset];
}

distance_type distance_to(filter_iterator rhs) {
     return ((base - rhs.base) + (offset - rhs.offset)) / N ; // I
might have got the sign wrong
}

[ in fact, searching the archives, it seems you are aware of this
implementation, so I guess there is some problems I'm missing if you
didn't use it]

Assuming sizeof(size_t) == sizeof(void*), the size of this iterator is
the same as the size of your implementation.

Now a base and range size are the only informations you need to
reconstruct begin and end

begin.base = range.base;
begin.offset= 0;
end.base = range.base;
end.offset = range.lenght;

Thus sizeof(strided_range) can be sizeof(void*)*2 which is optimal.

This doesn't of course prove that storing a range would lead to
optimal representation for all iterator adaptors, but at least is
enough for guaranteeing an optimal (in term of space)
filter_iterator<strided_iterator>.

Am I missing something?

-- 
gpd

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk