
Boost : 
Subject: Re: [boost] lifetime of ranges vs. iterators
From: David Abrahams (dave_at_[hidden])
Date: 20080905 09:38:22
on Fri Sep 05 2008, "Giovanni Piero Deretta" <gpderettaATgmail.com> wrote:
> 2008/9/4 David Abrahams <dave_at_[hidden]>:
>>>>> But iterators like filter_iterator that need to know iteration
>>>>> boundaries, can store a range instead of an iterator, so they can
>>>>> benefit from the compression.
>>>>
>>>> Oh, I totally missed that; sorry! Let's see...
>>>>
>>>> strided_iterator<int*,5> stores range<int*>
>>>>
>>>> sizeof(range<int*>) == 2*sizeof(int*)
>>>>
>>>> filter_iterator<strided_iterator<int*,5> > stores
>>>> range<strided_iterator<int*,5> >
>>>>
>>>> range<strided_iterator<int*,5> > is represented as
>>>>
>>>> unique_data<strided_iterator<int*,5> >::type start;
>>>> unique_data<strided_iterator<int*,5> >::type finish;
>>>> common_data<strided_iterator<int*,5> >::type common;
>>>>
>>>> unique_data<strided_iterator<int*,5> >::type == int*
>>>> common_data<strided_iterator<int*,5> >::type == int*
>>>>
>>>> so its size is 3*sizeof(int*)
>>>>
>>>> ?? Did I get that right?
>>>>
>>>>
>>>> If so, it seems like there's still redundant storage. finish and common
>>>> will be representing essentially the same information. If I were to
>>>> write a filter_strided_iterator adapter and apply it to int*, it would
>>>> have the size of 2 int*s.
>>>> a
>>>
>>> What information is contained in common that is not also contained in
>>> finish (i've never implemented a strided iterator, I'm trying to
>>> understand) ?
>>
>> My point is that there is essentially the same information in
>> common and finish, but that you can't eliminate it because the
>> abstractions don't allow it. A strided_iterator implementation and test
>> are attached.
>>
>
> Hum, I have been thinking about this for a while, and the more I think
> about it, the more it seems to me that that strided_iterator
> implementation is not optimal.
> Again, I've never needed one, so I'm not sure about the use case. I'll
> try to reconstruct the problems from the hints you left in this
> thread:
>
> Basically a strided iterator is an adaptor that visit every Nth
> element in a range, where N is a compile time constant.
Well, whether to make it a compiletime constant or a runtime one is a
design decision you can make either way, but that's basically it. You
might use it to traverse the diagonal of a matrix, for example.
> From your
> implementation I gather that the underlying range must be a random
> access range. A straight forward implementation is a thin wrapper
> around a random access iterator, you just replace operators ++ and 
> with +=N (with similar changes for operators + and ).
>
> The problem is that you want to visit ranges whose size is not an even
> multiple of N, the above scheme wouldn't work, because before
> comparison with endpoint, you would have already moved the iterator
> beyond that end point. If the endpoint is one past the end of the base
> range, you have UB.
>
> The solution used in your implementation is to actually store the
> unstrided endpoint and position the internal iterator at that end
> point if the next increment would go beyond the boundary.
> This requires a conditional branch in increment, and makes the
> semantics of the iterator a bit surprising: you have to know the
> boundaries of the iteration when you construct the begin iterator and
> two iterators over the same underlying sequence with same stride and
> phase, but constructed with different end positions, will actually be
> iterating with diffent ranges (one iterator can't reach the end
> position of the other iterator).
>
> Now, the obvious solution is not to store the end position but instead
> to use a base+offset representation which are only combined together
> at dereference time.
You're completely right about that, and there are even additional
reasons that you didn't mention, as I realized a couple days ago. I
didn't want to bring it up, though, because it doesn't change the
fundamental reality. Just replace strided_iterator by another layer of
filter_iterator.
> [ in fact, searching the archives, it seems you are aware of this
> implementation, so I guess there is some problems I'm missing if you
> didn't use it]
No, it's just that I've forgotten more in my life than I ever knew.
Remember, I forgot that you only need to store the endpoint of the
underlying range (and not the beginning) if you implement
filter_iterator correctly.
> Assuming sizeof(size_t) == sizeof(void*), the size of this iterator is
> the same as the size of your implementation.
>
> Now a base and range size are the only informations you need to
> reconstruct begin and end
>
> begin.base = range.base;
> begin.offset= 0;
> end.base = range.base;
> end.offset = range.lenght;
>
> Thus sizeof(strided_range) can be sizeof(void*)*2 which is optimal.
Yes, but that doesn't change anything fundamental about the issue I'm
trying to point at.
> This doesn't of course prove that storing a range would lead to
> optimal representation for all iterator adaptors, but at least is
> enough for guaranteeing an optimal (in term of space)
> filter_iterator<strided_iterator>.
>
> Am I missing something?
You may be missing that the issue I'm describing applies more generally
than in the single case of my
filter_iterator<broken_strided_iterator<int*> > example. It fails to
optimize any case where the outer iterator and inner iterator are
storing essentially the same information.
Another thing to look at: this sort of thing happens with
permutation_iterator as well.
 Dave Abrahams BoostPro Computing http://www.boostpro.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk