From: Aleksey Gurtovoy (alexy_at_[hidden])
Date: 2001-11-20 11:50:13


Dave Harris wrote:
> > In my opinion, rewriting/duplicating all existing STL algorithms
> > in terms of 'src.at_end()', 'src.advance()' and 'src.current()'
> > calls instead of itor != end, ++itor, *itor just for a sake
> > of different naming conventions (instead of building a wrapping
> > interface that just makes the implicit things explicit) is too far.
>
> It is not just for the sake of a different naming convention.
> It is to provide simpler and more general code.

I don't see how code that is written in terms of a specialized concept, one
that covers only a very limited part of a much broader domain and is
incompatible with everything else, can be "more general" than code that works
with a model of any concept from the concept hierarchy covering most of that
domain.

> Specifically, to better support sequences whose end is determined
> by a count or a sentinel value, or which have a stride, or which
> need to reference their owning container.

I agree that there are cases when it's easier to code a sequence with a
next(), current() and is_end() (NCE) interface than to go the begin/end
iterators' route. If you prefer that, you can write, _once_, an adapter that
converts (with minimal or no overhead) such pseudo input sequences into STL
ones, and then just forget about the issue:

#include "boost/operators.hpp"
#include <iostream>
#include <cassert>

template<typename PseudoSequence>
struct pseudo_sequence_iterator
    : boost::input_iterator_helper<
          pseudo_sequence_iterator
        , typename PseudoSequence::value_type
>
{
 private:
    typedef pseudo_sequence_iterator self;

 public:
    typedef typename PseudoSequence::value_type value_type;
    typedef typename PseudoSequence::reference reference;
    
    // a default-constructed iterator acts as the past-the-end iterator
    pseudo_sequence_iterator(PseudoSequence* owner = 0)
        : m_owner(owner)
    {
    }

    self& operator++()
    {
        assert(!past_the_end());
        m_owner->advance();
        return *this;
    }
  
    reference operator*() const
    {
        assert(!past_the_end());
        return m_owner->current();
    }
 
    // iterators compare equal when both (or neither) are past the end;
    // that's all we need for comparing against the end iterator
    friend bool operator==(self const& i1, self const& i2)
    {
        return i1.past_the_end() == i2.past_the_end();
    }

 private:
    bool past_the_end() const
    {
        return !m_owner || m_owner->at_end();
    }
    
    PseudoSequence* m_owner;
};

template<typename PseudoSequence>
boost::iterator_range<
    pseudo_sequence_iterator<PseudoSequence>
>
pseudo_seq(PseudoSequence& seq)
{
    typedef pseudo_sequence_iterator<PseudoSequence> iterator_t;
    return boost::make_iterator_range(
          iterator_t(&seq)
        , iterator_t()
        );
}

// usage example

class odds { // basically your 'odds' class
    int current_;
    int end_;
public:
    typedef int value_type;
    typedef int reference;

    odds(int end) : current_(1), end_(end) {}
    int current() const { return current_; }
    bool at_end() const { return current_ >= end_; }
    void advance() { current_ += 2; }
};

int main()
{
    odds o(100);
    // prints the odd numbers 1, 3, ..., 99, one per line
    boost::copy(
          pseudo_seq(o)
        , std::ostream_iterator<int>(std::cout, "\n")
        );
}

>
> I think having to represent user-defined sequences with a pair of
> iterators, instead of a single object

A pair of iterators (i.e. iterator range) is a single object too ;).

>, produces artificial and inefficient designs.

Artificial, sometimes - I agree. But that's the price for much greater
genericity at a large scale. Inefficient - well, in most cases, with properly
applied optimization techniques (like factoring a common state out of the
iterators and into the sequence itself), not to the extent that you would
notice it. Any domain model has its corner cases where things don't fit quite
well. The question is how important those corner cases are, and how badly they
don't fit :). Personally, I believe that most of the issues with "artificial
and inefficient designs" in the STL can be solved without resorting to such
drastic measures as rewriting half of the library.

> Especially as both iterators must be of the same type; this means
> any data added to one must also be added to the other, doubling
> the memory overhead.

It doesn't necessarily mean that; you can always store the input sequence's
state in the sequence itself, and make the iterators as lightweight as they
can possibly be; for example, the 'pseudo_sequence_iterator' above is
basically just a pointer.
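
A quick way to see that (a sketch only; it assumes the 'odds' class and the
adaptor defined above, and relies on the empty-base optimization that
mainstream compilers apply to the operators.hpp helper bases):

#include <iostream>

int main()
{
    // the helper bases are empty, so the adapted iterator is exactly
    // the size of the owner pointer it stores
    std::cout << sizeof(pseudo_sequence_iterator<odds>)
              << " == " << sizeof(odds*) << "\n";
}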

> > For one, as others already noted, marking input sequences as a
> > special case just doesn't scale [...]
>
> It scales in a different direction. As written it does not support random
> access sequences well, but then the current STL idiom doesn't support
> sentinel values very well.

It does; you just need to build a thin layer on top of it to start enjoying
its full power.

> I could just as well say that the STL doesn't scale.

I have quite the opposite opinion. IMO the iterator (range) abstraction is a
very practical and _working_ method of representing most kinds of access to
sequential data. I haven't seen any other abstraction that would scale so well
across so broad a domain and yet provide the common ground and concepts that
glue the domain's components together.

> Further, you make an implicit decision that the STD should
> scale in one direction only.

Nope, I don't :).

> I question this. I suspect that the strict ordering of
> iterator concepts is itself a design constraint which will not scale.

It is indeed, and I never advocated "the only right way" to classify the
concepts. We all know that the current iterator classification suffers from
some problems because it tries to fit several orthogonal aspects into a single
hierarchy of concepts. See, for example, Jeremy's paper on the topic
(http://www.lsc.nd.edu/~jsiek/iterator-categories.html). But it's the
traversal aspect of iterator classification that determines the applicability
and implementation of most algorithms, and you need some kind of common
interface and hierarchy there, otherwise things just become impractical;
that's just a fact. How would you work with a class "hierarchy" in which each
derived type introduces an interface providing a superset of its base class's
functionality, but under completely different names?

> (Incidently, I have been emphasising the new features, but of
> course a sequence object would support begin() and end() so
> that any algorithm which works with a pair of iterators can be
> adapted to work with a single iterator-object. This would
> make them less "special".)

It wouldn't make them fit into the existing hierarchy of iterators/sequences.

I just don't see a need to rewrite all the algorithms that work with input
sequences in terms of this newly proposed interface. I can understand the
desire to write a simple NCE interface for something that doesn't have a true
end iterator, and the reluctance to go into "smart iterators" hackery every
time you want to create such a new sequence, but you don't have to rewrite the
algorithms for that! Writing an adaptor like the one I showed above is
sufficient.

> > (see http://groups.yahoo.com/group/boost/message/15353)
>
> Thanks for the references. I've now read as much of that thread as
> I can find. As far as I can tell, no-one there raised the issues which
> concern me here. In particular, whether we should generalise the
> way in which algorithms determine the sequence end. In my view:
> current_iter != end_iter
>
> is a relatively low-level expression, intrinsically using two
> objects and constructing the desired condition from them.

You are trying to generalize and find "the most natural" interface for input
sequences only; effectively, you propose to build a separate library to deal
with that kind of sequence, and then to provide an adaptation/conversion
mechanism that would allow the rest of the domain to somehow interact with
that limited framework. Besides the numerous practical issues, it just doesn't
feel right, and I think that (in general ;) "generalization" of special cases
is the wrong way to go.

> I think I would be fairly happy if we could get away from that,
> even if we had to keep some of the explicit iterator stuff.

Why? What would we gain by that?
 
Note that I do agree that the current standard library doesn't support input
sequences (or any sequences, for that matter) very well - that's what I've
been saying for years :). But that's not because it got the concepts wrong.
It's much more a technical issue, and the larger part of it is the original
STL decision to take iterator ranges as two separate parameters, so that you
cannot chain sequence algorithms without introducing superfluous temporary
variables and making unnecessary copies of data, and you also have to write
those superfluous begin()/end() calls all the time. But fixing these issues is
easy, and you don't need to reinvent the wheel for that.
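
To illustrate the kind of fix I mean (a sketch only; the wrapper below is
hypothetical and not an existing Boost interface, although boost::begin and
boost::end are real), a thin range-taking overload removes the begin()/end()
noise at the call site; once algorithms accept and return range objects like
this, chaining them without named temporaries follows naturally:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
#include "boost/range/begin.hpp"
#include "boost/range/end.hpp"

// hypothetical range-taking wrapper over std::copy
template<typename Sequence, typename OutputIterator>
OutputIterator copy(Sequence const& seq, OutputIterator out)
{
    return std::copy(boost::begin(seq), boost::end(seq), out);
}

int main()
{
    std::vector<int> v;
    for (int i = 0; i != 5; ++i)
        v.push_back(i);

    // no begin()/end() boilerplate at the call site
    ::copy(v, std::ostream_iterator<int>(std::cout, "\n"));
}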

>
> In other words, a compromise something like:
>
> template <typename sequence, typename functor>
> functor for_each( const sequence &s, functor f ) {
> typedef typename sequence_traits<s>::iterator iterator;
> for (iterator i = s.begin(); !at_end(s,i); ++i)
> f( *i );
> return f;
> }
>
> Where:
>
> template <typename sequence, typename iterator>
> bool at_end( const sequence &s, iterator i ) {
> return i == s.end();
> }
>
> so that STD containers continue to work as sequences
> (assuming that's a good idea -- see below).

Yes, this would help get rid of the 'end' iterator in cases where it's not
needed. Still, given that it's quite possible, even for a sequence equivalent
of a pair of std::istream_iterator's, to create iterators that are very cheap
to copy and are basically pointers, I don't see a compelling need to rewrite a
single STL algorithm along these lines.
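
For illustration, here is roughly what such an istream-backed sequence could
look like when fed to the adaptor from the beginning of this message (a sketch
only; the class name and members are mine, and it assumes the
'pseudo_sequence_iterator'/'pseudo_seq' definitions above):

#include <iostream>
#include <iterator>
#include "boost/range/algorithm/copy.hpp"

// all of the state lives in the sequence object, so the adapted
// iterators are, once again, just a pointer each
class int_stream_seq
{
    std::istream* is_;
    int current_;
    bool good_;

public:
    typedef int value_type;
    typedef int reference;

    explicit int_stream_seq(std::istream& is)
        : is_(&is), current_(0), good_(false)
    {
        advance_impl();
    }

    int current() const { return current_; }
    bool at_end() const { return !good_; }
    void advance() { advance_impl(); }

private:
    void advance_impl() { good_ = static_cast<bool>(*is_ >> current_); }
};

int main()
{
    int_stream_seq ints(std::cin);
    boost::copy(pseudo_seq(ints), std::ostream_iterator<int>(std::cout, "\n"));
}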

> It is beginning to look more like Sean Parent's suggestion.
> This version of for_each uses two objects, the iterator and the sequence,
> which is something I was trying to avoid. However the objects at least have
> different types, reflecting their different roles.

BTW, talking about different roles: a Sequence is not an Iterator, and using
it as one (which is what the NCE interface basically promotes) just doesn't
feel right (besides the fact that it has well-known drawbacks).

> > In short, sequence algorithms should take their arguments by
> > reference ;).
>
> As Fernando Cacciola said at the time, a container isn't really a
> sequence.

Any standard container is by definition a model of the Sequence concept as
defined in http://groups.yahoo.com/group/boost/message/15353, because it
satisfies the concept's requirements.

> The difference shows up when an algorithm tries to construct
> sub-sequences for passing to other algorithms.

What does it mean, "to construct a sub-sequence"? What in the definition of
the Sequence concept made you think that the "sequence(first, last)"
expression is valid and has meaningful semantics?

Basically, below you provide an algorithm that imposes arbitrary requirements
on the concept, and then state that the concept "doesn't work". With the same
validity one could write:

template<typename InputIterator, typename UnaryFunction>
UnaryFunction
for_each(InputIterator first, InputIterator last, UnaryFunction f)
{
    // deliberately spelled with member calls that no built-in pointer has
    for ( ; first.not_equal(last); first.next())
        f(first.current());
    return f;
}

and then claim that the standard InputIterator concept is broken, because
the code doesn't compile with InputIterator == int* ;).
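
For contrast, the standard spelling works with plain pointers out of the box
(a tiny sketch, just to make the point concrete):

#include <algorithm>
#include <iostream>

struct print_int
{
    void operator()(int x) const { std::cout << x << "\n"; }
};

int main()
{
    int a[] = { 1, 2, 3 };
    // int* is a perfectly good InputIterator for the standard interface
    std::for_each(a, a + 3, print_int());
}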

> We do not want to copy
> entire containers. Eg:
>
> template <typename sequence, typename functor>
> functor for_each( const sequence &s, functor f ) {
> if (empty( s ))
> return f;
> typedef typename sequence_traits<s>::iterator i = s.begin();
> f( *i );
> return for_each( sequence( ++i, s.end() ) );
> }
>
> This is plausible code if a sequence is like a
> pair<iterator,iterator> but not if a sequence is like a vector.
>

It's not valid code at all; it completely misses the semantics of the Sequence
concept, and it also ignores the concept's requirements.

Let me give you a small example. How would you write your recursive 'for_each'
if it took a pair of iterators? Probably something like this:

template<typename InputIterator, typename UnaryFunction>
UnaryFunction
for_each(InputIterator first, InputIterator last, UnaryFunction f)
{
    if (first != last)
    {
        f(*first);
        return for_each(boost::next(first), last, f);
    }
    return f;
}

Now, what would be the first step toward emphasizing the fact that the first
two parameters of the algorithm are closely related - that, from a conceptual
point of view, the algorithm takes two parameters, not three: an iterator
range designating the sequence of elements to process, and a unary function to
be called for each element of that sequence? My first step would be to do
exactly that - to make the algorithm take a single "iterator range" object;
given that it's only the beginning, std::pair<InputIterator, InputIterator> is
good enough here:

template<typename InputIterator, typename UnaryFunction>
UnaryFunction
for_each(std::pair<InputIterator, InputIterator> const& seq, UnaryFunction f)
{
    if (seq.first != seq.second)
    {
        f(*seq.first);
        return for_each(std::make_pair(boost::next(seq.first), seq.second), f);
    }
    return f;
}

The next step would be to realize that using 'std::pair' for something that
has deeper semantics is a hack :), to write an 'iterator_range' class (see
http://groups.yahoo.com/group/boost/message/15359), and to rewrite the
algorithm accordingly, so that it becomes

template<typename InputIterator, typename UnaryFunction>
UnaryFunction
for_each(boost::iterator_range<InputIterator> const& seq, UnaryFunction f)
{
    if (seq.begin() != seq.end())
    {
        f(*seq.begin());
        return for_each(
            boost::make_iterator_range(boost::next(seq.begin()), seq.end()), f);
    }
    return f;
}

And yet the next step would be to realize that requiring a specific
representation of iterator ranges, namely
boost::iterator_range<[Input|Forward|..]Iterator>, is unnecessarily
restrictive, in particular compared to the original (InputIterator,
InputIterator, UnaryFunction) version, which accepts any iterators that
satisfy the standard InputIterator concept requirements, not just ones wrapped
into boost::iterator_range<>; that boost::iterator_range<> is not in any way
better than, let's say, my::iterator_range<> or her::iter_range<> :), etc.;
and that in fact we don't care what particular class represents the input
range, as long as it satisfies some common requirements we've already
implicitly defined. So we should just make those requirements explicit (i.e.
introduce an (Input) Sequence concept), and then make the algorithm truly
generic (well, almost):

template<typename InputSequence, typename UnaryFunction>
UnaryFunction
for_each(InputSequence& seq, UnaryFunction f)
{
    if (seq.begin() != seq.end())
    {
        f(*seq.begin());
        return for_each(
            boost::make_iterator_range(boost::next(seq.begin()), seq.end()), f);
    }
    return f;
}

And of course, it doesn't make sense to duplicate something that is already
implemented :), so the final version should be

template<typename InputSequence, typename UnaryFunction>
UnaryFunction
for_each(InputSequence& seq, UnaryFunction f)
{
    return std::for_each(boost::begin(seq), boost::end(seq), f);
}
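
A minimal usage sketch for this final version (it assumes the 'odds' class and
the 'pseudo_seq' adaptor from the beginning of this message; the 'print'
functor and the extra includes are mine):

#include <algorithm>
#include <iostream>
#include <vector>
#include "boost/range/begin.hpp"
#include "boost/range/end.hpp"
#include "boost/range/iterator_range.hpp"

struct print
{
    void operator()(int x) const { std::cout << x << "\n"; }
};

int main()
{
    // any standard container models the Sequence concept...
    std::vector<int> v(3, 42);
    ::for_each(v, print());

    // ...and so does the adapted pseudo sequence
    odds o(100);
    boost::iterator_range<pseudo_sequence_iterator<odds> > r = pseudo_seq(o);
    ::for_each(r, print());
}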

> You write as if all these matters were already decided, and further
> discussion unwelcome.

Nope, and sorry if it sounded like that. I just wanted to point to the prior
discussion, although I admit that I think the issue is pretty clear ;).

> However, as far as I can tell, nobody answered
> Fernando's July 30th message:
> http://groups.yahoo.com/group/boost/message/15379?threaded=1

Please specify what particular points you would like me to answer, and we'll
take it from there.

> I think it matters. I think it is important that algorithms
> be able to construct new sequences without copying the
> items in the sequence. So I disagree that "Any Container is
> a model of Sequence".

Even given the explanation above?

--
Aleksey
