Boost logo

Boost :

From: brangdon_at_[hidden]
Date: 2001-11-11 16:40:11


In-Reply-To: <4034600A4760D411B8720001031D84FB431084_at_[hidden]>
On Mon, 5 Nov 2001 01:30:48 -0600 Aleksey Gurtovoy (alexy_at_[hidden])
wrote:
> In my opinion, rewriting/duplicating all existing STL algorithms
> in terms of 'src.at_end()', 'src.advance()' and 'src.current()'
> calls instead of itor != end, ++itor, *itor just for a sake
> of different naming conventions (instead of building a wrapping
> interface that just makes the implicit things explicit) is too far.

It is not just for the sake of a different naming convention. It is to
provide simpler and more general code. Specifically, to better support
sequences whose end is determined by a count or a sentinel value, or which
have a stride, or which need to reference their owning container.

I think having to represent user-defined sequences with a pair of
iterators, instead of a single object, produces artificial and inefficient
designs. Especially as both iterators must be of the same type; this means
any data added to one must also be added to the other, doubling the memory
overhead. It also means the iterator's code does not know whether it is
being used to mark the end of a sequence, or to hold a position within the
sequence.

Notice I am not here talking about the inconvenience of passing two
objects around. I am concerned about the ease and efficiency of
user-defined sequences. The 2-iterator representation is not always ideal.
Algorithms should try to abstract away from it.

> For one, as others already noted, marking input sequences as a
> special case just doesn't scale [...]

It scales in a different direction. As written it does not support random
access sequences well, but then the current STL idiom doesn't support
sentinel values very well. I could just as well say that the STL doesn't
scale. There is a value judgement involved here, as to which direction is
more important.

Further, you make an implicit decision that the STD should scale in one
direction only. I question this. I suspect that the strict ordering of
iterator concepts is itself a design constraint which will not scale.

(Incidently, I have been emphasising the new features, but of course a
sequence object would support begin() and end() so that any algorithm
which works with a pair of iterators can be adapted to work with a single
iterator-object. This would make them less "special".)

> (see http://groups.yahoo.com/group/boost/message/15353)

Thanks for the references. I've now read as much of that thread as I can
find. As far as I can tell, no-one there raised the issues which concern
me here. In particular, whether we should generalise the way in which
algorithms determine the sequence end. In my view:
    current_iter != end_iter

is a relatively low-level expression, intrinsically using two objects and
constructing the desired condition from them. I think I would be fairly
happy if we could get away from that, even if we had to keep some of the
explicit iterator stuff.

In other words, a compromise something like:

    template <typename sequence, typename functor>
    functor for_each( const sequence &s, functor f ) {
        typedef typename sequence_traits<s>::iterator iterator;
        for (iterator i = s.begin(); !at_end(s,i); ++i)
            f( *i );
        return f;
    }

Where:

    template <typename sequence, typename iterator>
    bool at_end( const sequence &s, iterator i ) {
        return i == s.end();
    }

so that STD containers continue to work as sequences (assuming that's a
good idea -- see below).

It is beginning to look more like Sean Parent's suggestion. This version
of for_each uses two objects, the iterator and the sequence, which is
something I was trying to avoid. However the objects at least have
different types, reflecting their different roles. This makes it easier to
achieve the objectives I set out above.

Still, I can't help feeling that *all* the iterator functions may need the
sequence, which leads me again to think that the thing doing the iteration
should be self-contained, a single object, and that object might as well
be the argument to for_each(). And we are back at my original suggestion.

> In short, sequence algorithms should take their arguments by
> reference ;).

As Fernando Cacciola said at the time, a container isn't really a
sequence. The difference shows up when an algorithm tries to construct
sub-sequences for passing to other algorithms. We do not want to copy
entire containers. Eg:

    template <typename sequence, typename functor>
    functor for_each( const sequence &s, functor f ) {
        if (empty( s ))
            return f;
        typedef typename sequence_traits<s>::iterator i = s.begin();
        f( *i );
        return for_each( sequence( ++i, s.end() ) );
    }

This is plausible code if a sequence is like a pair<iterator,iterator> but
not if a sequence is like a vector.

You write as if all these matters were already decided, and further
discussion unwelcome. However, as far as I can tell, nobody answered
Fernando's July 30th message:
    http://groups.yahoo.com/group/boost/message/15379?threaded=1

I think it matters. I think it is important that algorithms be able to
construct new sequences without copying the items in the sequence. So I
disagree that "Any Container is a model of Sequence".

This seems to be a major bifurcation in the design space, and the root of
our disagreements. If new sequences can be created cheaply (ie without
copying their elements), then they can be passed by value. If they are
passed by value, they can safely be mutated. It starts to make sense to
apply advance(), or operator++(), to a sequence, or to make its begin()
be an l-value, when that would not make sense for a container.

-- Dave Harris


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk