Boost logo

Boost :

From: brangdon_at_[hidden]
Date: 2001-11-21 15:47:12


In-Reply-To: <4034600A4760D411B8720001031D84FB4310A8_at_[hidden]>
On Tue, 20 Nov 2001 10:50:13 -0600 Aleksey Gurtovoy (alexy_at_[hidden])
wrote:
> It doesn't necessary mean that; you can always store the input
> sequence's state in the sequence itself, and make the iterators
> as lightweight as they possibly might be; for example,
> the 'pseudo_sequence_iterator' above is basically just a pointer.

Not if the sequence is to support multiple iterators, or multiple
simultaneous algorithms. The per-algorithm state has to be stored
somewhere. Your approach seems simple only because it ignores memory
management and life-time issues. For example, how do I return a
pseudo_sequence_iterator from a function?

> You are trying to generalize and find "the most natural" interface for
> input sequences only

No; I'm interested in other kinds of sequence, too.

Here is an implementation of reverse() with no naked iterators:

    template <typename sequence>
    void reverse( sequence seq ) {
        while (seq.size() > 1) {
            swap( seq.front(), seq.last() );
            seq.pop_front();
            seq.pop_back();
        }
    }

Here I have adopted a deque-like notation. As usual, I am more concerned
with the structure than with exact choice of function names. Pop_front()
is the function I previously called advance(); it does not deallocate
storage and there is no push_front() because a sequence is not a container
(more on this below).

I think the sequence form is better because the sequence helps make
explicit several constraints and invariants. Eg that seq.begin() and
seq.end() always point into the same container. That begin() is always <=
end(). That it is wrong to decrement begin() or increment end() (how would
we know when to stop?). That an empty sequence has no front() or back().
Some of these are enforced at compile time and the rest can trivially be
runtime assertions.

Notice they are properties of the sequence that don't apply to individual
iterators. For example, if we replace seq.front() with *seq.begin(), it's
much harder to verify that we are not dereferencing when we shouldn't.
Iterators are at the wrong level of abstraction.

Although the main advantages here are correctness, notice the use of
size(). If we replaced it with (seq.end()-seq.begin()), we would need
random access iterators to get constant time. However, with the function
size() we can cache an int in the sequence and update it in the pop
functions, thus giving O(N) time even with bi-directional iterators. This
shows that size() is at a more appropriate level of abstraction.

> What does it mean, "to construct a sub-sequence"? What in definition of
> the Sequence concept made you think that the "sequence(first, last)"
> expression is valid and has a meaningful semantics?

Most of this debate is about how the concept should be defined. You are
citing your definition as if it were the only possibility, or as if it
were already decided. I am disagreeing. I am suggesting a more general
notion of sequence. I don't intend my examples to conform to your
definition, and this is not because I have completely missed the semantics
of the Sequence concept. I believe I have better semantics.

Here is an illustration of the kind of algorithm I would hope to write
using sequences:

    template <typename sequence>
    void fat_qsort( sequence seq ) {
        if (seq.size() < 2)
            return;
        sequence p = fat_partition( seq, seq.front() );
        fat_qsort( left( seq, p ) );
        fat_qsort( right( seq, p ) );
    }

This is the "Fat Pivot" variant of Quick Sort. The arguments to
fat_partition() are a sequence and a pivot value. It performs a lot of
swaps and returns a subsequence, such that items now before the
subsequence are less than the pivot, items in the subsequence are equal to
the pivot, and items after the subsequence are greater than the pivot. The
idea is to avoid processing the equal items when we recurse - it can be
more efficient if many items are equal.

I am less interested in fat_pivot's implementation as in what its
signature should look like. In particular, how it returns its result if
sequences cannot be constructed or copied. Here are the more trivial
algorithms, which also return subsequences:

    template <typename sequence>
    sequence left( sequence seq, sequence sub ) {
        return sequence( seq.begin(), sub.begin() );
    }

    template <typename sequence>
    sequence right( sequence seq, sequence sub ) {
        return sequence( sub.end(), seq.end() );
    }

I can see how to write these algorithms using std::pair<iterator,iterator>
(more on this below). I cannot see how to write it using your definition
of the sequence Concept.

> How would you write your recursive 'for_each' if it would take a pair
> of iterators?

  template< typename iterator, typename function>
  function for_each( iterator first, iterator last, function f ) {
       if (first == last)
           return f;
       f( *first );
       ++first;
       return for_each( first, last, f );
   }

Notice I increment an iterator. Since iterators are passed by value this
is a natural thing to do. The standard version of std::for_each also
increments iterators freely. I see no reason to depart from this.

The pair version would be:

  template< typename iterator, typename function>
  function for_each( std::pair<iterator,iterator> seq, function f ) {
       if (seq.first == seq.second)
           return f;
       f( *sec.first );
       ++seq.first;
       return for_each( seq, f );
   }

Again, I pass by value and I increment in place. Iterators are mutable, a
std::pair is a pair of iterators, so why shouldn't std::pair be mutable?

Replacing the low-level operations on integers with high-level operations
on sequences, I get:

  template< typename sequence, typename function>
  function for_each( sequence seq, function f ) {
       if (seq.empty())
           return f;
       f( seq.front() );
       seq.pop_front();
       return for_each( seq, f );
   }

All of this seems so obviously the right thing to do I have trouble
understanding your objection. Why must sequences be immutable? Why is it
good to add empty() and front() to containers but not to sequences?

> Please specify what particular points you would like me to asnwer, and
> we'll take it from there.

The key points are:

(1) Since iterators are current mutable and passed by value, and can be
copied without copying the items they point to, and since a sequence is a
substitute for a pair of iterators, shouldn't sequences be mutable and
passed by value etc? Why change them so radically?

(2) If sequences cannot be copied, how can we write algorithms which
return them as results?

(3) What is the objection to providing named functions like empty(), which
can be used instead of expressions like (begin()==end())?

(4) We agree sequences have advantages for users. Why shouldn't algorithm
implementors get those advantages too? Why is it wrong to seek to use
empty() in algorithms?

(5) Often algorithms rewritten to use only the higher level operations
become more general, clearer, easier to verify for correctness, and
sometimes more efficient. Why shouldn't this be done where it brings the
benefits?

-- Dave Harris


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk