Boost logo

Boost :

From: Corwin Joy (cjoy_at_[hidden])
Date: 2005-08-23 23:11:55


----- Original Message -----
From: "Brock Peabody" <brock.peabody_at_[hidden]>
> I think I understand why there are problems with any iterator more
> powerful than an input iterator and I agree. I don't understand what
> benefit there is to supporting anything besides input iterators though.

O.K., so if you agree that supporting forward-only cursors makes sense
and that an input iterator has guarantees that are similar then
you are ready to think about the next major decision in designing an input
iterator which is how to implement the copy operation. When you implement
copy I think there are two fundamental choices:

1. When you copy an iterator, continue to point to the same underlying
cursor. There is precedent for this, the istream iterators do it. But
I don't much like it since programmers may "surprised" that when
they copy the iterator incrementing the copy affects the original.

2. When you copy the iterator, cache a copy of the current row value,
on increment open a new cursor (COW pattern). This means that copies won't
have
"side effects" on the original iterator. However, opening a new cursor
also can surprise programmers since they may expect the iterator to
behave like a forward iterator and produce records in a particular
order. This is what we did in DTL but it does very occasionally produce
confusion when folks expect our iterator to act as a forward iterator
and not just an input iterator. Below is a snippet from the DTL mailing list
where I explain further.

Anyway, to me this is a major design decision and I would be interested to
hear what others think.

discussion from
http://groups.yahoo.com/group/DatabaseTemplateLibrary/message/1559

> Paul Harris Wrote
>
> here's what I was trying to do:
>
> template <class In, class Out, class Compare> Out copy_duplicates(In
> begin, In end, Out out, Compare compare) {
> while (begin != end)
> {
> begin = adjacent_find(begin,end,compare);
> if (begin != end)
> {
> tcout << "Found duplicate " << *begin << endl;
> *out++ = *begin++;
> }
> }
> return out;
> }
>

I think you are expecting select_iterator to do something here that
it does not provide a guarantee for. As we say in the docs, a select
iterator is just an *Input Iterator*. For a definition look at the SGI STL
webpage:
http://www.sgi.com/tech/stl/InputIterator.html

In particular if you look at the notes this means that

[1] i == j does not imply ++i == ++j.

[2] Every iterator in a valid range [i, j) is dereferenceable, and j
is either dereferenceable or past-the-end. The fact that every
iterator in the range is dereferenceable follows from the fact that
incrementable iterators must be dereferenceable.

[3] After executing ++i, it is not required that copies of the old
value of i be dereferenceable or that they be in the domain of
operator==.

[4] It is not guaranteed that it is possible to pass through the same
input iterator twice.

So your algorithm is not following the requirements for an input
iterator. Nor should you expect distance() to return something
meaningful since the records are not in a particular order.
So what does this mean in terms of STL - it means that a
select_iterator will work with only a few of the STL algorithms,
those that promise to need only an input iterator. Unfortunately
there is not an easy way to enforce this check at compile time. The
guys at boost did some work a while ago where you could do some meta
checks for various iterator invariants but they could only detect
crude gaffs. So then why isn't everything a RandomDBView that gives
more guarantees?

1. Some ODBC drivers don't support scrollable recordsets or bulk
fetches.

2. For the drivers that do support scrollable recordsets, sometimes
getting a scrollable cursor extracts a huge penalty. When you ask
for a scrollable cursor you are requesting a snapshot of a set of
records in a particular order. On some databases, the way they handle
this is to dump the entire set of records to a file on the client!!
This can be very slow. Most modern databases are smarter, they track
a version of the table and an internal list that is the order - this
is still overhead but less.

So, I think there is a place for select_iterator, it just needs to be
clear what the guarantees are (which are few). I do also agree that
having an incremented copy reset is not all that nice, but at least
this guarantees that a copy of the iterator can give the full
recordset. The other major alternative would be to have all copies
of a select iterator point to the same recordset, but this gives the
nastiness that incrementing a copy of an iterator can now affect the
original. This may be acceptable, though, if we think about things
like istream_iterator where this is exactly what happens. Anyway, I
would want to think about it carefully before I went that route.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk