Boost logo

Boost :

From: Greg Colvin (gcolvin_at_[hidden])
Date: 2002-01-07 17:00:36


From: "dave_abrahams" <david.abrahams_at_[hidden]>
> Reviving an old thread:

I'd forgotten about this thread, so remind me please, in
what way does std::multimap fail to meet your needs for
an associative container.

> Despite my arguments with the completeness of testing here, I think
> the design approach is valuable for many problems. I find myself
> AGAIN with the need for an associative vector in Boost.Python, and
> I'm considering sticking my associative vector in the detail
> namespace... but I'd really rather see something officially in Boost.
>
> Can we make progress on this? Any framework for associative vectors
> should allow both of these approaches, in addition to the
> straightforward one which has no stability across modifications.
>
> -Dave
>
> --- In boost_at_y..., Mike Gradman <windryder1_at_y...> wrote:
> > Corwin and I have done some benchmarking of two
> > methods that we could have used for iterator stability
> > with vec_multiset:
> >
> > 1. The one David Abrahams used for iterator stability
> > (indirect_container and indirect_iterator .. storing a
> > container of iterators to the data elements). This
> > means that the user has to populate a data container
> > such as a std::vector<T> and then create some other
> > container whose elements actually refer to elements in
> > that first data vector, such as a
> > std::set<vector<T>::iterator>. This method is a lean
> > and mean way to get iterator (or pointer) stability.
> > However, it comes at at possible cost of a loss of
> > memory contiguity (depending on the allocator) which
> > can slow down memory accesses and also extra address
> > arithmetic the machine must perform.
> > 2. Our method references the data elements directly.
> > Vec_multiset<T> internally stores an underlying
> > std::vector<T> which actually contains the data. The
> > vec_multiset<T>::iterator stores several simple pieces
> > of information:
> > * a pointer to the vec_multiset that it is referring
> > to
> >
> > * the index of the element in the vec_multiset's
> > underlying vector<T>
> > * an int that tells where to look in the referred to
> > container's change log of inserts/erasures after the
> > first lookup in the container so the iterator may
> > update its position and thus maintain iterator
> > stability.
> >
> > This method guarantees both iterator stability and
> > contiguity of memory at very little expense. The key
> > here is that there are no address computations
> > necessary each time elements are compared vs. the
> > indirect_iterator method which must dereference an
> > iterator/pointer.
> >
> > We believed that the direct storing of the vector data
> > would be much more efficient rather than the
> > indirect_container approach.
> >
> > So we performed a benchmark test to compare the two
> > methods.
> >
> > For this test, we used a sample of 15,000,000 ints and
> > used a vector<int> as our base container for the
> > indirect_iterator-like simulation to eliminate
> > contiguity as a factor.
> >
> > For the direct storing method (the method we used to
> > implement vec_multiset), we did the following:
> >
> > 1. Inserted the ints into the vector using
> > vector<int>::push_back().
> > 2. Called std::stable_sort() from begin() to end() on
> > the vector
> > 3. Did 15,000,000 std::lower_bound() calls on the
> > begin() to end() range to locate a number.
> >
> > We timed each step and then added the results together
> > to get the overall total for this method.
> >
> > To simulate the indirect_container method, we had to:
> >
> > 1. Populate a vector<int> with the 15,000,000 elements
> > using vector<int>::push_back().
> > 2. For each element vec[i] in the vector vec, we
> > inserted &vec[i] into a vector<int *> called pvec.
> > 3. Call std::stable_sort(pvec.begin(), pvec.end(),
> > IntStarLess()) where IntStarLess::operator()(pInt1,
> > pInt2) is true iff *p1 < *p2.
> > 4. Perform 15,000,000 std::lower_bound calls over the
> > pvec.begin() to pvec.end() range with an IntStarLess
> > as the predicate.
> >
> > Again, we timed each step and summed the results
> > together for the overall time.
> >
> > We performed the same operations using each method
> > using the same data set and calls to
> > std::lower_bound().
> >
> > The results we got back were astonishing which
> > confirmed our reasoning:
> >
> > ---- vector<int> ----
> > vec.push_back(): 26
> > stable_sort(vec.begin(), vec.end()): 20
> > lower_bound(): 6
> > *** Total time for vector<int> impl.: 52 ***
> >
> > ---- vector<int *> ----
> > vec_ptrs.push_back(): 8
> > stable_sort(vec_ptrs.begin(), vec_ptrs.end()): 55
> > lower_bound(): 44
> > Subtotal time for vector<int *> ops: 107
> > (but we're not done as we must add in the vector<int>
> > push_back time of: 26)
> > *** Total time for vector<int *> impl.: 133 ***
> >
> > As you can see, the std::stable_sort() and
> > std::lower_bound() calls for the direct approach
> > easily outperformed the corresponding calls for the
> > indirect_iterator simulation. Also, there is the
> > additional overhaead in the indirect method of having
> > to push_back the objects onto the data vector even
> > before it starts working with the pointers.
> >
> > The performance test proved that the direct approach
> > we used for vec_multiset is much faster and more
> > efficient than the use of creatures like
> > indirect_iterator.
> >
> > Caveats to these results:
> > 1. With larger objects, the indirect_iterator method
> > probably would perform much faster as the sort only
> > swaps pointers. The direct storing method's sort call
> > actually has to swap the objects themselves, which can
> > definitely be much more expensive. In the case of our
> > example above this penalty does not appear since ints
> > are small.
> >
> > 2. As Dave Abrahams pointed out, if your container
> > holds something like String objects, then holding a
> > vector of pointers (e.g. String *) may not lose you
> > much anyway since the underlying objects are in fact
> > holding the data via pointers.
> >
> > Mike
> >
> > Mike
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! GeoCities - quick and easy web site hosting, just
> $8.95/month.
> > http://geocities.yahoo.com/ps/info1
>
>
> Info: http://www.boost.org Send unsubscribe requests to: <mailto:boost-unsubscribe_at_[hidden]>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
>
>
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk