Boost logo

Boost :

From: David Abrahams (david.abrahams_at_[hidden])
Date: 2001-11-21 18:02:53


Mike,

Your test was practically /designed/ to favor your approach.

I agree that for some (perhaps even many) applications, your approach might
be most appropriate. There are, however, many applications where the user
wants to use indirect references. How many times have you heard the question
about whether you can put an auto_ptr in a container? For many applications,
you already have a pointer or smart pointer to the object; in that case
insertion is cheap. Another example arises when you need to store many
element references. There are lots of trade-offs.

If you want to prove that one approach is better than another in general,
you have to try much a more comprehensive suite of tests.

-Dave

----- Original Message -----
From: "Mike Gradman" <windryder1_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Wednesday, November 21, 2001 3:28 PM
Subject: Re: [boost] Re: Proposal: Sorted Vector Container

> Corwin and I have done some benchmarking of two
> methods that we could have used for iterator stability
> with vec_multiset:
>
> 1. The one David Abrahams used for iterator stability
> (indirect_container and indirect_iterator .. storing a
> container of iterators to the data elements). This
> means that the user has to populate a data container
> such as a std::vector<T> and then create some other
> container whose elements actually refer to elements in
> that first data vector, such as a
> std::set<vector<T>::iterator>. This method is a lean
> and mean way to get iterator (or pointer) stability.
> However, it comes at at possible cost of a loss of
> memory contiguity (depending on the allocator) which
> can slow down memory accesses and also extra address
> arithmetic the machine must perform.
> 2. Our method references the data elements directly.
> Vec_multiset<T> internally stores an underlying
> std::vector<T> which actually contains the data. The
> vec_multiset<T>::iterator stores several simple pieces
> of information:
> * a pointer to the vec_multiset that it is referring
> to
>
> * the index of the element in the vec_multiset's
> underlying vector<T>
> * an int that tells where to look in the referred to
> container's change log of inserts/erasures after the
> first lookup in the container so the iterator may
> update its position and thus maintain iterator
> stability.
>
> This method guarantees both iterator stability and
> contiguity of memory at very little expense. The key
> here is that there are no address computations
> necessary each time elements are compared vs. the
> indirect_iterator method which must dereference an
> iterator/pointer.
>
> We believed that the direct storing of the vector data
> would be much more efficient rather than the
> indirect_container approach.
>
> So we performed a benchmark test to compare the two
> methods.
>
> For this test, we used a sample of 15,000,000 ints and
> used a vector<int> as our base container for the
> indirect_iterator-like simulation to eliminate
> contiguity as a factor.
>
> For the direct storing method (the method we used to
> implement vec_multiset), we did the following:
>
> 1. Inserted the ints into the vector using
> vector<int>::push_back().
> 2. Called std::stable_sort() from begin() to end() on
> the vector
> 3. Did 15,000,000 std::lower_bound() calls on the
> begin() to end() range to locate a number.
>
> We timed each step and then added the results together
> to get the overall total for this method.
>
> To simulate the indirect_container method, we had to:
>
> 1. Populate a vector<int> with the 15,000,000 elements
> using vector<int>::push_back().
> 2. For each element vec[i] in the vector vec, we
> inserted &vec[i] into a vector<int *> called pvec.
> 3. Call std::stable_sort(pvec.begin(), pvec.end(),
> IntStarLess()) where IntStarLess::operator()(pInt1,
> pInt2) is true iff *p1 < *p2.
> 4. Perform 15,000,000 std::lower_bound calls over the
> pvec.begin() to pvec.end() range with an IntStarLess
> as the predicate.
>
> Again, we timed each step and summed the results
> together for the overall time.
>
> We performed the same operations using each method
> using the same data set and calls to
> std::lower_bound().
>
> The results we got back were astonishing which
> confirmed our reasoning:
>
> ---- vector<int> ----
> vec.push_back(): 26
> stable_sort(vec.begin(), vec.end()): 20
> lower_bound(): 6
> *** Total time for vector<int> impl.: 52 ***
>
> ---- vector<int *> ----
> vec_ptrs.push_back(): 8
> stable_sort(vec_ptrs.begin(), vec_ptrs.end()): 55
> lower_bound(): 44
> Subtotal time for vector<int *> ops: 107
> (but we're not done as we must add in the vector<int>
> push_back time of: 26)
> *** Total time for vector<int *> impl.: 133 ***
>
> As you can see, the std::stable_sort() and
> std::lower_bound() calls for the direct approach
> easily outperformed the corresponding calls for the
> indirect_iterator simulation. Also, there is the
> additional overhaead in the indirect method of having
> to push_back the objects onto the data vector even
> before it starts working with the pointers.
>
> The performance test proved that the direct approach
> we used for vec_multiset is much faster and more
> efficient than the use of creatures like
> indirect_iterator.
>
> Caveats to these results:
> 1. With larger objects, the indirect_iterator method
> probably would perform much faster as the sort only
> swaps pointers. The direct storing method's sort call
> actually has to swap the objects themselves, which can
> definitely be much more expensive. In the case of our
> example above this penalty does not appear since ints
> are small.
>
> 2. As Dave Abrahams pointed out, if your container
> holds something like String objects, then holding a
> vector of pointers (e.g. String *) may not lose you
> much anyway since the underlying objects are in fact
> holding the data via pointers.
>
> Mike
>
> Mike
>
> __________________________________________________
> Do You Yahoo!?
> Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
> http://geocities.yahoo.com/ps/info1
>
> Info: http://www.boost.org Unsubscribe:
<mailto:boost-unsubscribe_at_[hidden]>
>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
>
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk