Boost logo

Boost :

From: dave_abrahams (david.abrahams_at_[hidden])
Date: 2002-01-07 16:40:51


Reviving an old thread:

Despite my arguments with the completeness of testing here, I think
the design approach is valuable for many problems. I find myself
AGAIN with the need for an associative vector in Boost.Python, and
I'm considering sticking my associative vector in the detail
namespace... but I'd really rather see something officially in Boost.

Can we make progress on this? Any framework for associative vectors
should allow both of these approaches, in addition to the
straightforward one which has no stability across modifications.

-Dave

--- In boost_at_y..., Mike Gradman <windryder1_at_y...> wrote:
> Corwin and I have done some benchmarking of two
> methods that we could have used for iterator stability
> with vec_multiset:
>
> 1. The one David Abrahams used for iterator stability
> (indirect_container and indirect_iterator .. storing a
> container of iterators to the data elements). This
> means that the user has to populate a data container
> such as a std::vector<T> and then create some other
> container whose elements actually refer to elements in
> that first data vector, such as a
> std::set<vector<T>::iterator>. This method is a lean
> and mean way to get iterator (or pointer) stability.
> However, it comes at at possible cost of a loss of
> memory contiguity (depending on the allocator) which
> can slow down memory accesses and also extra address
> arithmetic the machine must perform.
> 2. Our method references the data elements directly.
> Vec_multiset<T> internally stores an underlying
> std::vector<T> which actually contains the data. The
> vec_multiset<T>::iterator stores several simple pieces
> of information:
> * a pointer to the vec_multiset that it is referring
> to
>
> * the index of the element in the vec_multiset's
> underlying vector<T>
> * an int that tells where to look in the referred to
> container's change log of inserts/erasures after the
> first lookup in the container so the iterator may
> update its position and thus maintain iterator
> stability.
>
> This method guarantees both iterator stability and
> contiguity of memory at very little expense. The key
> here is that there are no address computations
> necessary each time elements are compared vs. the
> indirect_iterator method which must dereference an
> iterator/pointer.
>
> We believed that the direct storing of the vector data
> would be much more efficient rather than the
> indirect_container approach.
>
> So we performed a benchmark test to compare the two
> methods.
>
> For this test, we used a sample of 15,000,000 ints and
> used a vector<int> as our base container for the
> indirect_iterator-like simulation to eliminate
> contiguity as a factor.
>
> For the direct storing method (the method we used to
> implement vec_multiset), we did the following:
>
> 1. Inserted the ints into the vector using
> vector<int>::push_back().
> 2. Called std::stable_sort() from begin() to end() on
> the vector
> 3. Did 15,000,000 std::lower_bound() calls on the
> begin() to end() range to locate a number.
>
> We timed each step and then added the results together
> to get the overall total for this method.
>
> To simulate the indirect_container method, we had to:
>
> 1. Populate a vector<int> with the 15,000,000 elements
> using vector<int>::push_back().
> 2. For each element vec[i] in the vector vec, we
> inserted &vec[i] into a vector<int *> called pvec.
> 3. Call std::stable_sort(pvec.begin(), pvec.end(),
> IntStarLess()) where IntStarLess::operator()(pInt1,
> pInt2) is true iff *p1 < *p2.
> 4. Perform 15,000,000 std::lower_bound calls over the
> pvec.begin() to pvec.end() range with an IntStarLess
> as the predicate.
>
> Again, we timed each step and summed the results
> together for the overall time.
>
> We performed the same operations using each method
> using the same data set and calls to
> std::lower_bound().
>
> The results we got back were astonishing which
> confirmed our reasoning:
>
> ---- vector<int> ----
> vec.push_back(): 26
> stable_sort(vec.begin(), vec.end()): 20
> lower_bound(): 6
> *** Total time for vector<int> impl.: 52 ***
>
> ---- vector<int *> ----
> vec_ptrs.push_back(): 8
> stable_sort(vec_ptrs.begin(), vec_ptrs.end()): 55
> lower_bound(): 44
> Subtotal time for vector<int *> ops: 107
> (but we're not done as we must add in the vector<int>
> push_back time of: 26)
> *** Total time for vector<int *> impl.: 133 ***
>
> As you can see, the std::stable_sort() and
> std::lower_bound() calls for the direct approach
> easily outperformed the corresponding calls for the
> indirect_iterator simulation. Also, there is the
> additional overhaead in the indirect method of having
> to push_back the objects onto the data vector even
> before it starts working with the pointers.
>
> The performance test proved that the direct approach
> we used for vec_multiset is much faster and more
> efficient than the use of creatures like
> indirect_iterator.
>
> Caveats to these results:
> 1. With larger objects, the indirect_iterator method
> probably would perform much faster as the sort only
> swaps pointers. The direct storing method's sort call
> actually has to swap the objects themselves, which can
> definitely be much more expensive. In the case of our
> example above this penalty does not appear since ints
> are small.
>
> 2. As Dave Abrahams pointed out, if your container
> holds something like String objects, then holding a
> vector of pointers (e.g. String *) may not lose you
> much anyway since the underlying objects are in fact
> holding the data via pointers.
>
> Mike
>
> Mike
>
> __________________________________________________
> Do You Yahoo!?
> Yahoo! GeoCities - quick and easy web site hosting, just
$8.95/month.
> http://geocities.yahoo.com/ps/info1


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk