From: Corwin Joy (cjoy_at_[hidden])
Date: 2001-11-21 20:56:56
Actually, the goal wasn't to try to favor one method or another. (In fact, there will be cases where pointers are better than holding the raw data and vice versa). But I think that to understand the issues in designing a good container for this problem it might help to try to analyze some of the pieces involved.
As I see it, if you are dealing with an ordered container you will have three pieces that are going to consume CPU time:
1. Allocating and sorting the container.
Using, e.g. a quicksort on a vector this will require
- allocation for n elements of data
- O(n log(n)) comparison operations
- O(n log(n)) element swap operations
Using, e.g. a binary tree as in a multiset this will require
- allocations for n elements of data + overhead (left, right pointers etc.)
- O(n log(n)) comparison operations
- O(n log(n)) tree traversal operations
2. Binary searches to find elements in the container
Using a vector
- O(log(n)) comparison operations
Using a tree
- O(log(n)) comparison operations
- O(log(n)) traveral operations
3. Use of the iterators returned by the container. Any overhead associated with maintaining invariants for these iterators.
4. Effect of memory contiguity on 1-3 above.
So what do the benchmarks we've done say about vector<object> versus vector<object *> versus multiset<object> versus hash_multiset<object>?
First let's take vector<object> versus multiset<object>. In the the initial set of tests that we did this was analyzed for a int as the object which gave (times in seconds):
ratio of lookups to inserts multiset inserts vec_multiset inserts
hash_multiset inserts multiset lookups vec_multiset lookups hash_multiset lookups multiset total vec_multiset total hash_multiset total
1:1 15 6 4 2 2 1 17 8 5
First, note that the lookup times for multiset and vector were the same, which says that the penalty for tree traversals is not that large in this case (and should also not be a major factor in the sort). This also says to me that memory contiguity may not be playing that large a role here because even if elements are scattered in the multiset versus in a narrow area of memory in a vector the penalty may not be that large (a slightly surprising results). What's more interesting here is the story on the inserts which includes the allocation/sort phase for both containers. Comparing the time taken for lookups in the two containers makes it unlikely that tree traversals are to blame for the slow setup time of a multiset. Instead, it seems much more likely that the big culprit is allocation. Either, a) the allocator is not very good for multisets or b) the overhead of all the pointers associated with a tree node is bloating the object a lot and making allocations much slower than they should be. This is especially surprising in light of the fact that they vector may have to do reallocations and copying to retain contiguity which the multiset does not. So, I guess one moral here is to be careful about the overhead associated with each node.
The second comparison that we did below was for vector<int> versus vector<int *> -- where we allocated the items we put in the vector int * from a vector in order to remove the effects allocators and memory contiguity. Unsurprisingly, sorting a vector of int * came back significantly slower because each comparison operation has to go through dereference operations slowing it down relative to vector<int>. In general, the timing of the sort of vector<object> versus vector<object*> will come down to:
n log n comparison operations: Object * will always be slower because of additional derefereces as shown below
n log n swap operations: Object * may or may not be faster here. If the object it smaller than sizeof(pointer) Object * will have no advantage. If the object is larger than sizeof(pointer) it will be faster to swap pointers thatn the actual object.
So what would an "ideal" sorted container come down to:
1. Size matters. Keep the node overhead as small as possible to minimize allocation costs.
2. Given 1 try to minimize the comparison and swap costs for the individual elements.
3. Try to keep memory contiguity if you can - although it may not matter all that much.
So how does a sorted vector perform?
I would give it a B on item 1 - it keeps the size down but does expensive reallocations and copies to keep more memory contiguity than it may need.
I would give it a B on item 2 as well, it may have significant swap costs for larger objects making the setup time slower than it needs to be. (The swap time will not impact the search speed, though).
----- Original Message -----
From: "David Abrahams" <david.abrahams_at_[hidden]>
Sent: Wednesday, November 21, 2001 5:02 PM
Subject: Re: [boost] Re: Proposal: Sorted Vector Container
> Your test was practically /designed/ to favor your approach.
> I agree that for some (perhaps even many) applications, your approach might
> be most appropriate. There are, however, many applications where the user
> wants to use indirect references. How many times have you heard the question
> about whether you can put an auto_ptr in a container? For many applications,
> you already have a pointer or smart pointer to the object; in that case
> insertion is cheap. Another example arises when you need to store many
> element references. There are lots of trade-offs.
> If you want to prove that one approach is better than another in general,
> you have to try much a more comprehensive suite of tests.
> ----- Original Message -----
> From: "Mike Gradman" <windryder1_at_[hidden]>
> To: <boost_at_[hidden]>
> Sent: Wednesday, November 21, 2001 3:28 PM
> Subject: Re: [boost] Re: Proposal: Sorted Vector Container
> > Corwin and I have done some benchmarking of two
> > methods that we could have used for iterator stability
> > with vec_multiset:
> > 1. The one David Abrahams used for iterator stability
> > (indirect_container and indirect_iterator .. storing a
> > container of iterators to the data elements). This
> > means that the user has to populate a data container
> > such as a std::vector<T> and then create some other
> > container whose elements actually refer to elements in
> > that first data vector, such as a
> > std::set<vector<T>::iterator>. This method is a lean
> > and mean way to get iterator (or pointer) stability.
> > However, it comes at at possible cost of a loss of
> > memory contiguity (depending on the allocator) which
> > can slow down memory accesses and also extra address
> > arithmetic the machine must perform.
> > 2. Our method references the data elements directly.
> > Vec_multiset<T> internally stores an underlying
> > std::vector<T> which actually contains the data. The
> > vec_multiset<T>::iterator stores several simple pieces
> > of information:
> > * a pointer to the vec_multiset that it is referring
> > to
> > * the index of the element in the vec_multiset's
> > underlying vector<T>
> > * an int that tells where to look in the referred to
> > container's change log of inserts/erasures after the
> > first lookup in the container so the iterator may
> > update its position and thus maintain iterator
> > stability.
> > This method guarantees both iterator stability and
> > contiguity of memory at very little expense. The key
> > here is that there are no address computations
> > necessary each time elements are compared vs. the
> > indirect_iterator method which must dereference an
> > iterator/pointer.
> > We believed that the direct storing of the vector data
> > would be much more efficient rather than the
> > indirect_container approach.
> > So we performed a benchmark test to compare the two
> > methods.
> > For this test, we used a sample of 15,000,000 ints and
> > used a vector<int> as our base container for the
> > indirect_iterator-like simulation to eliminate
> > contiguity as a factor.
> > For the direct storing method (the method we used to
> > implement vec_multiset), we did the following:
> > 1. Inserted the ints into the vector using
> > vector<int>::push_back().
> > 2. Called std::stable_sort() from begin() to end() on
> > the vector
> > 3. Did 15,000,000 std::lower_bound() calls on the
> > begin() to end() range to locate a number.
> > We timed each step and then added the results together
> > to get the overall total for this method.
> > To simulate the indirect_container method, we had to:
> > 1. Populate a vector<int> with the 15,000,000 elements
> > using vector<int>::push_back().
> > 2. For each element vec[i] in the vector vec, we
> > inserted &vec[i] into a vector<int *> called pvec.
> > 3. Call std::stable_sort(pvec.begin(), pvec.end(),
> > IntStarLess()) where IntStarLess::operator()(pInt1,
> > pInt2) is true iff *p1 < *p2.
> > 4. Perform 15,000,000 std::lower_bound calls over the
> > pvec.begin() to pvec.end() range with an IntStarLess
> > as the predicate.
> > Again, we timed each step and summed the results
> > together for the overall time.
> > We performed the same operations using each method
> > using the same data set and calls to
> > std::lower_bound().
> > The results we got back were astonishing which
> > confirmed our reasoning:
> > ---- vector<int> ----
> > vec.push_back(): 26
> > stable_sort(vec.begin(), vec.end()): 20
> > lower_bound(): 6
> > *** Total time for vector<int> impl.: 52 ***
> > ---- vector<int *> ----
> > vec_ptrs.push_back(): 8
> > stable_sort(vec_ptrs.begin(), vec_ptrs.end()): 55
> > lower_bound(): 44
> > Subtotal time for vector<int *> ops: 107
> > (but we're not done as we must add in the vector<int>
> > push_back time of: 26)
> > *** Total time for vector<int *> impl.: 133 ***
> > As you can see, the std::stable_sort() and
> > std::lower_bound() calls for the direct approach
> > easily outperformed the corresponding calls for the
> > indirect_iterator simulation. Also, there is the
> > additional overhaead in the indirect method of having
> > to push_back the objects onto the data vector even
> > before it starts working with the pointers.
> > The performance test proved that the direct approach
> > we used for vec_multiset is much faster and more
> > efficient than the use of creatures like
> > indirect_iterator.
> > Caveats to these results:
> > 1. With larger objects, the indirect_iterator method
> > probably would perform much faster as the sort only
> > swaps pointers. The direct storing method's sort call
> > actually has to swap the objects themselves, which can
> > definitely be much more expensive. In the case of our
> > example above this penalty does not appear since ints
> > are small.
> > 2. As Dave Abrahams pointed out, if your container
> > holds something like String objects, then holding a
> > vector of pointers (e.g. String *) may not lose you
> > much anyway since the underlying objects are in fact
> > holding the data via pointers.
> > Mike
> > Mike
> > __________________________________________________
> > Do You Yahoo!?
> > Yahoo! GeoCities - quick and easy web site hosting, just $8.95/month.
> > http://geocities.yahoo.com/ps/info1
> > Info: http://www.boost.org Unsubscribe:
> > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
> Info: http://www.boost.org Unsubscribe: <mailto:boost-unsubscribe_at_[hidden]>
> Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk