Actually, the goal wasn't to try to favor one
method or another. (In fact, there will be cases where pointers are better
than holding the raw data and vice versa). But I think that to understand
the issues in designing a good container for this problem it might help
to try to analyze some of the pieces involved.
As I see it, if you are dealing with an ordered
container you will have three pieces that are going to consume CPU
time:
1. Allocating and sorting the container.
Using, e.g. a quicksort on a vector this will
require
- allocation for n elements of
data
- O(n log(n)) comparison
operations
- O(n log(n)) element swap
operations
Using, e.g. a binary tree as in a
multiset this will require
- allocations for n elements of
data + overhead (left, right pointers etc.)
- O(n
log(n)) comparison operations
- O(n log(n)) tree traversal
operations
2. Binary searches to find elements in the
container
Using a vector
- O(log(n)) comparison
operations
Using a tree
- O(log(n)) comparison
operations
- O(log(n)) traveral
operations
3. Use of the iterators returned by the
container. Any overhead associated with maintaining invariants for these
iterators.
4. Effect of memory contiguity on 1-3
above.
So what do the benchmarks we've done say about
vector<object> versus vector<object *> versus multiset<object>
versus hash_multiset<object>?
First let's take vector<object> versus
multiset<object>. In the the initial set of tests that we did
this was analyzed for a int as the object which gave (times in
seconds):
ratio of lookups to inserts |
multiset inserts |
vec_multiset inserts |
hash_multiset inserts |
multiset lookups |
vec_multiset lookups |
hash_multiset lookups |
multiset total |
vec_multiset total |
hash_multiset total |
1:1 |
15 |
6 |
4 |
2 |
2 |
1 |
17 |
8 |
5 |
First, note that the lookup times for multiset and vector were the same,
which says that the penalty for tree traversals is not that large in this case
(and should also not be a major factor in the sort). This also says to me
that memory contiguity may not be playing that large a role here because even if
elements are scattered in the multiset versus in a narrow area of memory in a
vector the penalty may not be that large (a slightly surprising results).
What's more interesting here is the story on the inserts which includes the
allocation/sort phase for both containers. Comparing the time taken for
lookups in the two containers makes it unlikely that tree traversals are to
blame for the slow setup time of a multiset. Instead, it seems much more
likely that the big culprit is allocation. Either, a) the allocator is not
very good for multisets or b) the overhead of all the pointers associated
with a tree node is bloating the object a lot and making allocations much
slower than they should be. This is especially surprising in light of the
fact that they vector may have to do reallocations and copying to retain
contiguity which the multiset does not. So, I guess one moral here is to be
careful about the overhead associated with each node.
The second comparison that we did below was for vector<int> versus
vector<int *> -- where we allocated the items we put in the vector int *
from a vector in order to remove the effects allocators and memory
contiguity. Unsurprisingly, sorting a vector of int * came back
significantly slower because each comparison operation has to go through
dereference operations slowing it down relative to vector<int>. In
general, the timing of the sort of vector<object> versus
vector<object*> will come down to:
n log n comparison operations: Object * will
always be slower because of additional derefereces as shown below
n log n swap operations: Object * may or may not be
faster here. If the object it smaller than sizeof(pointer) Object * will
have no advantage. If the object is larger than sizeof(pointer) it will be
faster to swap pointers thatn the actual object.
So what would an "ideal" sorted container come down to:
1. Size matters. Keep the node overhead as small as possible to
minimize allocation costs.
2. Given 1 try to minimize the comparison and swap costs for the individual
elements.
3. Try to keep memory contiguity if you can - although it may not matter
all that much.
So how does a sorted vector perform?
I would give it a B on item 1 - it keeps the size down but does expensive
reallocations and copies to keep more memory contiguity than it may need.
I would give it a B on item 2 as well, it may have significant swap costs
for larger objects making the setup time slower than it needs to be. (The swap
time will not impact the search speed, though).
CJ
----- Original Message -----
Sent: Wednesday, November 21, 2001 5:02
PM
Subject: Re: [boost] Re: Proposal: Sorted Vector
Container
> Mike,
>
> Your test was practically /designed/ to favor
your approach.
>
> I agree that for some (perhaps even many)
applications, your approach might
> be most appropriate. There are,
however, many applications where the user
> wants to use indirect
references. How many times have you heard the question
> about whether you
can put an auto_ptr in a container? For many applications,
> you already
have a pointer or smart pointer to the object; in that case
> insertion is
cheap. Another example arises when you need to store many
> element
references. There are lots of trade-offs.
>
> If you want to prove
that one approach is better than another in general,
> you have to try
much a more comprehensive suite of tests.
>
> -Dave
>
> ----- Original Message -----
> From: "Mike Gradman" <windryder1@yahoo.com>
> To:
<boost@yahoogroups.com>
>
Sent: Wednesday, November 21, 2001 3:28 PM
> Subject: Re: [boost] Re:
Proposal: Sorted Vector Container
>
>
> > Corwin and I
have done some benchmarking of two
> > methods that we could have used
for iterator stability
> > with vec_multiset:
> >
> >
1. The one David Abrahams used for iterator stability
> >
(indirect_container and indirect_iterator .. storing a
> > container of
iterators to the data elements). This
> > means that the user has
to populate a data container
> > such as a std::vector<T> and
then create some other
> > container whose elements actually refer to
elements in
> > that first data vector, such as a
> >
std::set<vector<T>::iterator>. This method is a lean
>
> and mean way to get iterator (or pointer) stability.
> > However,
it comes at at possible cost of a loss of
> > memory contiguity
(depending on the allocator) which
> > can slow down memory accesses
and also extra address
> > arithmetic the machine must perform.
>
> 2. Our method references the data elements directly.
> >
Vec_multiset<T> internally stores an underlying
> >
std::vector<T> which actually contains the data. The
> >
vec_multiset<T>::iterator stores several simple pieces
> > of
information:
> > * a pointer to the vec_multiset that it is
referring
> > to
> >
> > * the index of the element
in the vec_multiset's
> > underlying vector<T>
> > * an
int that tells where to look in the referred to
> > container's change
log of inserts/erasures after the
> > first lookup in the container so
the iterator may
> > update its position and thus maintain
iterator
> > stability.
> >
> > This method
guarantees both iterator stability and
> > contiguity of memory at very
little expense. The key
> > here is that there are no address
computations
> > necessary each time elements are compared vs.
the
> > indirect_iterator method which must dereference an
> >
iterator/pointer.
> >
> > We believed that the direct storing
of the vector data
> > would be much more efficient rather than
the
> > indirect_container approach.
> >
> > So we
performed a benchmark test to compare the two
> > methods.
>
>
> > For this test, we used a sample of 15,000,000 ints and
>
> used a vector<int> as our base container for the
> >
indirect_iterator-like simulation to eliminate
> > contiguity as a
factor.
> >
> > For the direct storing method (the method we
used to
> > implement vec_multiset), we did the following:
>
>
> > 1. Inserted the ints into the vector using
> >
vector<int>::push_back().
> > 2. Called std::stable_sort() from
begin() to end() on
> > the vector
> > 3. Did 15,000,000
std::lower_bound() calls on the
> > begin() to end() range to locate a
number.
> >
> > We timed each step and then added the results
together
> > to get the overall total for this method.
>
>
> > To simulate the indirect_container method, we had to:
>
>
> > 1. Populate a vector<int> with the 15,000,000
elements
> > using vector<int>::push_back().
> > 2. For
each element vec[i] in the vector vec, we
> > inserted &vec[i] into
a vector<int *> called pvec.
> > 3. Call
std::stable_sort(pvec.begin(), pvec.end(),
> > IntStarLess()) where
IntStarLess::operator()(pInt1,
> > pInt2) is true iff *p1 <
*p2.
> > 4. Perform 15,000,000 std::lower_bound calls over the
>
> pvec.begin() to pvec.end() range with an IntStarLess
> > as the
predicate.
> >
> > Again, we timed each step and summed the
results
> > together for the overall time.
> >
> > We
performed the same operations using each method
> > using the same data
set and calls to
> > std::lower_bound().
> >
> > The
results we got back were astonishing which
> > confirmed our
reasoning:
> >
> > ---- vector<int> ----
> >
vec.push_back(): 26
> > stable_sort(vec.begin(), vec.end()): 20
>
> lower_bound(): 6
> > *** Total time for vector<int> impl.:
52 ***
> >
> > ---- vector<int *> ----
> >
vec_ptrs.push_back(): 8
> > stable_sort(vec_ptrs.begin(),
vec_ptrs.end()): 55
> > lower_bound(): 44
> > Subtotal time
for vector<int *> ops: 107
> > (but we're not done as we must add
in the vector<int>
> > push_back time of: 26)
> > ***
Total time for vector<int *> impl.: 133 ***
> >
> > As
you can see, the std::stable_sort() and
> > std::lower_bound() calls
for the direct approach
> > easily outperformed the corresponding calls
for the
> > indirect_iterator simulation. Also, there is
the
> > additional overhaead in the indirect method of having
>
> to push_back the objects onto the data vector even
> > before it
starts working with the pointers.
> >
> > The performance test
proved that the direct approach
> > we used for vec_multiset is much
faster and more
> > efficient than the use of creatures like
>
> indirect_iterator.
> >
> > Caveats to these
results:
> > 1. With larger objects, the indirect_iterator
method
> > probably would perform much faster as the sort only
>
> swaps pointers. The direct storing method's sort call
> >
actually has to swap the objects themselves, which can
> > definitely
be much more expensive. In the case of our
> > example above this
penalty does not appear since ints
> > are small.
> >
>
> 2. As Dave Abrahams pointed out, if your container
> > holds
something like String objects, then holding a
> > vector of pointers
(e.g. String *) may not lose you
> > much anyway since the underlying
objects are in fact
> > holding the data via pointers.
>
>
> > Mike
> >
> > Mike
> >
> >
__________________________________________________
> > Do You
Yahoo!?
> > Yahoo! GeoCities - quick and easy web site hosting, just
$8.95/month.
> > http://geocities.yahoo.com/ps/info1
> >
> > Info: http://www.boost.org
Unsubscribe:
> <mailto:boost-unsubscribe@yahoogroups.com>
> >
> > Your use of Yahoo! Groups is subject to
http://docs.yahoo.com/info/terms/
> >
> >
>
>
> Info: http://www.boost.org
Unsubscribe: <mailto:boost-unsubscribe@yahoogroups.com>
>
> Your use of Yahoo! Groups is subject to
http://docs.yahoo.com/info/terms/
>
>