Boost logo

Boost :

From: Joaquín Mª López Muñoz (joaquin_at_[hidden])
Date: 2006-05-29 10:11:14


Maxim Yegorushkin ha escrito:

> JOAQUIN LOPEZ MU?Z wrote:

[...]

> > If so,
> > I cannot find a reason why mi_set performs in general
> > sensibly better than std_set, since the data structures
> > and algorithms are basically equivalent, and B.MI
> > introduces some metaprogramming overhead that the
> > optimizer usually doesn't clean away entirely. I'd
> > have hoped for a small decrease in performance
> > for B.MI, though not as high as that observed in
> > mi_hash wrt std_hash. Do you have an idea anout this?
>
> The attached results show that B.MI is faster on insert and erase, but slightly
> slower on find hit/miss than std::set. My guesswork is that std::set probably
> does more(?) balancing on insert/erase.

I doubt that. The basic RB-tree algorithms are based on SGI STL
both in the case of B.MI and libstdc++. I might be wrong, of course,
and libstdc++ might have changed a lot wrt to its SGI ancestor.

[...]

> > I must say that B.MI does not intend to outperform
> > single-index std containers since in general the
> > data structures are equivalent; it is when
> > multi-indexing comes into play that I expect a
> > large impact in favor of B.MI.
>
> I see nothing what could prevent B.MI from being as fast as the
> standard containers.

The metaprogramming scaffolding constructed by B.MI with the
aim of allowing multi-indexing is not always completely optimized
away by the compiler, basically, most operations go through a lot more
(ideally inlined) functions. GCC is traditionally poorer at optimizing
than say MSVC or ICC. But the range of degradation I expect is
0-10%, not as high as the 250+% on insert that your tests show for
mi_hash...

...which has gotten me examining the test to determine what's going
weird. I think I found it: your insert test consists of invocations of
the form

  h.insert(begin, end);

for which std_hash (and presumably ext_hash) are smart enough to
pre-rehash based on n=std::distance(begin,end) when this distance's
calculation is affordable. B.MI hashed indices don't do that, so the
same call incurs several rehashing ops from size()=0 till size()=n.
Could you do me a favor? Please replace your local copy of
hashed indices range insert member function (at line 249 of
the RC_1_34_0 version of boost/multi_index/hashed_index.hpp):

  template<typename InputIterator>
  void insert(InputIterator first,InputIterator last)
  {
    BOOST_MULTI_INDEX_HASHED_INDEX_CHECK_INVARIANT;
    iterator hint=end();
    for(;first!=last;++first)hint=insert(hint,*first);
  }

with the following:

  template<typename InputIterator>
  void insert(InputIterator first,InputIterator last)
  {
    BOOST_MULTI_INDEX_HASHED_INDEX_CHECK_INVARIANT;

    // WARNING: following line only for testing purposes.
    reserve(size()+std::distance(first,last));

    iterator hint=end();
    for(;first!=last;++first)hint=insert(hint,*first);
  }

and rerun your tests? If my analysis is correct, figures for
insert(mi_hash) should be then on par with or only slightly worse
than std_hash and ext_hash. If this hunch proves right, it is certainly
an opportunity for huge performance improvements and I will
deal with it (in a proper way, the patch above is only a hack meant
for testing purposes).

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk