Boost logo

Boost Users :

From: Stanislas RENAN (stanislas.renan_at_[hidden])
Date: 2004-12-22 05:04:32


Joaquin M Lopez Munoz a écrit :
> You seem to have made the same changes as I, yet
> our numbers do not coindice. Could you please send me your
> changed code? Thanks!
There is a difference between our 2 codes : I haven't changed
the way age is set, thus it is not unique, and using get() on
this index should return a set (generally speaking) of employees.
I've not used equal_range() which makes the code broken anew.
Using it (but not returning a range) gives approximativelly the same
figures.

> Searching should be roughly the same, anyway, but I
> expect insertion to be noticeably faster in the case
> of multi_index_container.
I post the code at the end of the message.
I also post the patch using equal_range().

> I'd tend to think you would more likely pay a
> roughly fixed penalty per header, instead of a x13
> factor in compilation times, especially if your code
> takes already long to compile. But this may well be
> wishful thinking; please read below.
I've made the same guess, but have not yet tested it, and I'm not sure
to have the time to test it in existing real projects, as it is going to
take a too much time to implement.
I'm not in the development team of existing project I've talked about.

> Unfortunately, this is a known issue, and one for which
> I have no easy workaround. Boost.MultiIndex is heavily
> templatized, and this makes the compiler work real hard.
> I've been reported that MSVC 7.0 and 7.1 (aka .NET 2002
> and 2003) are faster when dealing with Boost.MultiIndex,
> but I guess switching compiler is not an option to you.
The problem is that to suggest using BOOST, and multiindex in
particular, I would have to proove that it is better (quicker,
more flexible, more standardized, ... or all of them) than the existing,
and even if this is true, transition from old to code using boost
would not be automatically done. I primarilly aim at using it in new
projects, if there is an estimated gain, after having examinated
most of advantages/drawbacks.
Buying a new compiler for testing purposes is not going to convince
my boss.

> Another possibility (not tested) is that you include
> Boost.MultiIndex headers inside your precompiled header
> stdafx.h. MSVC++ 6.0 precompiled headers support is a
> little fragile, so I won't be surprised if this resulted
> in spurious crashing and/or the need for full rebuilds.
> In any case, if you try this option please let me know.
According to one of the developper of the project I've talked about,
precompiled headers have been deactivated because of problems with
them.

> keep in mind this is going to be better with
> newer compilers.
For sure, but the problem is similar to the adoption of the STL a few
years ago, in companies using old compilers for their old projects,
with broken (or non-existing) STL support and hand-made containers.
There must be a real gain to incite people to change their tools and
practices.

> To my discharge, allow me to say that
> this is the best I've been able to come up with: after
> all, template stuff was not thought in the late nineties
> to be as extensively abused as it is now.
My purpose was not to criticize your work, which I have not yet studied
in depth in order to evaluate the advantages and drawbacks.
For now, to refocus on MI, I have roughly the four elements :
1-it is complex to write it the first times, but it can be encapsulated,
as STL is in my example ;
2-it is as efficient as the STL is, but not more in my example ;
3-it stresses the compiler a lot, which adds compilation time yet to be
evaluated ;
4-it avoids writing encapsulating classes if we are familiar with it,
which cannot be done with basic containers of the STL.

If point 2 is changed from "as efficient" to "more efficient", it
certainly will weight on decisions.
I admit that my test example is basic, and may not show the capabilities
of MI, but that's a real life example.

I have not treated the removal of elements. I have thought about adding
iterators on other indexes for each element in the maps, but it adds
2 iterators (I've got 3 indexes) and a container for each element.
But, hey, what I'm talking about here is just what you've done in your
performance tests, which addresses most of the situations.
I've written my test code having in mind populating and searching.
Removal wasn't necessary.

The problem of index synchronisation is the typical example of what a
programmer doesn't want to solve, to focus on the business part of the
application.
I guess using MI may efficiently address this type of problems, I just
need to evaluate its cost, and if there is a gain in real-life practise
(where multi indexed data is not so often used).

Here is a copy of the code I've used for the previous figures :

===
/* Boost.MultiIndex basic example.
  *
  * Copyright 2003-2004 Joaquín M López Muñoz.
  * Distributed under the Boost Software License, Version 1.0.
  * (See accompanying file LICENSE_1_0.txt or copy at
  * http://www.boost.org/LICENSE_1_0.txt)
  *
  * See http://www.boost.org/libs/multi_index for library home page.
  */
#include "stdafx.h"

#define COMPILE_BOOST
//#undef COMPILE_BOOST

#if defined (COMPILE_BOOST)
#if !defined(NDEBUG)
//#define BOOST_MULTI_INDEX_ENABLE_INVARIANT_CHECKING
//#define BOOST_MULTI_INDEX_ENABLE_SAFE_MODE
#endif

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/ordered_index.hpp>
#endif

#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>

#include <cstdlib>
#include <map>
#include <vector>
#include <ctime>
#include <sstream>
#include <iomanip>

#if defined (COMPILE_BOOST)
using boost::multi_index_container;
using namespace boost::multi_index;
#endif

#if !defined (COMPILE_BOOST)
namespace std
{
template<class _Ty> inline
        const _Ty& max(const _Ty& _X, const _Ty& _Y)
        {return (_X < _Y ? _Y : _X); }
template<class _Ty> inline
        const _Ty& min(const _Ty& _X, const _Ty& _Y)
        {return (_Y < _X ? _Y : _X); }
}

#endif

/* an employee record holds its ID, name and age */

struct employee
{
   int m_id;
   std::string m_name;
   int m_age;

   employee(int id_,const std::string& name_,int
age_):m_id(id_),m_name(name_),m_age(age_){}

   void set(int id_,const std::string& name_,int age_) { m_id=id_;
m_name=name_; m_age=age_; }
   const employee& operator =(const employee& e) { set(e.m_id, e.m_name,
e.m_age); return *this; }
   employee(const employee& e) { set(e.m_id, e.m_name, e.m_age); }

   int id() const { return m_id; } // impossible d'appeler la fonction
et l'index "id"
   const std::string& name() const { return m_name; }
   int age() const { return m_age; }

   friend std::ostream& operator<<(std::ostream& os,const employee& e)
   {
     os<<e.m_id<<" "<<e.m_name<<" "<<e.m_age<<std::endl;
     return os;
   }
};

/* tags for accessing the corresponding indices of employee_set */

struct id{};
struct name{};
struct age{};

/* see Advanced topics: Use of member_offset for info on
  * BOOST_MULTI_INDEX_MEMBER
  */

/* Define a multi_index_container of employees with following indices:
  * - a unique index sorted by employee::int,
  * - a non-unique index sorted by employee::name,
  * - a non-unique index sorted by employee::age.
  */

#if defined (COMPILE_BOOST)

typedef multi_index_container<
   employee,
   indexed_by<
     ordered_unique<
       tag<id>, BOOST_MULTI_INDEX_MEMBER(employee,int,m_id)>,
     ordered_non_unique<
       tag<name>,BOOST_MULTI_INDEX_MEMBER(employee,std::string,m_name)>,
     ordered_non_unique<
       tag<age>, BOOST_MULTI_INDEX_MEMBER(employee,int,m_age)> >
> employee_set;

template<typename Tag,typename MultiIndexContainer>
void print_out_by(
  const MultiIndexContainer& s,
  Tag* =0 /* fixes a MSVC++ 6.0 bug with implicit template function parms */
)
{
   /* obtain a reference to the index tagged by Tag */

   const typename
boost::multi_index::index<MultiIndexContainer,Tag>::type& i=
     get<Tag>(s);

   typedef typename MultiIndexContainer::value_type value_type;

   /* dump the elements of the index to cout */

 
std::copy(i.begin(),i.end(),std::ostream_iterator<value_type>(std::cout));
}
#endif

// Un conteneur qui permet de contenir des employee
// triés sur 3 index.
class EmployeeMapper
{
   std::map<int, employee *> m_byId;
   std::map<std::string, employee *> m_byName;
   std::multimap<int, employee *> m_byAge;
   typedef std::map<int, employee *>::iterator EMIdIt;
   typedef std::map<std::string, employee *>::iterator EMNameIt;
   typedef std::multimap<int, employee *>::iterator EMAgeIt;

public:
   EmployeeMapper() {}
   ~EmployeeMapper()
   {
     // Deux lignes pour éviter que ces objets ne contiennent
     // des pointeurs invalides pendant la destruction va un
     // autre conteneur.
     m_byName.clear();
     m_byAge.clear();

     std::map<int, employee *>::iterator it;
     while ((it=m_byId.begin()) != m_byId.end())
     {
       delete it->second;
       m_byId.erase(it);
     }
   }

   void add(const employee& e)
   {
     employee * e2 = NULL;
     try
     {
       e2 = new employee(e);
     }
     catch(...)
     {
       delete e2;
       throw;
     }
     m_byId[e2->id()] = e2;
     m_byName[e2->name()] = e2;
     //m_byAge[e2->age()] = e2;
     m_byAge.insert(std::make_pair<int, employee *>(e2->age(), e2));
   }

   employee* byId(int id)
   {
     EMIdIt it = m_byId.find(id);
     if (it == m_byId.end())
       return NULL;
     else
       return it->second;
   }

   employee* byName(const std::string& name)
   {
     EMNameIt it = m_byName.find(name);
     if (it == m_byName.end())
       return NULL;
     else
       return it->second;
   }

   employee* byAge(int age)
   {
     EMAgeIt it = m_byAge.find(age);
     if (it == m_byAge.end())
       return NULL;
     else
       return it->second;
   }

private:
   // fonctions inhibées
   EmployeeMapper(const EmployeeMapper&);
   const EmployeeMapper& operator=(const EmployeeMapper&);
};

// Retourne un entier tq : 0 <= entier < maximum
int alea(int maximum)
{
   return int(std::min(maximum, RAND_MAX) * rand() / (RAND_MAX + 1.0));
}

int main()
{
   // Init aléatoire
   srand( (unsigned)time( NULL ) );

#if defined (COMPILE_BOOST)
   employee_set es;
#endif
   EmployeeMapper em;
   const int tailleEnsemble = 1000;
   const int tailleEchantillon = 10;
   const int moduloEchantillon = tailleEnsemble / tailleEchantillon;
   const int nbRecherches = 10000;

   std::vector<std::string> noms;
   noms.push_back("Alphonse");
   noms.push_back("Éléonore");
   noms.push_back("Gaston");
   noms.push_back("Séverine");
   noms.push_back("Zorglub");

   std::vector<employee> jeuDeDepart;
   std::vector<employee> jeuARechercher;

   int i=0;
   for (i=0; i < tailleEnsemble; ++i)
   {
     std::ostringstream os;
     // Construction d'un nom unique avec un caractère aléatoire
     // Cela reflète le type de données gérées en réalité.
     // L'ajout de i implique l'unicité.
     os << noms[alea(5)] << std::setw(5) << std::setfill('0') << alea(30000)
                         << std::setw(5) << std::setfill('0') << i;
     employee e(i, // index numérique unique
                os.str(), // nom, chaîne unique
                alea(100)); // valeur potentiellement dupliquée

     jeuDeDepart.push_back(e);

     // On conserve un échantillon à rechercher plus tard
     if (0 == i % moduloEchantillon)
     {
       //employee_set::iterator it=get<id>(es).find(i);
       //if (it != es.end())
         jeuARechercher.push_back(jeuDeDepart[i]);

     }
   }
   clock_t tes1, tes2;
   tes1 = clock();
#if defined (COMPILE_BOOST)

   for (i=0; i < tailleEnsemble; ++i)
   {
     es.insert(jeuDeDepart[i]);
   }
#endif
   tes2 = clock();

   clock_t tem1, tem2;
   tem1 = clock();
   for (i=0; i < tailleEnsemble; ++i)
   {
     em.add(jeuDeDepart[i]);
   }
   tem2 = clock();

   clock_t t1_1, t2_1;
   t1_1 = clock();
   // Accède à un certain nombre d'éléments précis selon
   // plusieurs index
   for (i=0; i < nbRecherches; ++i)
   {
     for (std::vector<employee>::iterator it = jeuARechercher.begin();
          it != jeuARechercher.end();
          ++it)
     {
#if defined (COMPILE_BOOST)
       // Les fonctions membres template qui permette de trouver les
       // index par tag ne fonctionnent pas avec MSVC 6
       employee_set::iterator eit = get<id>(es).find(it->id());
       nth_index_iterator<employee_set,1>::type eit2 =
get<1>(es).find(it->name());
       nth_index_iterator<employee_set,2>::type eit3 =
get<2>(es).find(it->age());
#endif
      }
   }
   t2_1 = clock();

   clock_t t1_2, t2_2;
   t1_2 = clock();
   // Deuxième méthode : avec des maps et des news.
   for (i=0; i < nbRecherches; ++i)
   {
     for (std::vector<employee>::iterator it = jeuARechercher.begin();
          it != jeuARechercher.end();
          ++it)
     {
       em.byId(it->id());
       em.byName(it->name());
       em.byAge(it->age());
     }
   }
   t2_2 = clock();

   double d1, d2;
   d1 = t1_2 - t1_1;
   d2 = t2_2 - t2_1;

   std::cout << "Insertion" << std::endl;
   std::cout << "multi = " << (tes2-tes1) << std::endl
             << "STL = " << (tem2-tem1) << std::endl;
   std::cout << "Recherche" << std::endl;
   std::cout << "multi = " << d1 << std::endl
             << "STL = " << d2 << std::endl;
   return 0;
}
===

patch :
===
   //employee* byAge(int age)
   std::pair<EMAgeIt, EMAgeIt> byAge(int age)
   {
     /*EMAgeIt it = m_byAge.find(age);
      if (it == m_byAge.end())
       return NULL;
     else
       return it->second;
     */
     std::pair<EMAgeIt, EMAgeIt> it = m_byAge.equal_range(age);
     return it;
   }

===

Regards,

-- 
Stanislas RENAN

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net