
From: Martin Slater (mslater_at_[hidden])
Date: 2005-10-03 01:44:10


Hi there,

I've been looking at speed issues loading archives for the last few days
and came across something that puzzles me. We are using binary archives
to store relatively large data sets (> 100 MB) made up largely of
std::vectors of POD types. This was causing huge inefficiencies, because
each element was being serialised separately, so I derived a new archive
type with specialisations for std::vector load/save that dispatch on
podness (?): if the element type is POD it just block writes/reads the
entire vector. This gave us a massive speedup, so it could be useful to
include this by default in the provided binary archive. The reaction of a
number of our developers when first looking at VTune before this
optimisation was that Boost.Serialization was just rubbish as it was so
slow. (Let me know if you want to see the implementation, it's only about
100 lines or so.)
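
For readers who want a picture of the idea, here is a minimal standalone sketch of dispatching on POD-ness. It is not the 100-line implementation mentioned above and does not use the Boost archive interfaces at all: it writes to plain std::iostreams, uses std::is_trivially_copyable as a stand-in for a POD check, and every function name in it (save_item, save_elements, save_vector, round_trips and the load counterparts) is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helpers for illustration; none of these names come from
// Boost.Serialization.

// Per-element save/load for one non-POD type (length-prefixed string).
inline void save_item(std::ostream& os, const std::string& s) {
    const std::uint64_t n = s.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    os.write(s.data(), static_cast<std::streamsize>(n));
}
inline void load_item(std::istream& is, std::string& s) {
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof n);
    s.resize(static_cast<std::size_t>(n));
    if (n) is.read(&s[0], static_cast<std::streamsize>(n));
}

// Trivially copyable elements: move the whole buffer in one call.
template <class T>
void save_elements(std::ostream& os, const std::vector<T>& v, std::true_type) {
    if (!v.empty())
        os.write(reinterpret_cast<const char*>(v.data()),
                 static_cast<std::streamsize>(v.size() * sizeof(T)));
}
template <class T>
void load_elements(std::istream& is, std::vector<T>& v, std::true_type) {
    if (!v.empty())
        is.read(reinterpret_cast<char*>(v.data()),
                static_cast<std::streamsize>(v.size() * sizeof(T)));
}

// Everything else: fall back to one element at a time.
template <class T>
void save_elements(std::ostream& os, const std::vector<T>& v, std::false_type) {
    for (const T& e : v) save_item(os, e);
}
template <class T>
void load_elements(std::istream& is, std::vector<T>& v, std::false_type) {
    for (T& e : v) load_item(is, e);
}

// Entry points: write/read the element count, then dispatch on the trait.
template <class T>
void save_vector(std::ostream& os, const std::vector<T>& v) {
    const std::uint64_t n = v.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    save_elements(os, v, std::is_trivially_copyable<T>());
}
template <class T>
void load_vector(std::istream& is, std::vector<T>& v) {
    std::uint64_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof n);
    v.resize(static_cast<std::size_t>(n));
    load_elements(is, v, std::is_trivially_copyable<T>());
    assert(is && "short read");
}

// Saves to an in-memory stream, loads back, and compares.
template <class T>
bool round_trips(const std::vector<T>& in) {
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    save_vector(buf, in);
    std::vector<T> out;
    load_vector(buf, out);
    return in == out;
}
```

Calling round_trips on a std::vector<double> takes the block path, while a std::vector<std::string> takes the per-element path. The block path writes raw bytes, so the result is only readable on a machine with the same endianness and type sizes, and std::vector<bool> would need to be excluded because it has no data() member.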

The next slowest thing, as revealed by VTune, is the call to
basic_iarchive_impl::register_type, and in particular this code:

    cobject_type co(cobject_info_set.size(), bis);
    std::pair<cobject_info_set_type::const_iterator, bool>
        result = cobject_info_set.insert(co);

with the call to extended_type_info_typeid_0::less_than, triggered by the
std::set::insert call, consuming nearly 8% of the time needed to load the
archive. Tracing this through revealed that there seems to be some effort
to optimise this call by using extended_type_info::type_info_key, which is
used by type_info_key_cmp (extended_type_info.cpp) to give an early out,
but this code

    if(lhs.type_info_key == rhs.type_info_key)
        return 0;

in type_info_key_cmp always compares true (a breakpoint set after it never
gets hit), which then causes operator<(const extended_type_info &lhs,
const extended_type_info &rhs) to eventually call type_info::before to
give ordering information, which is where the slowness comes from. The
comparison is always true because the key is just a pointer to class
static data declared in extended_type_info_typeid_0. The comment attached
to the member in extended_type_info states

    // used to uniquely identify the type of class derived from this one
    // so that different derivations of this class can be simultaneously
    // included in implementation of sets and maps.
    const char * type_info_key;

but I cannot see how this will ever be different for any types. I get
the impression that it is intended to be different for each type, with
the test relying on its address being unique per specialization of
extended_type_info_typeid, but I may be well off the mark. Can anyone
(Robert?) clarify this for me please?
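
To make the two readings concrete, here is a simplified model of the situation. It is not the Boost source: info_base, shared_key_holder, info_shared, info_per_type and same_key are hypothetical names. Variant A takes its key from a static in a single non-template class, so every type gets the same pointer, which matches what the debugger shows. Variant B takes its key from a static member of the template itself, so each specialization gets a distinct address. The sketch does not say which behaviour the library intends.

```cpp
#include <cassert>

// Simplified model, not Boost code; all names are hypothetical.
struct info_base {
    const char* key;  // plays the role of type_info_key
    explicit info_base(const char* k) : key(k) {}
};

// Variant A: the key points at a static in one non-template class,
// so info_shared<T> carries the same key for every T.
struct shared_key_holder {
    static char tag;
};
char shared_key_holder::tag = 0;

template <class T>
struct info_shared : info_base {
    info_shared() : info_base(&shared_key_holder::tag) {}
};

// Variant B: the key points at a static member of the template,
// so each specialization has its own object and its own address.
template <class T>
struct info_per_type : info_base {
    static char tag;
    info_per_type() : info_base(&tag) {}
};
template <class T>
char info_per_type<T>::tag = 0;

// The early-out test from the post, reduced to its pointer comparison.
inline bool same_key(const info_base& a, const info_base& b) {
    return a.key == b.key;
}
```

With variant A, same_key is true for any pair of types, so a comparison built on it always falls through to the slower ordering. With variant B it is true only when both sides were built for the same type.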

thanks

Martin

PS. This is using the VC7.1 compiler.

