Boost logo

Boost Users :

Subject: Re: [Boost-users] [Multi-index] hash_unique memory consumption vs std::map
From: joaquin_at_[hidden]
Date: 2008-11-05 05:27:32


Michael Fawcett escribió:
> In my application I need to load millions of vertices into a graph.
> I'm using BGL for this. Each edge has around 20-30 attributes
> associated with it and each attribute has a unique ID (not necessarily
> sequential or 0-based).
>
> I'm currently using this multi_index_container for the attributes:
>
> typedef detail::mutable_pair<long, _variant_t> id_attribute_pair_type;
> typedef boost::multi_index_container
> <
> id_attribute_pair_type,
> boost::multi_index::indexed_by
> < boost::multi_index::hashed_unique<boost::multi_index::member<id_attribute_pair_type,
> long, &id_attribute_pair_type::first> >
> >
>
>> map_type;
>>
>
> which emulates:
>
> typedef std::map<long, _variant_t> map_type;
>
> If I check memory consumption after loading the graph using
> multi_index I'm at 190MB. Using std::map it's at 124MB. I realize
> this is because std::map is perfectly sized and a hashmap will grow in
> chunks, but is there a way I can reserve a size ahead of time since I
> know exactly how many attributes there will be?
>

There is no way to fine-tune the size of the bucket array: this is in
general the minimum of a list
of prime numbers, roughly each the double of the prior, that is
compatible with the
maximum load factor specified . With the default maximum load factor
mlf=1.0, the size of the
bucket array can range approximately between n and 2n, where n is the
number of elements.
You can use max_load_factor(z) to set the maximum load factor slightly
below 1.0 to see if that
improves the situation (that is, if the bucket size keeps at the size
immediately preceding the one
you have now). The member function bucket_count() gives you the size of
the bucket
array. Is this indeed much larger than the number of elements?

Nevertheless, the differences in memory consumption do not look
consistent to me: a std::map
has an overhead of 12-16 bytes per element (16 in most cases, 12 if some
optimizations
for a so-called "color" internal parameter are applied) for 32-bit
architectures. For a hashed index
the overhead (with lf=1.0) should be between 8 and 12 bytes per element.
We are missing
something here. Can you provide more detailed info on how you're
measuring memory
consumption? Are there any aspect you might not be taking into account?

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net