Boost logo

Boost :

Subject: Re: [boost] [MultiIndex] Random Access mental model?
From: David Abrahams (dave_at_[hidden])
Date: 2008-10-31 19:47:59


on Fri Oct 31 2008, Joaquin M Lopez Munoz <joaquin-AT-tid.es> wrote:

> David Abrahams <dave <at> boostpro.com> writes:
>>
>> Hi,
>
> Hi Dave, allow me to address this post of yours first and defer
> the answer to the others you've send on Boost.MultiIndex, as
> those will need more elaboration.

Thanks. This is the most important one.

>> I'm trying to get a grip on what it means when I have a random access
>> index in a multi_index_container. I have no mental model for what data
>> structure you're actually creating, which makes it difficult to evaluate
>> multi-index vs. hand-rolling my own container.
>>
>> If I was doing this myself, I'd use a combination of these two:
>>
>> hash_map<key, value>
>> deque<hash_map<key, value>::pointer>
>>
>> because I need to maintain an LRU cache of what has been put in the map
>> (i.e. I need O(1) push_back and pop_front). MultiIndex only lets me use
>> sequenced<> which it implies essentially has the properties of
>> list<hash_map<key, value>::iterator>... although I can imagine you could
>> save a lot by using the same nodes for the hash_map and the list. Do
>> you do that?
>
> Yep, this is done. Only one node per element is allocated. This is
> true regardless of the indices used, including random
> access indices. If you want to visualize this, each element node
> has as many node headers as indices are, each one stacked on top
> of the next.

I think random access shouldn't need to add a node header unless you
want to do something like look up a key in the hash index and delete
that element. Maybe a feature-model interface would be a more powerful
way to approach this space, so we don't have to pay for what we don't
use.

> Some indices (namely, hashed and random access) use additional data
> structures external to the nodes themselves (this is obvious in the
> case of hashed indices, as they are entirely similar to
> unordered_set).

Yes, and I expect random access indices to do the same.

>> Regardless, I could still save roughly one pointer per node with my
>> deque. I'd think about trying to modify MultiIndex to let it do random
>> access with deques, but I can't make head or tail of the implementation
>> code used for random access.
>
> The data structure of a random access index is basically the
> same as depicted for a little container I call stable_vector:
>
> http://bannalia.blogspot.com/2008/08/stable-vectors.html

That's what I figured... although the back-pointers are only needed if
you need to delete and you can't get ahold of an iterator into the
vector -- the same case I mentioned above.

> So, each element is stored in its own node (as is been already
> said) and there is an additionally internal vector of pointers
> that keeps the elements together while providing O(1)
> access. The up-pointers described in the aforementioned article
> are necessary to provide stable iterators.
>
> The overhead of a random access index is one pointer per
> node (the up-pointer) plus the size of the internal pointer
> vector. Some figures are calculated at:
>
> http://bannalia.blogspot.com/2008/09/introducing-stablevector.html
>
> The deque-based structure you propose would save, as you
> say, roughly one pointer per node with respect to sequenced
> indices and also with respect to random access indices,
> which have the up-pointer your proposed data structure lacks,
> while on the other hand it requires an extra pointer as the deque
> stores pointers while multi_index_containers enjoy the header
> stacking property referred to above.
>
> So, roughly speaking, things are even.

Sorry, I don't see it. Since I need O(1) pop-front I am using
sequenced<> which IIUC is a doubly-linked list. Therefore your nodes
have this overhead:

  next-in-list, prev-in-list, and next-in-hash-bucket = 3 pointers

and mine have only next-in-hash-bucket. Add one pointer of overhead in
the deque and I am storing 2 pointers.

> On the other hand, this up-pointer is essential to ensure
> iterator stability and iterator projection, which are features
> that currently the lib take for granted for all the different
> index types available.

In my application, the ability to erase nodes through hash iterators
doesn't matter, but the per-node storage cost does.

>> Could someone clarify what's going on in there?
>
>
> Hope the answer helped clarify the issue. Please come back otherwise.

Except for the fact that we are coming up with different numbers for
the data structure overhead, yeah.

Thanks,

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk