Subject: Re: [boost] [MultiIndex] Random Access mental model?
From: David Abrahams (dave_at_[hidden])
Date: 2008-11-01 13:46:47


on Sat Nov 01 2008, "JOAQUIN M. LOPEZ MUÑOZ" <joaquin-AT-tid.es> wrote:

> ________________________________________
> From: boost-bounces_at_[hidden] [boost-bounces_at_[hidden]] On behalf of David
> Abrahams [dave_at_[hidden]]
> Sent: Saturday, 1 November 2008 0:47
> To: boost_at_[hidden]
> Subject: Re: [boost] [MultiIndex] Random Access mental model?
>
> on Fri Oct 31 2008, Joaquin M Lopez Munoz <joaquin-AT-tid.es> wrote:
>> > Yep, this is done. Only one node per element is allocated. This is
>> > true regardless of the indices used, including random
>> > access indices. If you want to visualize this, each element node
>> > has as many node headers as there are indices, each one stacked
>> > on top of the next.
>>
>> I think random access shouldn't need to add a node header unless you
>> want to do something like look up a key in the hash index and delete
>> that element. Maybe a feature-model interface would be a more powerful
>> way to approach this space, so we don't have to pay for what we don't
>> use.
>
> It is very difficult (I suspect impossible) to come up with a satisfactory global
> picture without this up-pointer. Consider just the following scenario:
>
> I have a multi_index_container<hashed,random_access> m and an
> iterator to the hashed index it. m.erase(it) cannot be implemented
> without the up-pointer in the random-access index. (If you don't
> immediately realize why, picture the hand-rolled composition
> of a hash_map and a deque<pointer>: you can't erase an element
> from the combined structure given only a hash_map::iterator).

Yes, yes, I understood all that. It's exactly what I was referring to
when I wrote "look up a key in the hash index and delete that element."
If you don't immediately see why that is the same thing you're talking
about, well, please tell me.
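
To spell out the scenario we're both describing, a rough sketch of the
hand-rolled composition (boost::unordered_map standing in for hash_map;
the names here are purely illustrative):

    #include <algorithm>
    #include <deque>
    #include <string>
    #include <boost/unordered_map.hpp>

    typedef boost::unordered_map<std::string, int> map_t;

    struct combined
    {
        map_t m;
        std::deque<map_t::value_type*> order;  // one pointer per element

        // Erasing through a hash-map iterator alone: without an
        // up-pointer from the element to its slot in the deque, we
        // have to search the deque -- O(N), not O(1).
        void erase(map_t::iterator it)
        {
            std::deque<map_t::value_type*>::iterator pos =
                std::find(order.begin(), order.end(), &*it);
            if (pos != order.end())
                order.erase(pos);
            m.erase(it);
        }
    };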

> There are lots of additional problems,

I disagree with your premise. It's only a problem if it's a problem for
me, and since I don't need that functionality I can use a lighter-weight
data structure with no up-pointers, just like I'll use a vector instead
of a deque when I don't need O(1) modification at the front.

> but I think this is sufficiently
> serious to justify my decision. Basically, Boost.MultiIndex assumes
> O(1) inter-index iterator projection (getting from an iterator in
> an index to an iterator in another index pointing to the same element,
> in constant time).

That's fine; it's not the right library for my application, then.

> [...]
>> > The deque-based structure you propose would save, as you
>> > say, roughly one pointer per node with respect to sequenced
>> > indices and also with respect to random access indices,
>> > which have the up-pointer your proposed data structure lacks,
>> > while on the other hand it requires an extra pointer as the deque
>> > stores pointers while multi_index_containers enjoy the header
>> > stacking property referred to above.
>> >
>> > So, roughly speaking, things are even.
>>
>> Sorry, I don't see it. Since I need O(1) pop-front I am using
>> sequenced<> which IIUC is a doubly-linked list. Therefore your nodes
>> have this overhead:
>>
>> next-in-list, prev-in-list, and next-in-hash-bucket = 3 pointers
>>
>> and mine have only next-in-hash-bucket. Add one pointer of overhead in
>> the deque and I am storing 2 pointers.
>
> But your deque is a deque of *pointers*,

Uh, yes. One pointer stored in the deque per element.
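
Just so we're counting the same things, here's the per-element overhead
I have in mind (a sketch of the bookkeeping only, not the actual node
layouts):

    #include <string>
    #include <utility>

    typedef std::pair<std::string const, int> value_type;  // example element

    // multi_index_container<hashed, sequenced> node, roughly:
    struct mi_node
    {
        mi_node*   next_in_bucket;  // hashed index
        mi_node*   next_in_list;    // sequenced index
        mi_node*   prev_in_list;    // sequenced index   -> 3 pointers
        value_type value;
    };

    // hand-rolled hash_map node; the FIFO view lives elsewhere,
    // in a deque<value_type*>:
    struct hm_node
    {
        hm_node*   next_in_bucket;  // 1 pointer here...
        value_type value;
    };                              // ...plus ~1 pointer in the deque = 2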

> so the (amortized) penalty of the deque is 2 pointers per element.

Maybe you should explain why you think so.

> Please take a look at the attached
> figure, where the resulting data structures are depicted for:
>
> * Boost.MultiIndex (hashed + sequenced),
> * Boost.MultiIndex (hashed + random access),
> * Hand-rolled composition (hash_map + deque).

That's not the data structure I was proposing. The deque elements would
point directly to the elements in the hash_map, i.e.

      hash_map<key, value> + deque<std::pair<key const, value>*>

The picture you drew is something like

      hash_map<key, value> + deque<std::pair<key const, value>**>

I can't imagine why you added that extra node.
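
In code, with the deque pointing straight at the pairs the hash_map
owns (again a sketch, boost::unordered_map standing in for hash_map):

    #include <deque>
    #include <string>
    #include <boost/unordered_map.hpp>

    typedef boost::unordered_map<std::string, int> map_t;
    typedef map_t::value_type element;  // std::pair<std::string const, int>

    struct fifo_map
    {
        map_t m;
        std::deque<element*> order;     // direct pointers, no extra node

        void push(std::string const& k, int v)
        {
            // the deque slot points straight at the pair the map owns
            order.push_back(&*m.insert(element(k, v)).first);
        }

        void pop_front()                // the O(1) pop-front I need
        {
            element* e = order.front();
            order.pop_front();
            m.erase(e->first);
        }
    };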

> (The array labeled "deque<pointer>" does not do justice to the actual
> chunk-based structure used by most deque implementations, but you get
> the idea, there's on average one pointer per element).

That would only be true if the block size of the deque was 1, but who
would build a deque that way?

I'm very familiar with deque. There are basically two implementations,
one of which uses a circular buffer for the array of block pointers, but
fundamentally they have the same storage overhead. If there are N
elements and a block can hold B elements, the block array stores
ceiling(N/B) pointers and there are at most (N + 2*B) * sizeof(element)
bytes of storage for blocks. As N grows, the 2*B constant becomes
unimportant and you have, amortized, N*sizeof(element) bytes of element
storage and (N/B)*sizeof(block*) bytes of block storage. When the
element is a pointer that's (N + N/B)*sizeof(pointer). The only way
that can be equal to N*2*sizeof(pointer) is if B==1.
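
For concreteness (B is whatever the implementation picks; 64 pointers
per block is just an example value):

    #include <cstdio>

    int main()
    {
        double N = 1000000;        // elements
        double B = 64;             // example block capacity, in pointers
        double P = sizeof(void*);  // bytes per pointer

        // deque<element*>: N element pointers + ~N/B block-array pointers
        double deque_overhead = (N + N / B) * P;
        double claimed        = 2 * N * P;   // "two pointers per element"

        std::printf("deque<T*>: %.0f bytes (~%.2f pointers/element)\n",
                    deque_overhead, deque_overhead / (N * P));
        std::printf("claimed:   %.0f bytes (2 pointers/element)\n", claimed);
        return 0;
    }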

> Do you see you've got four pointers (black dots) per element in the
> three structures?

Yes, but the deque you picture is pathologically inefficient.

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
