Boost logo

Boost :

From: Daniel James (daniel_at_[hidden])
Date: 2005-02-20 13:53:28


JOAQUIN LOPEZ MU?Z wrote:
> The implementation I adopted (2 pointers per node + 2 pointers
> per bucket) has O(1) iteration, insert() and erase(), whatever
> the load conditions (provided the hash function behaves, of
> course.) I agree with you bidirectional iteration is of little
> value, and my main motivation was to achieve O(1). So, I'll be
> happy to go for a lighter implementation if these issues
> can be solved in another way --or if there's consensum that
> costly iteration under low load conditions is acceptable.

FWIW, if you can find a spare bit in a pointer, I've thought of a
non-portable way to get something closer to O(1) without the extra
memory load. The nodes are stored in a singly-linked list which includes
the buckets, so the last node in a bucket points to the next bucket. The
spare bit is used to indicate if a pointer is pointing to a node or a
bucket. So the data structure is something like:

struct node_or_bucket {
    node_or_bucket* next_;

    void set(node* x) { next_ = x; }
    void set(bucket* x) { next_ = x | 1; }
    bool is_next_bucket() const { return x & 1; }
    node_or_bucket* get() const { return x & -2; }
};

struct node : node_or_bucket {
    T value_;
};

struct bucket : node_or_bucket {
};

bucket* buckets_; // Array of buckets.
bucket* start_; // First bucket.

You can tell when a bucket ends because the pointer will be to a bucket
(or perhaps null, although a circular list would avoid that).

This also means that the node that comes before any node is always from
the same bucket, so erase only has to look at a single bucket.
Unfortunately, this can leave an empty bucket in the linked list.

As more empty buckets are left iteration gets slower, so they need to be
cleaned up. It would probably be possible for insert & erase to remove
any they chance across. And erase could do a complete clean up whenever
the number of empty buckets reaches a certain percentage of the current
size.

I'm not going to do this for now, as it wouldn't be remotely appropriate
for boost. And in many cases, this might end up slower than the current
implementation. I expect that the majority of accesses of an unordered
container are going to be by key.

Daniel


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk