Boost logo

Boost :

From: Jeremy Bruestle (jeremy.bruestle_at_[hidden])
Date: 2007-08-07 00:59:59


Greetings,

I've recently implemented a new collection type for a project of mine.
 It works identically to std::map, with the same algorithmic
guarantees, but additionally can be 'copied' in constant time. That
is:

shared_map<int, int> a;
for(i = 0; i < 100; i++)
   a[i] = i*i; // Each insert in log(n) of the map size
shared_map<int, int> b = a; // This is constant time (not linear as
in a std::map)
b[1] = 7; // Still log(n) of course
b[2] = 3;

assert(a[2] == 4 && b[2] == 3 && a[6] == 36 && b[6] == 36);

The trick is simple: the map is a red-black tree, but we get rid of
the explicit parent pointer for each red-black node. This complicates
the code, but is algorithmically uninteresting since we always descend
from the root anyway and just need to track the parent as we go,
either in an explicit stack or through recursion (my current
implementation). The key point however is that the lack of an
explicit parent allows us to use reference counting, and share
children. Thus the copy from a to b simply copies the root node, and
adds an additional reference count. All future changes do copy on
write for shared nodes (any with a ref count of 2 or more).

The does lower the overall performance for all operations somewhat,
but for cases like mine where there is a need to copy large maps a
lot, and each map remains quite similar to the other maps, there is a
huge savings in computation and memory usage. Another nice use is:
copy a map (since it's free) and iterate over it while modifying the
original with no worries.

One caveat, obviously multiple threads require a single global lock
since even apparently different collections share internal state (or I
could add a per node lock, which has the nice effect of general fine
grained tree locking, but is tons of overhead, perhaps a compile time
choice?).

My current implementation is sufficient for my purposes, but with a
little cleanup (making stronger iterator validity guarantees, better
support for erasing ranges, explicit parent stack instead of recursion
perhaps), it could be perhaps useful to everyone. Also, it's a little
work but straightforward to generalize to shared_set, shared_multimap,
etc.

I was wondering if there would be any interest from people in such a
library? If so, I would be willing to do the needed cleanup, if not,
it suits me reasonably well as is (I don't even use erase). Also
curious if anyone has seen such a thing anywhere before. It seems
obvious enough to me, but perhaps it's actually new, I couldn't find
anything like it when I went searching.

Thanks for your time

-Jeremy


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk