Boost logo

Boost :

From: Hervé Brönnimann (hervebronnimann_at_[hidden])
Date: 2007-08-07 22:05:00


Jeremy: You might be interested in this paper which basically
describes (albeit from a different design point of view) your data
structure:

James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, Robert Endre
Tarjan: Making Data Structures Persistent. J. Comput. Syst. Sci. 38
(1): 86-124 (1989)

As far as the lock goes, why not a shared_ptr per node? (those use
atomic ref counters) As long as you do not maintain the size of the
map, that should be all good. You can maintain the size in every
node, and get the size of a collection from the root.

I've heard of shared DAG structures to represent hierarchical shared
collections, along the same principles. They work well, if a bit slow.

--Herve Bronnimann

On Aug 7, 2007, at 12:59 AM, Jeremy Bruestle wrote:

> Greetings,
>
> I've recently implemented a new collection type for a project of mine.
> It works identically to std::map, with the same algorithmic
> guarantees, but additionally can be 'copied' in constant time. That
> is:
>
> shared_map<int, int> a;
> for(i = 0; i < 100; i++)
> a[i] = i*i; // Each insert in log(n) of the map size
> shared_map<int, int> b = a; // This is constant time (not linear as
> in a std::map)
> b[1] = 7; // Still log(n) of course
> b[2] = 3;
>
> assert(a[2] == 4 && b[2] == 3 && a[6] == 36 && b[6] == 36);
>
> The trick is simple: the map is a red-black tree, but we get rid of
> the explicit parent pointer for each red-black node. This complicates
> the code, but is algorithmically uninteresting since we always descend
> from the root anyway and just need to track the parent as we go,
> either in an explicit stack or through recursion (my current
> implementation). The key point however is that the lack of an
> explicit parent allows us to use reference counting, and share
> children. Thus the copy from a to b simply copies the root node, and
> adds an additional reference count. All future changes do copy on
> write for shared nodes (any with a ref count of 2 or more).
>
> The does lower the overall performance for all operations somewhat,
> but for cases like mine where there is a need to copy large maps a
> lot, and each map remains quite similar to the other maps, there is a
> huge savings in computation and memory usage. Another nice use is:
> copy a map (since it's free) and iterate over it while modifying the
> original with no worries.
>
> One caveat, obviously multiple threads require a single global lock
> since even apparently different collections share internal state (or I
> could add a per node lock, which has the nice effect of general fine
> grained tree locking, but is tons of overhead, perhaps a compile time
> choice?).
>
> My current implementation is sufficient for my purposes, but with a
> little cleanup (making stronger iterator validity guarantees, better
> support for erasing ranges, explicit parent stack instead of recursion
> perhaps), it could be perhaps useful to everyone. Also, it's a little
> work but straightforward to generalize to shared_set, shared_multimap,
> etc.
>
> I was wondering if there would be any interest from people in such a
> library? If so, I would be willing to do the needed cleanup, if not,
> it suits me reasonably well as is (I don't even use erase). Also
> curious if anyone has seen such a thing anywhere before. It seems
> obvious enough to me, but perhaps it's actually new, I couldn't find
> anything like it when I went searching.
>
> Thanks for your time
>
> -Jeremy
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/
> listinfo.cgi/boost


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk