Boost logo

Boost :

From: Jeremy Bruestle (jeremy.bruestle_at_[hidden])
Date: 2007-08-08 12:01:24


Thanks for the reference, the paper is very interesting. I'm still
reading it, but it's very applicable, and I might get some good
algorithmic improvements out of it. Also, for concurrency, the use of
shared_ptr makes good sense in terms of code reuse. All of this
discussion has gotten me thinking about the details of my structure
more (for example, with the basic design as it is, no matter what I
do, begin is always going to be log(n) since of course the returning
of the iterator requires a copy, which is log(n)). I think I will
churn on it for a while, and once I've cleaned it up come back with an
explict list of the algorithmic properties to reduce any confusion (my
statement about the bounds being the same as map for example is true
of insert, find, and delete, but not of many of the iterator uses).

If anyone has more question though, or has ideas, or just wants to
express interest, please do so. At this point I think I'm going to
keep working on the project even if no one is interested, because I
see of lot of fun problems to attack, but I'm still curious if anyone
else has use for such a structure. Thanks to all so far for the good
feedback.

-Jeremy

On 8/7/07, Hervé Brönnimann <hervebronnimann_at_[hidden]> wrote:
> Jeremy: You might be interested in this paper which basically
> describes (albeit from a different design point of view) your data
> structure:
>
> James R. Driscoll, Neil Sarnak, Daniel Dominic Sleator, Robert Endre
> Tarjan: Making Data Structures Persistent. J. Comput. Syst. Sci. 38
> (1): 86-124 (1989)
>
> As far as the lock goes, why not a shared_ptr per node? (those use
> atomic ref counters) As long as you do not maintain the size of the
> map, that should be all good. You can maintain the size in every
> node, and get the size of a collection from the root.
>
> I've heard of shared DAG structures to represent hierarchical shared
> collections, along the same principles. They work well, if a bit slow.
>
> --Herve Bronnimann
>
> On Aug 7, 2007, at 12:59 AM, Jeremy Bruestle wrote:
>
> > Greetings,
> >
> > I've recently implemented a new collection type for a project of mine.
> > It works identically to std::map, with the same algorithmic
> > guarantees, but additionally can be 'copied' in constant time. That
> > is:
> >
> > shared_map<int, int> a;
> > for(i = 0; i < 100; i++)
> > a[i] = i*i; // Each insert in log(n) of the map size
> > shared_map<int, int> b = a; // This is constant time (not linear as
> > in a std::map)
> > b[1] = 7; // Still log(n) of course
> > b[2] = 3;
> >
> > assert(a[2] == 4 && b[2] == 3 && a[6] == 36 && b[6] == 36);
> >
> > The trick is simple: the map is a red-black tree, but we get rid of
> > the explicit parent pointer for each red-black node. This complicates
> > the code, but is algorithmically uninteresting since we always descend
> > from the root anyway and just need to track the parent as we go,
> > either in an explicit stack or through recursion (my current
> > implementation). The key point however is that the lack of an
> > explicit parent allows us to use reference counting, and share
> > children. Thus the copy from a to b simply copies the root node, and
> > adds an additional reference count. All future changes do copy on
> > write for shared nodes (any with a ref count of 2 or more).
> >
> > The does lower the overall performance for all operations somewhat,
> > but for cases like mine where there is a need to copy large maps a
> > lot, and each map remains quite similar to the other maps, there is a
> > huge savings in computation and memory usage. Another nice use is:
> > copy a map (since it's free) and iterate over it while modifying the
> > original with no worries.
> >
> > One caveat, obviously multiple threads require a single global lock
> > since even apparently different collections share internal state (or I
> > could add a per node lock, which has the nice effect of general fine
> > grained tree locking, but is tons of overhead, perhaps a compile time
> > choice?).
> >
> > My current implementation is sufficient for my purposes, but with a
> > little cleanup (making stronger iterator validity guarantees, better
> > support for erasing ranges, explicit parent stack instead of recursion
> > perhaps), it could be perhaps useful to everyone. Also, it's a little
> > work but straightforward to generalize to shared_set, shared_multimap,
> > etc.
> >
> > I was wondering if there would be any interest from people in such a
> > library? If so, I would be willing to do the needed cleanup, if not,
> > it suits me reasonably well as is (I don't even use erase). Also
> > curious if anyone has seen such a thing anywhere before. It seems
> > obvious enough to me, but perhaps it's actually new, I couldn't find
> > anything like it when I went searching.
> >
> > Thanks for your time
> >
> > -Jeremy
> > _______________________________________________
> > Unsubscribe & other changes: http://lists.boost.org/mailman/
> > listinfo.cgi/boost
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk