Boost logo

Boost-Commit :

From: eric_at_[hidden]
Date: 2008-06-12 19:50:13


Author: eric_niebler
Date: 2008-06-12 19:50:13 EDT (Thu, 12 Jun 2008)
New Revision: 46363
URL: http://svn.boost.org/trac/boost/changeset/46363

Log:
add implementation notes appendix, document tracking_ptr
Added:
   trunk/libs/xpressive/doc/tracking_ptr.qbk (contents, props changed)
Text files modified:
   trunk/libs/xpressive/doc/xpressive.qbk | 6 ++++++
   1 files changed, 6 insertions(+), 0 deletions(-)

Added: trunk/libs/xpressive/doc/tracking_ptr.qbk
==============================================================================
--- (empty file)
+++ trunk/libs/xpressive/doc/tracking_ptr.qbk 2008-06-12 19:50:13 EDT (Thu, 12 Jun 2008)
@@ -0,0 +1,172 @@
+[/
+ / Copyright (c) 2008 Eric Niebler
+ /
+ / Distributed under the Boost Software License, Version 1.0. (See accompanying
+ / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
+ /]
+
+[section Cycle collection with [^tracking_ptr<>]]
+
+In xpressive, regex objects can refer to each other and themselves by value or by reference.
+In addition, they ref-count their referenced regexes to keep them alive. This creates the possibility
+for cyclic reference counts, and raises the possibility of memory leaks. xpressive avoids leaks
+by using a type called `tracking_ptr<>`. This doc describes at a high level how `tracking_ptr<>`
+works.
+
+[h2 Constraints]
+
+Our solution must meet the following design constraints:
+
+* No dangling references: All objects referred to directly or indirectly must be kept alive
+ as long as the references are needed.
+* No leaks: all objects must be freed eventually.
+* No user intervention: The solution must not require users to explicitly invoke some cycle
+ collection routine.
+* Clean-up is no-throw: The collection phase will likely be called from a destructor, so it
+ must never throw an exception under any circumstance.
+
+[h2 Handle-Body Idiom]
+
+To use `tracking_ptr<>`, you must separate your type into a handle and a body. In the case of
+xpressive, the handle type is called basic_regex<> and the body is called regex_impl<>. The
+handle will store a `tracking_ptr<>` to the body.
+
+The body type must inherit from enable_reference_tracking<>. This gives the body the bookkeeping
+data structures that `tracking_ptr<>` will use. In particular, it gives the body:
+
+# `std::set<shared_ptr<body> > refs_` : collection of bodies to which this body refers, and
+# `std::set<weak_ptr<body> > deps_` : collection of bodies which refer to this body.
+
+[h2 References and Dependencies]
+
+We refer to (1) above as the "references" and (2) as the "dependencies". It is crucial to the
+understanding of `tracking_ptr<>` to recognize that the set of references includes both those
+objects that are referred to directly as well as those that are referred to indirectly (that
+is, through another reference). The same is true for the set of dependencies. In other words,
+each body holds a ref-count directly to every other body that it needs.
+
+Why is this important? Because it means that when a body no longer has a handle referring
+to it, all its references can be released immediately without fear of creating dangling references.
+
+References and dependencies cross-pollinate. Here's how it works:
+
+# When one object acquires another as a reference, the second object acquires the first as
+ a dependency.
+# In addition, the first object acquires all of the second object's references, and the second
+ object acquires all of the first object's dependencies.
+# When an object picks up a new reference, the reference is also added to all dependent objects.
+# When an object picks up a new dependency, the dependency is also added to all referenced
+ objects.
+# An object is never allowed to have itself as a dependency. Objects may have themselves as
+ references, and often do.
+
+Consider the following code:
+
+ sregex expr;
+ {
+ sregex group = '(' >> by_ref(expr) >> ')'; // (1)
+ sregex fact = +_d | group; // (2)
+ sregex term = fact >> *(('*' >> fact) | ('/' >> fact)); // (3)
+ expr = term >> *(('+' >> term) | ('-' >> term)); // (4)
+ } // (5)
+
+Here is how the references and dependencies propagate, line by line:
+
+[table
+[[Expression][Effects]]
+[[1) `sregex group = '(' >> by_ref(expr) >> ')';`]
+[[^group: cnt=1 refs={expr} deps={}\n
+expr: cnt=2 refs={} deps={group}]]]
+
+[[2) `sregex fact = +_d | group;`]
+[[^group: cnt=2 refs={expr} deps={fact}\n
+expr: cnt=3 refs={} deps={group,fact}\n
+fact: cnt=1 refs={expr,group} deps={}]]]
+
+[[3) `sregex term = fact >> *(('*' >> fact) | ('/' >> fact));`]
+[[^group: cnt=3 refs={expr} deps={fact,term}\n
+expr: cnt=4 refs={} deps={group,fact,term}\n
+fact: cnt=2 refs={expr,group} deps={term}\n
+term: cnt=1 refs={expr,group,fact} deps={}]]]
+
+[[4) `expr = term >> *(('+' >> term) | ('-' >> term));`]
+[[^group: cnt=5 refs={expr,group,fact,term} deps={expr,fact,term}\n
+expr: cnt=5 refs={expr,group,fact,term} deps={group,fact,term}\n
+fact: cnt=5 refs={expr,group,fact,term} deps={expr,group,term}\n
+term: cnt=5 refs={expr,group,fact,term} deps={expr,group,fact}]]]
+
+[[5) `}`]
+[[^expr: cnt=2 refs={expr,group,fact,term} deps={group,fact,term}]]]
+]
+
+This shows how references and dependencies propagate when creating cycles of objects. After
+line (4), which closes the cycle, every object has a ref-count on every other object, even
+to itself. So how does this not leak? Read on.
+
+[h2 Cycle Breaking]
+
+Now that the bodies have their sets of references and dependencies, the hard part is done.
+All that remains is to decide when and where to break the cycle. That is the job of `tracking_ptr<>`,
+which is part of the handle. The `tracking_ptr<>` holds 2 `shared_ptr`s. The first, obviously,
+is the `shared_ptr<body>` -- the reference to the body to which this handle refers. The other
+`shared_ptr` is used to break the cycle. It ensures that when all the handles to a body go out
+of scope, the body's set of references is cleared.
+
+This suggests that more than one handle can refer to a body. In fact, `tracking_ptr<>` gives
+you copy-on-write semantics -- when you copy a handle, the body is shared. That makes copies
+very efficient. Eventually, all the handles to a particular body go out of scope. When that
+happens, the ref count to the body might still be greater than 0 because some other body (or
+this body itself!) might be holding a reference to it. However, we are certain that the cycle-breaker's
+ref-count goes to 0 because the cycle-breaker only lives in handles. No more handles, no more
+cycle-breakers.
+
+What does the cycle-breaker do? Recall that the body has a set of references of type
+`std::set<shared_ptr<body> >`. Let's call this type "references_type". The cycle-breaker is a
+`shared_ptr<references_type>`. It uses a custom deleter, which is defined as follows:
+
+ template<typename DerivedT>
+ struct reference_deleter
+ {
+ void operator ()(std::set<shared_ptr<DerivedT> > *refs) const
+ {
+ refs->clear();
+ }
+ };
+
+The job of to the cycle breaker is to ensure that when the last handle to a body goes away,
+the body's set of references is cleared. That's it.
+
+We can clearly see how this guarantees that all bodies are cleaned up eventually. Once every
+handle has gone out of scope, all the bodies' sets of references will be cleared, leaving none
+with a non-zero ref-count. No leaks, guaranteed.
+
+It's a bit harder to see how this guarantees no dangling references. Imagine that there are 3
+bodies: A, B and C. A refers to B which refers to C. Now all the handles to B go out of scope,
+so B's set of references is cleared. Doesn't this mean that C gets deleted, even though it
+is being used (indirectly) by A? It doesn't. This situation can never occur because we propagated
+the references and dependencies above such that A will be holding a reference directly to C
+in addition to B. When B's set of references is cleared, no bodies get deleted, because they
+are all still in use by A.
+
+[h2 Future Work]
+
+All these `std::set`s and `shared_ptr`s and `weak_ptr`s! Very inefficient. I used them because
+they were handy. I could probably do better.
+
+Also, some objects stick around longer than they need to. Consider:
+
+ sregex b;
+ {
+ sregex a = _;
+ b = by_ref(a);
+ b = _;
+ }
+ // a is still alive here!
+
+Due to the way references and dependencies are propagated, the `std::set` of references can only
+grow. It never shrinks, even when some references are no longer needed. For xpressive this
+isn't an issue. The graphs of referential objects generally stay small and isolated. If someone
+were to try to use `tracking_ptr<>` as a general ref-count-cycle-collection mechanism, this problem
+would have to be addressed.
+
+[endsect]

Modified: trunk/libs/xpressive/doc/xpressive.qbk
==============================================================================
--- trunk/libs/xpressive/doc/xpressive.qbk (original)
+++ trunk/libs/xpressive/doc/xpressive.qbk 2008-06-12 19:50:13 EDT (Thu, 12 Jun 2008)
@@ -122,5 +122,11 @@
 
 [include perf.qbk]
 
+[section Appendix 5: Implementation Notes]
+
+[include tracking_ptr.qbk]
+
+[endsect]
+
 [endsect]
 


Boost-Commit list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk