Boost logo

Boost :

Subject: Re: [boost] [stldb] design details (was Boost library submission (poll for interest))
From: Stefan Strasser (strasser_at_[hidden])
Date: 2010-01-07 07:13:02


thanks for the detailed response.

Am Thursday 07 January 2010 18:36:11 schrieb Bob Walters:
> A big contrast between this STLdb library and Boost.Persistence is that in
> your case, IIUC you are tracking the use and modification of objects but
> the objects themselves are unaware. One downside of this is that the
> objects can't collaborate to support piecemeal updates of small portion of
> the objects - it's always the full object which is re-serialized.

right. that's why we need both libraries.

> On Tue, 5 Jan 2010 16:54:52 +0100 Stefan Strasser <strasser_at_[hidden]>
wrote:
> > I thought the reason that requires you to load the entire dataset on
> > startup and save it entirely at checkpoints and recoveries is that the
> > user needs to access the mapped region directly through pointers, since
> > the user types exist in an unserialized form there. is that right?
>
> IIUC by load you mean reconstruct the memory mapped file from the
> checkpoints + logs?

that's what I meant above, yes, but what I really mean is any kind of
operation that is linear to the database size instead of the working set.
I thought this was necessary at startup (of the process that creates the
shared memory segment) by design. if I was wrong here, good.
but I still think that triggering such an operation by anything that happens
during the normal use of a database is near-catastrophic.
that imho includes a crashing application.

> IIUC you have a similar problem but it involves redundancy of
> memory. If you work with a large intrusive container, then you have the
> serialized storage memory mapped in, but must also cache the deserialized
> nodes of the container in memory to ensure fast locator traversal, and
> allow it to work like an index.

yes

> What I am currently very intriguied by is
> your idea of using STLdb as indexes for a large database. i.e. The
> containers are map<key,locator>, and refer to a separate Boost.Persistence
> stored objects.

me too.
if we reach that point, automatic global indexes like you'd expect from a
relational database could be implemented pretty easily:

struct pers_type{
  string name;
  template<class Archive>
  void serialize(Archive &ar,unsigned int){
    ar & persistent::index<name_tag>(name);
  }
};

voila, you can look up pers_type's by name.

there are two elements to store a locator:
 - the object id, which is specified to be POD.
 - a tag representing the resource the object is stored in.
in the case of storing a locator in a map, the tag would probably be stored as
an index into a list of resources to make the map element fixed size.

> correctness when accessing locators? I really like your interface in
> Boost.Persistent, but are you currently assuming an update when a locator
> is accessed non-const, or is it more clever than that, i.e. making a
> transaction-local copy during non-const access, but later checking for
> differences between the transaction-local copy and original before
> reserializing and storing the object?

yes, dereferencing a non-const locator causes the resource to compare the
object on commit to determine if it has changed, either by calling a
user-supplied function or by serializing both the original and the clone and
comparing the serialized streams for equality.
not doing that would seriously harm concurrency in a lot of cases (besides
making unnecessary updates).
Locator::write() can be used to indicate an object will definitely be modified
and avoid the above for performance.

I have to add a chapter about dereferencing, read() and write() and their
performance implications to the documentation.

>
> > what kind of tree does trans_map use internally?
>
> Red/Black per boost::interprocess::map, which is used (intact) internally
> for the low-level map implementation. I'm wrapping that with code to make
> it transactionally aware. I didn't want yet another implementation of map.

I implemented a B-tree once, I don't remember the quality of the code, and
it's C++/CLI but if you want it let me know...

> > does the library automatically
> > record/lock accesses? is the user supposed to lock manually on every
> > access?
>
> The isolation behavior is based on RDBMS conventions for row-level locking.
> Reads do not lock entries, writes do lock entries. These entry-level

as far as I know, MySQL InnoDB does acquire a shared lock on reads.

> Incidentally, one problem I have with Boost.Persistence is that you always
> handle contention by throwing. IIUC contention can occur when two threads
> insert records with different keys into a container simply because they
> would both add their node under the same parent, and despite their nodes
> having different keys. So in other words, there's nothing apps can do to
> prevent the possibility of contention errors.

I have thought about implementing an object resource that uses locking instead
of MVCC (this is a limitation of the current implementation only, not the
design/interface) but while that would improve performance in some situations
(especially when there are few conflicts) I don't understand why that would
change the likelyhood of conflicts or how an application could avoid
conflicts.
in your example of two nodes added under the same parent node, both
transactions change the parent node, so there is a conflict if you use MVCC
or locking.
are you talking about the cost of repeating a transaction after a MVCC
conflict if that transaction is large and contains many other operations?
do you see a way around that?
except implementing a locking object resource, which is possible without
changing any of the public interface.

btw, my "Intrusive Persistent Object Containers" are not meant to be indexes.
I intentionally removed the non-intrusive versions I already had implemented,
because I don't want people to think these should be used e.g. as
map<int,int>.

if you already have a bunch of persistent objects anyway, and you want to
arange them in a list, in a tree, you can use those containers to do the work
for you. but they are not meant to "index" an int.

> btw... I'm still not settled on the interface for lock() / update() and
> have wondered if I should somehow put that onto the iterator. E.g. lock()
> would be a method of the iterator, and update() could be a method on the
> iterator which returns a non-const L-value reference that can then be
> updated directly. Suggestions welcome...

I'm not experienced enough with the row locking of RDBMS to make any
suggestions here.
in general, functions modifying a container are not part of the iterator in
STL, but part of the container with an iterator as argument.
(there is no Iterator::erase()).

>
> > how are possible read-accesses over an entire range stored, e.g. by
> > using map::equal_range()?
>
> If you hold a shared lock on the map while accessing it via a generic
> algorithm you are guaranteed to see the data reflecting a consistent set of
> transactions. In between locks, the result of new commits can appear,
> leading to the various phantom scenarios defined in the ANSI isolation
> levels.

how do you make sure the following transaction fails?
(assume an isolation level here that does not allow the transaction to
succeed. even if your current container doesn't support full isolation, it
should be possible to implement one, with the same interface, right?)

map<int,int> numbers;
map<int,int> numbersX2;

numbersX2[x] is guaranteed to be numbers[x] * 2, whenever a transaction
changes one index, it also changes the other.

now a transaction A comes along that reads an element of "numbers", acts on
it, then transaction B updates both containers, then transaction A reads a
the same element of numbersX2 and acts on it.

the state presented to transaction A was inconsistent. how is transaction A
prevented from committing?
do I need to know that transaction A accesses both containers when I implement
transaction A and lock both containers beforehand?


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk