Boost logo

Boost :

From: Wesley W. Terpstra (terpstra_at_[hidden])
Date: 2002-11-12 13:57:44


First off, let me appologize; I should have given way more context in my
original post. boost members are not psychic. :-)

On Tue, Nov 12, 2002 at 05:31:46PM +0200, Bohdan wrote:
>
> "Wesley W. Terpstra" <terpstra_at_[hidden]> wrote in message
> news:20021112113226.GA466_at_ito.tu-darmstadt.de...
> > Good afternoon!
> >
> > I am looking at making an stl-compatible wrapper around a key-value database.
> >
> > It seems to me that such a wrapper would be widely useful since:
> > 1. stdc++ algorithms could operate on the databases
> > 2. switching a map<...> that grew too large to disk-backed becomes trivial
> > 3. old reliable stl code could be reused on disk
> > 4. a very gentle learning curve to existing C++ developers
> > 5. it would be highly convenient to use
> > 6. quite likely clever (ab)uses which I do not foresee would be possible

I am answering your message out of sequence because I think it will be more
clear.

> The other idea is that transaction object is needed here.

Yes, I already have that solved:

You get a transaction object from the environment.
You get a database object from the environment.
You combine the two to get a session object.
The session looks like an stl map (it issues iterators).
When you commit the transaction all the sessions using that transaction are
invalidated as are all the iterators that they issued.

... the problem I have is the calling of member methods through the
iterator. I know I could do it with a several approaches, I am trying
to find the most elegant with minimal overhead.

> ObjectDatabases are very painful things. Unfortunately
> they are not too popular nowadays. The reason for
> this is simple, they are extrimally difficult to implement
> and use (at least for c++).

Yes they are. Fortunately, I am not trying to write one.
I have a *very* narrow functionality target.

When you put it in the container, then it is on disk and in my control.
When you take it out, it is not.

> > Obviously, any scheme like this would require serialisation of
> > the key/data pairs. My solution thus far has been to include a
> > SerialTraits<T> concept which provides a conversion method. Then
> > the databases look like: MapDatabase<KeyTraits, DataTraits> db;
>
> You can use new boost::serialization library.

I have considered this and rejected it (the coupling, not boost::serialization).

I would rather leave this entirely in user control. It would be a matter of
adding a single template class if they wished to bridge the two themselves,
so there is no functionality lost.

Further, it is important to control the serialized format so that the
lexical sort order has desirable properties. This fine grained control is
not available under boost::serialization without writing a stream object for
each desired lexical-sort seriailization.

Also, this way I depend on less things.

> I have doubts if you can use std::map interface for your database class.

I agree that it is not designed for this purpose, but I think there are
many beneficial emergent properties. It does not actually have to exactly
conform; just conform with the subset used in existing practice.

I know from experience with a previous product that even something that
closely approximates a std::map is highly useful. I just want to bring that
approximation a bit closer so that code really can ignore the difference.

> > Although one might niavely claim mmap() could keep the representation of
> > RAM on the disk, I would disagree since this would still impose arcane
> > restrictions on the class member variables.
>
> Well, there are two ways:
> 1. You need disk to reduce memory usage.
> 2. You need disk to persist objects.
> I'm not sure which one is yours. Did you ?

Did I ?

I desire both of the two properties, you want me to choose? :-)
Is your comment here pertaining to mmap()?

> > This would unfortunately break any code which took a T& or T*
> > from an iterator and held on to it.
>
> If you want to allow pointers and use them after application restart
> than use smart pointers:
>
> I've heard something about some system/processor tricks which allow
> to persist pointers, but i do not think it is good way.

You are considering serialization. I have no interest in this topic.
My implementation presumes that you have picked one of the many available
serialization tools or rolled your own.

This is especially not-so-important because I do not plan on supporting
storing anything other than by-value. Further, since it looks like the stl
I know all the type information and there is no polymorphism.

The T& and T* I was refering to are those obtained from the map::iterator
class that is walking records. The user might dereference this iterator and
take a reference to the object. Rather than telling them to use a smart
pointer, better would be to say: just keep the iterator!

I am just concerned with breaking existing stl code.
If possible, I would rather this could work, but I don't see how.

> > I am duplicating the read cache of the database in a wrapper.
> > (on the plus side I am also saving deserialisation work)
>
> I'm not sure if you can live with read-only cache. When you are
> doing intensive changes than cache should also support changes.
> i do not see problems with it.

Sorry, I was not clear. Of course my cache is read-write, but I am reducing
the usefulness of the databases read cache.

> Imagine that serializing of some object can throw exeption ...

Again, serialization... Not my problem.
Even if it threw an exception, I do not think this would pose any problems;
your transaction should have been wrapped in a retry block. All the memory
handed out from my database is memory managed in a way resiliant to
exceptions (since I throw them too! deadlock for instance).

> > Solution #3: fuzzy template+inheritance tricks!
> >
> > I figure there might be some clever way to return an object which
> > looks like the data object, but really is not. Maybe by inheriting a
> > template class from the contained class. Returning these might be
> > able to do what is needed; eg: on destruction, write back to the
> > database library.
>
> ODMG proposes new/delete operators for persistent object creation/destruction.
> I this case you can construct your object is somewhat bigger memory chunk,
> which can contain some other implementation specific per-object information.

I do not think this will help; the returned object would be living on the
stack which puts control of it's new/delete out of my hands. However, with
my trick (deriving from the class) I can still catch the destruction even
though it is on the stack.

> > This seems like a good idea, but it is fraught with complications.
> > Consider two iterators i&j which (happen by chance to) point to the
> > same object.
> > i->set_a(j->set_b(4) + 2);
> >
> > Oops. You would expect both changes to work since they are
> > presumably modifying different member variables. However, since
> > i-> and j-> both read from disk and deserialised to a temporary,
> > we are modifying two different temporaries. Therefore only one of
> > the changes (whoever's object destroys last) will be made.
>
> If your class is going to have concurrent transactions, than your
> example shows how concurrency works.

These iterators are in the same transaction.

> if i & j belong to same transaction (or no transactions) than
> assert(&(*i)==&(*j) ). It can be implemented if MapDatabase::iterator is
> smart enough "smart pointer" :)

Well if it is a smart pointer than you are using solution #2: cache.

But this is solution #3, I was trying to avoid deserialized cache and only
use the database's underlying sector-oriented cache. So, the same object got
deserialized twice and returned by value on the stack twice.

> So if you want to hear my humble opinion:
> try to implement something small and limited in use first.

Well, this is my third attempt. :-)

I implemented a stl-like wrapper for libdb3 at a place I worked, but I could
not keep that code. It took approach "#1: don't do that!" to calling methods
through the iterator.

After that, I wrote a custom database without a nice api because I had a
very specific use which I could not get fast enough with btrees and hashes.
It was not transactional, however, and I later discovered an even better
layout (nearly 0 seeks!).

This newest attempt is a fully transaction database with several unusual
backing data structures. I have a nice raw iterator class that operates on
memory chunks for the key&data. It is mostly complete, but lacks a
user-friendly api.

I am trying to get it to look more like the stl because the wrapper I wrote
for my previous employer was invaluable. And all the reasons at the top.

Thanks for your time!


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk