
From: Bohdan (warever_at_[hidden])
Date: 2002-11-12 10:31:46


"Wesley W. Terpstra" <terpstra_at_[hidden]> wrote in message
news:20021112113226.GA466_at_ito.tu-darmstadt.de...
> Good afternoon!
>
> I am looking at making an stl-compatible wrapper around a key-value database.
>
> It seems to me that such a wrapper would be widely useful since:
> 1. stdc++ algorithms could operate on the databases
> 2. switching a map<...> that grew too large to disk-backed becomes trivial
> 3. old reliable stl code could be reused on disk
> 4. a very gentle learning curve to existing C++ developers
> 5. it would be highly convenient to use
> 6. quite likely clever (ab)uses which I do not foresee would be possible
>
> Obviously, any scheme like this would require serialisation of the key/data
> pairs. My solution thus far has been to include a SerialTraits<T> concept
> which provides a conversion method. Then the databases look like:
> MapDatabase<KeyTraits, DataTraits> db;

You can use the new boost::serialization library.

> where the KeyTraits include the typename of the Key, and the serialisation
> methods. My comparison is always the lexical comparison on the serialised
> object.
>

Your solution seems related to the recently proposed Ditto library.
BTW, if you are looking for reasonable solutions for object databases, you can
look at www.odmg.org. But you have to buy the book to read about the C++ ODMG
interface.
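
To make the quoted SerialTraits<T> idea concrete, here is a minimal sketch. The thread names the concept but gives no interface, so the member names (value_type, serialise, deserialise), the big-endian encoding, and the in-memory std::map standing in for the disk store are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical traits interface; only the name SerialTraits comes from the thread.
template <typename T>
struct SerialTraits;

// Big-endian encoding, so that bytewise order agrees with numeric order.
template <>
struct SerialTraits<std::uint32_t> {
    typedef std::uint32_t value_type;
    static std::string serialise(value_type v) {
        std::string out(4, '\0');
        for (int i = 3; i >= 0; --i) {
            out[i] = static_cast<char>(v & 0xFFu);
            v >>= 8;
        }
        return out;
    }
    static value_type deserialise(const std::string& bytes) {
        value_type v = 0;
        for (std::size_t i = 0; i < bytes.size(); ++i)
            v = (v << 8) | static_cast<unsigned char>(bytes[i]);
        return v;
    }
};

// Strings are stored as their own bytes.
template <>
struct SerialTraits<std::string> {
    typedef std::string value_type;
    static std::string serialise(const value_type& v) { return v; }
    static value_type deserialise(const std::string& bytes) { return bytes; }
};

// Lexical comparison on the serialised form, byte by byte as unsigned values.
struct ByteLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](char x, char y) {
                return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
            });
    }
};

// In-memory stand-in for the disk-backed database.
template <typename KeyTraits, typename DataTraits>
class MapDatabase {
public:
    typedef typename KeyTraits::value_type key_type;
    typedef typename DataTraits::value_type data_type;

    void put(const key_type& k, const data_type& d) {
        store_[KeyTraits::serialise(k)] = DataTraits::serialise(d);
    }
    bool get(const key_type& k, data_type& out) const {
        typename Store::const_iterator it = store_.find(KeyTraits::serialise(k));
        if (it == store_.end()) return false;
        out = DataTraits::deserialise(it->second);
        return true;
    }
    // Keys in the order the store keeps them (serialised byte order).
    std::vector<key_type> keys() const {
        std::vector<key_type> result;
        for (typename Store::const_iterator it = store_.begin(); it != store_.end(); ++it)
            result.push_back(KeyTraits::deserialise(it->first));
        return result;
    }
private:
    typedef std::map<std::string, std::string, ByteLess> Store;
    Store store_;
};
```

Note that the iteration order is only as sensible as the encoding: a little-endian integer encoding would sort 256 before 3 under the same comparison.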

> Things have been going surprisingly well, but I have a problem that comes
> from the serialisation: References and members.
>
> map[key].non_const_member_fn();
> (*i).any_member_fn();
>
> The stl (rightly) assumes all the objects are in their usual representation
> in RAM. Therefore, you can call const methods on them and if they are
> mutable you can call non-const methods.
>
> This is a disaster.

I doubt that you can use the std::map interface for your database class.

>
> Although one might naively claim mmap() could keep the representation of RAM
> on the disk, I would disagree since this would still impose arcane
> restrictions on the class member variables.

Well, there are two possible goals:
   1. You need the disk to reduce memory usage.
   2. You need the disk to persist objects.
I'm not sure which one is yours. Which one do you mean?

> I have considered several solutions none of which I consider fully adequate.
>
> Solution #1: don't do that!
>
> To fix (*i) one could use that the specification merely says that *i
> be convertible to T and assignable. Therefore I could return a proxy
> object which serialised on assignment and deserialised on conversion.
>
> Unfortunately, many legacy programs do (*i).fn() since i->fn() was
> unreliable in compilers. This will not work with a proxy object.
>
> Further, i->fn() is impossible.
>
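
The proxy described in Solution #1 could look roughly like the sketch below. Record, the text codec, and the std::map standing in for the on-disk store are hypothetical; only the behaviour (deserialise on conversion, serialise on assignment) comes from the quoted text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stored type.
struct Record {
    int a;
    int b;
};

// Hypothetical codec: "a,b" as decimal text.
std::string serialise(const Record& r) {
    return std::to_string(r.a) + "," + std::to_string(r.b);
}

Record deserialise(const std::string& s) {
    std::size_t comma = s.find(',');
    Record r;
    r.a = std::stoi(s.substr(0, comma));
    r.b = std::stoi(s.substr(comma + 1));
    return r;
}

// Stand-in for the on-disk key-value store.
typedef std::map<std::string, std::string> Store;

// What *i would return: convertible to Record and assignable from Record.
class RecordProxy {
public:
    RecordProxy(Store& store, const std::string& key) : store_(&store), key_(key) {}

    // Conversion reads from the store and deserialises (throws if the key is absent).
    operator Record() const { return deserialise(store_->at(key_)); }

    // Assignment serialises and writes back.
    RecordProxy& operator=(const Record& r) {
        (*store_)[key_] = serialise(r);
        return *this;
    }
private:
    Store* store_;
    std::string key_;
};
```

With this, `Record r = proxy; r.b = 7; proxy = r;` works, but `proxy.b` and `proxy.fn()` do not compile, because the proxy has no such members. That is exactly the (*i).fn() problem from the quoted text.
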
> Solution #2: cache it!
>
> I have also considered deserialising an object once, allowing
> modifications, etc. Then on commit reserialising.
>
> This would work out ok, except that it introduces baggage:
>
> I can't hold on to all the records that have been read since the
> user might be touching more disk than RAM.

If you are working intensively with all records (objects), then
there is no other way but to cache them all
(if the cache size allows it).

> Therefore, I would have
> to do some sort of reference counting in my iterators.

You can also limit the cache size, but ref-counting will still be useful here.
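
A size-limited cache combined with reference counting might be sketched as follows. This assumes std::shared_ptr for the counting and uses a hypothetical ObjectCache class with a caller-supplied loader; it is not tied to any particular database library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical cache: the limit is soft, because objects still referenced
// from outside the cache are never evicted.
template <typename T>
class ObjectCache {
public:
    explicit ObjectCache(std::size_t limit) : limit_(limit) {}

    // Return the cached object for `key`, calling `load(key)` on a miss.
    template <typename Loader>
    std::shared_ptr<T> fetch(const std::string& key, Loader load) {
        typename Map::iterator it = objects_.find(key);
        if (it != objects_.end()) return it->second;
        evict_unreferenced();
        std::shared_ptr<T> obj = std::make_shared<T>(load(key));
        objects_[key] = obj;
        return obj;
    }

    std::size_t size() const { return objects_.size(); }

private:
    typedef std::map<std::string, std::shared_ptr<T> > Map;

    // Drop entries that only the cache itself still references,
    // until there is room for one more object.
    void evict_unreferenced() {
        typename Map::iterator it = objects_.begin();
        while (objects_.size() >= limit_ && it != objects_.end()) {
            if (it->second.use_count() == 1)
                it = objects_.erase(it);  // a write-back of changes would go here
            else
                ++it;
        }
    }

    std::size_t limit_;
    Map objects_;
};
```

An iterator would hold the shared_ptr for as long as it points at the record, which pins the object in the cache. A raw T& or T* taken from the iterator is still unprotected, as the quoted text points out.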

>
> This would unfortunately break any code which took a T& or T*
> from an iterator and held on to it.

If you want to allow pointers and use them after an application restart,
then use smart pointers:
<code>
class Node
{
    Value val;
    odb_ptr<Node> next;
};
</code>

I've heard about some system/processor tricks that allow
pointers to be persisted, but I do not think that is a good way.

>
> I am duplicating the read cache of the database in a wrapper.
> (on the plus side I am also saving deserialisation work)

I'm not sure you can live with a read-only cache. When you are
making intensive changes, the cache should also support changes.
I do not see any problems with that.
The other idea is that a transaction object is needed here.
Imagine that serializing some object throws an exception ...
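
One way to picture such a transaction object: stage the changed objects, serialize all of them first, and touch the store only if none of the serializations threw. Account, Transaction, and the std::map store are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the on-disk key-value store.
typedef std::map<std::string, std::string> Store;

// Hypothetical record whose serialization can fail.
struct Account {
    int balance;
    std::string serialise() const {
        if (balance < 0)
            throw std::runtime_error("cannot serialise a negative balance");
        return std::to_string(balance);
    }
};

class Transaction {
public:
    explicit Transaction(Store& store) : store_(store) {}

    // Record a changed object; nothing is written yet.
    void stage(const std::string& key, const Account& value) { pending_[key] = value; }

    // Phase 1: serialise everything (may throw, store untouched).
    // Phase 2: write the already-encoded bytes to the store.
    void commit() {
        std::vector<std::pair<std::string, std::string> > encoded;
        for (std::map<std::string, Account>::const_iterator it = pending_.begin();
             it != pending_.end(); ++it)
            encoded.push_back(std::make_pair(it->first, it->second.serialise()));
        for (std::size_t i = 0; i < encoded.size(); ++i)
            store_[encoded[i].first] = encoded[i].second;
        pending_.clear();
    }

    // Forget all staged changes.
    void rollback() { pending_.clear(); }

private:
    Store& store_;
    std::map<std::string, Account> pending_;
};
```

This only covers failures during serialization. A real database would also need a journal or similar mechanism to survive a failure in the middle of the write phase.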

>
> Solution #3: fuzzy template+inheritance tricks!
>
> I figure there might be some clever way to return an object which
> looks like the data object, but really is not. Maybe by inheriting a
> template class from the contained class. Returning these might be
> able to do what is needed; eg: on destruction, write back to the
> database library.

ODMG proposes new/delete operators for persistent object creation/destruction.
In this case you can construct your object in a somewhat bigger memory chunk,
which can also contain other implementation-specific per-object information.
Ex:
     MapDatabase db("file.db");
     db.open();
     //create object
     odb_ptr<MyObject> pobj = new (db) MyObject;

     pobj->dosomth();

     //drop object from db
     pobj.destroy();
     //or
     delete pobj.get(); // pobj.get()

     db.close();
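
The "somewhat bigger memory chunk" can be illustrated with a placement form of operator new that takes the database as an argument. This is not the ODMG interface itself, just a sketch of the mechanism; Database, ObjectHeader, owner_of, and destroy_object are made-up names, and nothing is actually persisted.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical database handle; it only counts live objects here.
struct Database {
    std::size_t live_objects;
    Database() : live_objects(0) {}
};

// Per-object bookkeeping stored directly in front of the object.
// The alignment keeps the object behind the header suitably aligned.
struct alignas(std::max_align_t) ObjectHeader {
    Database* owner;
};

// Placement form used by `new (db) T`: allocate header + object in one chunk.
void* operator new(std::size_t size, Database& db) {
    void* raw = ::operator new(sizeof(ObjectHeader) + size);
    ObjectHeader* header = new (raw) ObjectHeader;
    header->owner = &db;
    ++db.live_objects;
    return header + 1;  // the object lives right after the header
}

// Matching placement delete; the compiler calls it only if T's constructor throws.
void operator delete(void* p, Database& db) noexcept {
    --db.live_objects;
    ::operator delete(static_cast<ObjectHeader*>(p) - 1);
}

// Find the database an object was created in by stepping back to its header.
Database& owner_of(const void* object) {
    return *(static_cast<const ObjectHeader*>(object) - 1)->owner;
}

// Destroy an object created with `new (db) T` and release the whole chunk.
template <typename T>
void destroy_object(T* object) {
    Database& db = owner_of(object);
    object->~T();
    --db.live_objects;
    ::operator delete(static_cast<void*>(reinterpret_cast<ObjectHeader*>(object) - 1));
}

// Hypothetical persistent class.
struct MyObject {
    int value;
    MyObject() : value(7) {}
};
```

Objects created this way must be released through destroy_object (or a smart pointer that calls it), not through a plain delete, because the pointer handed to the caller is not the start of the allocated chunk.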

>
> This seems like a good idea, but it is fraught with complications.
> Consider two iterators i&j which (happen by chance to) point to the
> same object.
> i->set_a(j->set_b(4) + 2);
>
> Oops. You would expect both changes to work since they are
> presumably modifying different member variables. However, since
> i-> and j-> both read from disk and deserialised to a temporary,
> we are modifying two different temporaries. Therefore only one of
> the changes (whoever's object destroys last) will be made.

If your class is going to have concurrent transactions, then your
example shows how concurrency works.
If i and j belong to the same transaction (or there are no transactions), then
assert(&(*i)==&(*j)) holds. This can be implemented if MapDatabase::iterator is
a smart enough "smart pointer" :)
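
A minimal sketch of that guarantee: every iterator resolves its key through one per-transaction table of live objects, so two iterators on the same key see the same object. Item, Session, and Cursor are hypothetical names, and loading from and writing back to disk are left out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stored type with the setters from the quoted example.
struct Item {
    int a;
    int b;
    void set_a(int v) { a = v; }
    int set_b(int v) { b = v; return b; }
};

// One live object per key for the lifetime of the session (transaction).
class Session {
public:
    // std::map never moves its elements, so the returned reference stays valid.
    // A real implementation would deserialise from disk on first access;
    // here a missing key simply yields a zero-initialised Item.
    Item& resolve(const std::string& key) { return live_[key]; }
private:
    std::map<std::string, Item> live_;
};

// Iterator-like handle that always goes through the session.
class Cursor {
public:
    Cursor(Session& session, const std::string& key) : session_(&session), key_(key) {}
    Item& operator*() const { return session_->resolve(key_); }
    Item* operator->() const { return &session_->resolve(key_); }
private:
    Session* session_;
    std::string key_;
};
```

With two cursors i and j on the same key, `i->set_a(j->set_b(4) + 2)` modifies a single object, so both changes survive.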

>
> I have a good feeling about this solution though as I think it
> conceivable that smart enough template code might be able to detect
> these cases.
>
> Solution #4: ask someone smarter!
>
> ... that's you. :-)

Don't count on it :) Actually, you have touched only the tip of the iceberg.
Object databases are very painful things. Unfortunately,
they are not too popular nowadays. The reason for
this is simple: they are extremely difficult to implement
and use (at least for C++). So if you want to hear my
humble opinion: try to implement something small
and limited in use first.

BTW, the best open-source link that might be of some help:
http://www.garret.ru/~knizhnik/

regards,
bohdan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk