From: Wesley W. Terpstra (terpstra_at_[hidden])
Date: 2002-11-12 06:32:26
I am looking at making an stl-compatible wrapper around a key-value database.
It seems to me that such a wrapper would be widely useful since:
1. stdc++ algorithms could operate on the databases
2. switching a map<...> that grew too large to disk-backed becomes trivial
3. old reliable stl code could be reused on disk
4. a very gentle learning curve to existing C++ developers
5. it would be highly convenient to use
6. quite likely clever (ab)uses which I do not foresee would be possible
Obviously, any scheme like this would require serialisation of the key/data
pairs. My solution thus far has been to include a SerialTraits<T> concept
which provides a conversion method. Then the databases look like:
MapDatabase<KeyTraits, DataTraits> db;
where the KeyTraits include the typename of the Key, and the serialisation
methods. My comparison is always the lexical comparison on the serialised
Things have been going surprisingly well, but I have a problem that comes
from the serialisation: References and members.
The stl (rightly) assumes all the objects are in their usual representation
in RAM. Therefore, you can call const methods on them and if they are
mutable you can call non-const methods.
This is a disaster.
Although one might niavely claim mmap() could keep the representation of RAM
on the disk, I would disagree since this would still impose arcane
restrictions on the class member variables.
I have considered several solutions none of which I consider fully adequate.
Solution #1: don't do that!
To fix (*i) one could use that the specification merely says that *i
be convertible to T and assignable. Therefore I could return a proxy
object which serialised on assignment and deserialised on conversion.
Unfortunately, many legacy programs do (*i).fn() since i->fn() was
unreliable in compilers. This will not work with a proxy object.
Further, i->fn() is impossible.
Solution #2: cache it!
I have also considered deserialising an object once, allowing
modifications, etc. Then on commit reserialising.
This would work out ok, except that it introduces baggage:
I can't hold on to all the records that have been read since the
user might be touching more disk than RAM. Therefore, I would have
to do some sort of reference counting in my iterators.
This would unfortunately break any code which took a T& or T*
from an iterator and held on to it.
I am duplicating the read cache of the database in a wrapper.
(on the plus side I am also saving deserialisation work)
Solution #3: fuzzy template+inheritance tricks!
I figure there might be some clever way to return an object which
looks like the data object, but really is not. Maybe by inheriting a
template class from the contained class. Returning these might be
able to do what is needed; eg: on destruction, write back to the
This seems like a good idea, but it is fraught with complications.
Consider two iterators i&j which (happen by chance to) point to the
i->set_a(j->set_b(4) + 2);
Oops. You would expect both changes to work since they are
presumably modifying different member variables. However, since
i-> and j-> both read from disk and deserialised to a temporary,
we are modifying two different temporaries. Therefore only one of
the changes (whoever's object destroys last) will be made.
I have a good feeling about this solution though as I think it
conceivable that smart enough template code might be able to detect
Solution #4: ask someone smarter!
... that's you. :-)
Boost list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk