From: Bohdan (warever_at_[hidden])
Date: 2002-11-19 14:00:21


"Wesley W. Terpstra" <terpstra_at_[hidden]> wrote in message
news:20021119173421.GE589_at_ito.tu-darmstadt.de...
> On Tue, Nov 19, 2002 at 10:38:27AM -0500, David Abrahams wrote:
> > "Wesley W. Terpstra" <terpstra_at_[hidden]> writes:
> > > So... I am beginning to lean towards the "don't do that" approach where I
> > > simply don't allow the user to call member methods on items in the
> > > container. (And not let them take pointers) This allows at least the above
> > > optimization and a few others (like *i = *j; -- no deserialize&serialize)
> > > and probably more I don't forsee yet.
> >
> > I haven't been paying attention, but IIUC what you're proposing, these
> > things are no longer conforming iterators.
> >
> > The way to make random access iterators over disk storage is to build
> > an iterator which stores its value_type internally.

IMHO "stores a pointer to the value in the cache internally" would be better.

> > You can even
> > arrange for it to construct the value_type in its internal storage on
> > demand, so that it doesn't store anything until it is dereferenced.
>
> I assume you mean they are not iterators because operator -> is broken?
> Yes I agree.
> Aside from that however, I believe they do conform to iterators.
>
> What you are proposing however is flawed for several reasons.
>
> If I stored the value_type internally, this will break:
>
> map::iterator i = ...;
> map::reference x = *i;
> ++i;
> x = ...; // what is x now pointing at? the wrong record.

With the above, it is just the next record.

>
> Also, if you have two iterators pointing at the same thing, but keeping
> distinct value_types internally, expressions like:
> i->set_member_a(j->set_member_b(3) + 2);
> will break -- only one of the changes will make it to disk.
>
> ---
>
> I know that this could be solved with some sort of:
>
> struct Address
> {
> sectorptr_t sector;
> sectorlen_t record;
> };
>
> struct Object
> {
> Observable observable;
> T object;
> };
>
> std::map<Address, Object> which I keep in for each database.

Isn't that a cache? Looks familiar :)

> Then, every time you want to dereference an iterator, you lookup the address
> in the table (deserializing if necessary), reference the observable and
> return the object.
>
> When the observable is not_observed, you remove the Object from the table
> and reserialize to disk.
>
> The whole question revolves around:
> is the overhead of such a table justified by the benefit of allowing
> member methods to be called on objects within the container.

No overhead. Rather, you get overhead from constant deserializing.
Just make a list of all possible operations on an object and you will
understand (I hope :) ) that a cache class not only speeds up your
serialization/deserialization but also solves the problem of
"pointer <-> object on disk" identity. A cache of pure data buffers
is much simpler, but it is not needed once you have an object cache.
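
To make the identity point concrete, here is a minimal sketch of such an object cache. Everything in it is hypothetical and only for illustration: the in-memory `disk` map stands in for real storage, `ObjectCache`, `get` and `flush` are made-up names, and it uses C++11 `std::shared_ptr` instead of a hand-written Observable.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for on-disk storage: address -> serialized bytes.
using Address = int;
std::map<Address, std::string> disk;

struct MyClass { std::string name; };

// Hypothetical object cache: at most one live object per address.
class ObjectCache {
public:
    // Returns the one shared object for this address, loading it on first use.
    std::shared_ptr<MyClass> get(Address a) {
        auto it = live_.find(a);
        if (it != live_.end()) return it->second;        // already cached
        auto obj = std::make_shared<MyClass>(MyClass{disk[a]});  // "deserialize"
        live_[a] = obj;
        return obj;
    }

    // Writes every cached object back once and empties the cache.
    void flush() {
        for (auto& kv : live_) disk[kv.first] = kv.second->name; // "serialize"
        live_.clear();
    }

private:
    std::map<Address, std::shared_ptr<MyClass>> live_;
};
```

With this, two calls to `get(444)` return the same object, so a change made through one handle is visible through the other, and the object is serialized once in `flush()` rather than on every access.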

>
> There are significant costs:
> the overhead of redundant cache
> (it is already cached at the sector level)
> the overhead of indexing the map
> (considerable if you are just deserializing an int)

In my experience, storing an int is 1% of the cases and storing a complex
object is 99%. Constant serializing/deserializing is a poor idea for
big objects. But I see... you are fighting for simplicity.

>
> My current answer is "not justified". But, I am open to persuasion,
> especially in the form of an optimized solution.

OK, let's go through the problems in order:

Example:

//legacy code {

class MyClass
{
public: // public, so that do_some_changes() can access name
    std::string name;
};

void do_some_changes( MyClass & value )
{
     value.name = "...";
     ...
}

//legacy code }

How are you going to load an object, call do_some_changes on it, and save it?
Most probably:

    MyClass x = db[ 444 ]; // copy #1
    do_some_changes( x );
    db[ 444 ] = x;         // copy #2

Problem #1:
The user may want just:
    do_some_changes( db[444] ); // note! it can even compile on some compilers.

This is wrong unless you put serialization in the destructor, but in that case
you get frequent object serialization, which is slow, and
serialization/deserialization can't be synchronized:
        construct instance1 -> deserialize1
        change instance1.
        construct instance2 -> deserialize2
        change instance2.
        destruct 2 -> serialize2
        destruct 1 -> serialize1
The changes from instance2 are lost.
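
The sequence above can be reproduced with a small sketch. It assumes a hypothetical `Proxy` that deserializes in its constructor and serializes in its destructor, with an in-memory `disk` map standing in for real storage. `Record`, `Proxy` and `lost_update_demo` are made-up names for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical record with two independent fields.
struct Record { std::string a; std::string b; };

// Hypothetical stand-in for on-disk storage.
std::map<int, Record> disk;

// Hypothetical proxy: private copy of the record, written back on destruction.
class Proxy {
public:
    explicit Proxy(int key) : key_(key), value_(disk[key]) {}  // deserialize
    ~Proxy() { disk[key_] = value_; }                          // serialize
    Proxy(const Proxy&) = delete;
    Proxy& operator=(const Proxy&) = delete;
    Record& operator*() { return value_; }
private:
    int key_;
    Record value_;
};

// Runs the construct/change/destruct sequence and returns what is on "disk".
Record lost_update_demo() {
    disk[444] = Record{"a0", "b0"};
    {
        Proxy p1(444);     // construct instance1 -> deserialize1
        (*p1).a = "a1";    // change instance1
        Proxy p2(444);     // construct instance2 -> deserialize2
        (*p2).b = "b2";    // change instance2
    }                      // destruct 2 (writes b2), then destruct 1 (overwrites)
    return disk[444];
}
```

Because instance1 still holds the old value of `b`, its write-back in the destructor overwrites what instance2 saved, and only the change to `a` survives.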

Problem #2:
Why do you think you can copy any object?

Problem #3:
Even if you can, why do you think that copying the object is cheap?
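
As a rough illustration of problems #2 and #3, here are two hypothetical user types (both names are made up, and C++11 is assumed). The first cannot go through the "copy #1 / copy #2" pattern at all, and the second can, but every copy duplicates all of its data.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical type that owns a unique resource, so copying is disabled.
// "MyClass x = db[444];" would not compile for a type like this.
struct Session {
    Session() = default;
    Session(const Session&) = delete;
    Session& operator=(const Session&) = delete;
    std::string name;
};

// Hypothetical type that is copyable, but each copy duplicates every page.
struct BigDocument {
    std::vector<std::string> pages;
};

// Compile-time checks of the two properties described above.
static_assert(!std::is_copy_constructible<Session>::value,
              "Session must not be copyable");
static_assert(std::is_copy_constructible<BigDocument>::value,
              "BigDocument is copyable, just not cheaply");
```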

Problem #4:
The user has some template algorithm which deals with a generic STL container.
The algorithm expects "pointer <-> disk buffer" identity when performing
const operations on your container, which is the norm for a std container.
I'm not sure whether the standard requires it, but it seems logical to me.
-------------------
IMHO:
It would be better to implement a buffer (POD?) disk container and an object
disk container separately. It looks like they are pretty different things.

regards,
bohdan

