Boost logo

Boost :

From: Wesley W. Terpstra (terpstra_at_[hidden])
Date: 2002-11-19 14:48:15


On Tue, Nov 19, 2002 at 12:52:19PM -0500, David Abrahams wrote:
> "Wesley W. Terpstra" <terpstra_at_[hidden]> writes:
> > On Tue, Nov 19, 2002 at 10:38:27AM -0500, David Abrahams wrote:
> >> I haven't been paying attention, but IIUC what you're proposing, these
> >> things are no longer conforming iterators.
> >>
> >> The way to make random access iterators over disk storage is to build
> >> an iterator which stores its value_type internally. You can even
> >> arrange for it to construct the value_type in its internal storage on
> >> demand, so that it doesn't store anything until it is dereferenced.
> >
> > I assume you mean they are not iterators because operator -> is
> > broken?
>
> And operator*.

>From http://www.sgi.com/tech/stl/trivial.html:

        [1] The requirement for the return type of *x is specified as
        "convertible to T", rather than simply T, because it sometimes makes
        sense for an iterator to return some sort of proxy object instead of
        the object that the iterator conceptually points to. Proxy objects
        are implementation details rather than part of an interface (one use
        of them, for example, is to allow an iterator to behave differently
        depending on whether its value is being read or written), so the
        value type of an iterator that returns a proxy is still T.

Similar things can be found for vect[offset].

I am printing http://www.boost.org/libs/utility/iterator_adaptors.pdf
to take home with me this evening to see what is in there.

> > What you are proposing however is flawed for several reasons.
> >
> > If I stored the value_type internally, this will break:
> >
> > map::iterator i = ...;
> > map::reference x = *i;
> > ++i;
> > x = ...; // what is x now pointing at? the wrong record.
>
> That code is already broken if it makes any assumptions about what x
> refers to after ++i. Sad but true.

Really? Are you certain about this? If you could give me a quote I would
love to hear it. I know it is not going to work for Input iterators, but
what about a Forward Iterator?

What about
        map::pointer p = i->fn_returning_this();
        ++i;

is p now invalid? I know that in the STL containers it is generally still
ok. (map, set, list, etc) but that doesn't mean it is allowed. :-)

> > Also, if you have two iterators pointing at the same thing, but keeping
> > distinct value_types internally, expressions like:
> > i->set_member_a(j->set_member_b(3) + 2);
> > will break -- only one of the changes will make it to disk.
>
> You can get around this by dynamically allocating the value_type and
> keeping a cache of active values in the container... if it's
> important.

Errr... That is what I said after all in the part you just snipped.

And, aren't you are the one who said "So what?" to partial conformance?

If the above expression fails to work, it is far worse than not providing
operator ->. The missing operator is detected at compile time; this could
take a long time to track down.

Actually, even:
        i->set_bar(2);
        j->set_bar(4);
with i==j could write "2" to the disk with your internal value_type.
Clearly not what the user intended and very hard to detect reading it.

Therefore, you MUST have a common allocated value_type, which means you
are going to need some way to find them. The information you have at the
time you want to find them is:

        the unique memory location of the serialized item
        the unique sector+offset of the serialized item
        the serialized item
        a pointer to the session object, transaction, and database

I seek an efficient way to do this.
So far, the sector+offset map is the best I can think of.

> > The whole question revolves around:
> >
> > is the overhead of such a table justified by the benefit of
> > allowing member methods to be called on objects within the
> > container.
>
> It depends on whether you're advertising STL compatibility or not. If
> not, do what ever you like and use a large, loud disclaimer when you
> write "iterator" (in quotes) in your documentation. If so, you have to
> bite the bullet and make the iterators conform.

I really do want them to conform. Don't read otherwise into my writing.
However, in the practical situation this comes from, speed and consistency
of the data are probably more important than feature coverage.

> > There are significant costs:
> > the overhead of redundant cache
> > (it is already cached at the sector level)
> > the overhead of indexing the map
> > (considerable if you are just deserializing an int)
> >
> > My current answer is "not justified". But, I am open to persuasion,
> > especially in the form of an optimized solution.
>
> I think it's early to worry about optimization. Make it work first. An
> implementation which lies about its "iterators" is broken.

Man, that is a bit hard-line. If it fails to compile, that is far better
than obscure crashes (or worse---data corruption) later.

The whole reason I started this thread was to find a way to implement
i->foo_bar(); efficiently. I know it is important; that's why I asked.

But, I need a solution, not a "do it or else". :-)

My best idea is the map, but I know that this is too slow.

It makes my 40Mb test-case go from under a second to twelve seconds.
All CPU bound; not disk. That is bad.

I have fuzzy ideas about sneaking a pointer into the sector buffer in place
of the object (till serialize), but the alignment and size problems here
worry me.

Other things to consider with a cache of objects is that the wrapped
transaction will need to reserialize and destroy them all on commit, which
means that keeping the map in the database is not the right place; the
transaction has to find all of them.

Good night.

---
Wes

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk