Boost :

Date view	Thread view	Subject view	Author view

From: David Abrahams (dave_at_[hidden])
Date: 2002-11-19 17:01:26

Next message: Arkadiy Vertleyb: "[boost] Re: Serialization to relational table"
Previous message: Paul Mensonides: "Re: [boost] 'function' idea"
In reply to: Wesley W. Terpstra: "Re: [boost] Re: Re: STL applied to disk"
Next in thread: Bohdan: "[boost] Re: Re: Re: STL applied to disk"

"Wesley W. Terpstra" <terpstra_at_[hidden]> writes:

> On Tue, Nov 19, 2002 at 12:52:19PM -0500, David Abrahams wrote:
>> "Wesley W. Terpstra" <terpstra_at_[hidden]> writes:
>> > On Tue, Nov 19, 2002 at 10:38:27AM -0500, David Abrahams wrote:
>> >> I haven't been paying attention, but IIUC what you're proposing, these
>> >> things are no longer conforming iterators.
>> >>
>> >> The way to make random access iterators over disk storage is to build
>> >> an iterator which stores its value_type internally. You can even
>> >> arrange for it to construct the value_type in its internal storage on
>> >> demand, so that it doesn't store anything until it is dereferenced.
>> >
>> > I assume you mean they are not iterators because operator -> is
>> > broken?
>>
>> And operator*.
>
>>From http://www.sgi.com/tech/stl/trivial.html:
>
> [1] The requirement for the return type of *x is specified as
> "convertible to T", rather than simply T, because it sometimes
> makes sense for an iterator to return some sort of proxy
> object instead of the object that the iterator conceptually
> points to. Proxy objects are implementation details rather
> than part of an interface (one use of them, for example, is to
> allow an iterator to behave differently depending on whether
> its value is being read or written), so the value type of an
> iterator that returns a proxy is still T.

Yup. Proxies are allowed for input iterators, but not for forward,
bidirectional, or random access iterators.

> Similar things can be found for vect[offset].

Not unless you count vector<bool> (which nobody does).

> I am printing
> http://www.boost.org/libs/utility/iterator_adaptors.pdf to take home
> with me this evening to see what is in there.
>
>> > What you are proposing however is flawed for several reasons.
>> >
>> > If I stored the value_type internally, this will break:
>> >
>> > map::iterator i = ...;
>> > map::reference x = *i;
>> > ++i;
>> > x = ...; // what is x now pointing at? the wrong record.
>>
>> That code is already broken if it makes any assumptions about what
>> x refers to after ++i. Sad but true.
>
> Really? Are you certain about this?

It depends what type map is. If map is std::map<T>, then the code
isn't broken. However, if all you know is that map::iterator is an
iterator, then yes, it's broken.

> If you could give me a quote I would love to hear it. I know it is
> not going to work for Input iterators, but what about a Forward
> Iterator?

No, input iterator is the one the proxy works for.

The nearest quote is
http://anubis.dkuug.dk/jtc1/sc22/wg21/docs/lwg-active.html/#299, which
mentions the iterator with the internal cached value as a valid
random-access iterator.

> What about
> map::pointer p = i->fn_returning_this();
> ++i;
>
> is p now invalid?

If i is just some forward iterator, yes.

> I know that in the STL containers it is generally still ok. (map,
> set, list, etc) but that doesn't mean it is allowed. :-)

You might want to check to see if there's any standard text which
implies it's allowed for the standard containers. I think if it's not
explicitly allowed, you should submit a DR.

>> > Also, if you have two iterators pointing at the same thing, but keeping
>> > distinct value_types internally, expressions like:
>> > i->set_member_a(j->set_member_b(3) + 2);
>> > will break -- only one of the changes will make it to disk.
>>
>> You can get around this by dynamically allocating the value_type and
>> keeping a cache of active values in the container... if it's
>> important.
>
> Errr... That is what I said after all in the part you just snipped.

I realized that when I got there.

> And, aren't you are the one who said "So what?" to partial conformance?

Yes, but the problem you mention above is not a conformance
problem. The issue lies in your definition of "pointing at the same
thing". There's no guarantee that p == q implies &*p == &*q.

> If the above expression fails to work, it is far worse than not providing
> operator ->. The missing operator is detected at compile time; this could
> take a long time to track down.

It's only a bug user code relies on it.

> Actually, even:
> i->set_bar(2);
> j->set_bar(4);
> with i==j could write "2" to the disk with your internal value_type.
> Clearly not what the user intended and very hard to detect reading it.

QOI issue, I think. I'm not saying it's unimportant. It's just not a
conformance issue.

> Therefore, you MUST have a common allocated value_type, which means
> you are going to need some way to find them. The information you
> have at the time you want to find them is:
>
> the unique memory location of the serialized item
> the unique sector+offset of the serialized item
> the serialized item
> a pointer to the session object, transaction, and database
>
> I seek an efficient way to do this.
> So far, the sector+offset map is the best I can think of.

OK.

>> > The whole question revolves around:
>> >
>> > is the overhead of such a table justified by the benefit of
>> > allowing member methods to be called on objects within the
>> > container.
>>
>> It depends on whether you're advertising STL compatibility or
>> not. If not, do what ever you like and use a large, loud disclaimer
>> when you write "iterator" (in quotes) in your documentation. If so,
>> you have to bite the bullet and make the iterators conform.
>
> I really do want them to conform. Don't read otherwise into my
> writing. However, in the practical situation this comes from, speed
> and consistency of the data are probably more important than feature
> coverage.

OK.

>> > There are significant costs:
>> > the overhead of redundant cache
>> > (it is already cached at the sector level)
>> > the overhead of indexing the map
>> > (considerable if you are just deserializing an int)

You could keep blocks of objects in your cache.

>> > My current answer is "not justified". But, I am open to
>> > persuasion, especially in the form of an optimized solution.
>>
>> I think it's early to worry about optimization. Make it work
>> first. An implementation which lies about its "iterators" is
>> broken.
>
> Man, that is a bit hard-line. If it fails to compile, that is far
> better than obscure crashes (or worse---data corruption) later.

Some uses of proxies with templates can cause silent runtime
problems. Just think about what happens when you try to do:

std::swap(*p, *q)

Ask Howard Hinnant about this; he made his standard library work with
proxy iterators. The STL (and user) algorithm code has to be carefully
implemented to account for proxies... and even then there are still
places where you can get in trouble. The moral is that if you want to
use proxies you can give your iterator random-access capability, but
you can only call it an input iterator, or you will break someone's
code.

> The whole reason I started this thread was to find a way to implement
> i->foo_bar(); efficiently. I know it is important; that's why I asked.
>
> But, I need a solution, not a "do it or else". :-)

I don't know what to tell ya. The standard says what it says.

> My best idea is the map, but I know that this is too slow.
>
> It makes my 40Mb test-case go from under a second to twelve seconds.
> All CPU bound; not disk. That is bad.

Maybe you can use a better implementation.

> I have fuzzy ideas about sneaking a pointer into the sector buffer
> in place of the object (till serialize), but the alignment and size
> problems here worry me.

Definitely sounds fuzzy ;-)

> Other things to consider with a cache of objects is that the wrapped
> transaction will need to reserialize and destroy them all on commit,
> which means that keeping the map in the database is not the right
> place; the transaction has to find all of them.
>
> Good night.

Good luck! It's not an easy problem.

-- 
                       David Abrahams
   dave_at_[hidden] * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution

Next message: Arkadiy Vertleyb: "[boost] Re: Serialization to relational table"
Previous message: Paul Mensonides: "Re: [boost] 'function' idea"
In reply to: Wesley W. Terpstra: "Re: [boost] Re: Re: STL applied to disk"
Next in thread: Bohdan: "[boost] Re: Re: Re: STL applied to disk"

Date view	Thread view	Subject view	Author view

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk