Boost logo

Boost :

From: Arkadiy Vertleyb (vertleyb_at_[hidden])
Date: 2005-10-07 16:48:50


"Calum Grant" <calum_at_[hidden]> wrote

> Arkadiy Vertleyb wrote:
> > > > OTOH, writing a primitive optimizer, would, in our
> > opinion do more
> > > > harm than good.
> > >
> > > By "harm" do you mean have worse performance?
> >
> > Yes, and also (possibly) inability to handle more complicated cases.
>
> Well I certainly agree that a library that couldn't handle complicated
> cases, or whose performance was poor, would need a rethink.
>
> But I don't think RML is in that category. It has excellent performance
> and can handle very complex queries. I should exercise caution however
> - I await to see how well your library copes with 1000000 items, does a
> self-join, adds another 1000000 items and does a second self join. You
> sound fairly confident that you will beat me!

I am not confident about this. We didn't spent a lot of time on
optimization.

But if I wanted to compete with you on this particular example, I would not
use the technique I showed -- I wouldn't use transactions, indexing, etc. --
I would just use a merge-join, and stopped after ten results, like you did,
and it would take no time. I might beat you or not, but RTL is flexible
enough to allow me to build an expression that would evaluate this (or any
other) expression efficiently, with the minimal overhead. And since I do it
by hand, I don't have to worry about my optimizer not being able to cope
with this.

> If you like you can come up with an extremely complex query - and we can
> compare again.

For starters, try to change the expression to something like abs(c1 - c2) ==
1, and output (or just walk) _all_ the results. Then repeat this 1000
times.
You can go back from 1000000 to 3000 :-)
>
> > > If you're going to make claims about the performance of an
> > approach,
> > > then you need to have benchmarks. You can't argue about air. For
> > > example the RML benchmarks
> > >
> > > http://visula.org/relational/benchmarks.html
> > >
> > > show RML to be matching or outperforming std::map.
> > >
> > > Would you care to benchmark your RTL versus RML or the STL before
> > > making such claims? I think that would settle which
> > approach did more
> > > "harm".
> >
> > Note that I didn't say RML's approach is harmful. All I said
> > is that a simplistic approach to implementing optimizer is
> > IMO inacceptable. Are you saying your optimizer is simplistic? :-)
>
> Of course not!
>
> > I believe that talks about performance are premature. I
> > think it's time to talk about what can and what can't be done
> > with this libraries. Then comes the optimization.
>
> I don't think talking about performance is premature. If we are going
> to provide a practical library, then performance matters. If a user has
> a choice between using RTL in a real application, but it is 10 times
> slower than Boost.MultiIndex, then I expect most users would stick with
> Boost.MultiIndex. People need to know.

Of course.

However I don't believe RTL even targets this niche. Relational approach is
much more than just indexing on multiple fields.

> On the other hand, I believe that RML is a much easier and more
> efficient (than std:: containers) way to manage collections. But I am
> biassed :-)
>
> > As for benchmarks, for such simple case as yours, you don't
> > need RTL -- you can benchmark sorted std::vector or
> > Boost.Multi_index. The performance of RTL will not be
> > different, since it's based on these containers. To
> > outperform STL is definitely not one of RTL's tasks.
> >
> > For more complicated cases though, such as ones that I
> > described in my original post, RTL does provide an efficient,
> > index-based solution, while other libraries, AFAIK, don't.
>
> In that case, could you give an example where RTL would be efficient,
> while "other libraries" would be inefficient. I have a case for you
> that might not be so efficient in RTL: What about indexing on multiple
> columns? This is what RML is designed for, otherwise you may as well
> just use std::map.

I don't know if we are much less efficient with multiple columns. Again,
the matter of fine tuning. And again, relational approach is much more than
just indexing on multiple columns. For just indexing on multiple columns
one can use Boost.MultiIndex.

IMO, relational approach is about providing multiple views into the same
data model. And making these views most efficient. The constrained
language of relational algebra allows one to achieve this. This is what RTL
is about.

> Although it sounds a little competitive, I think it's pretty instructive
> to be able to compare different implementations.

Nothing wrong about being competitive :)

Regards,
Arkadiy


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk