Boost logo

Boost :

From: Arkadiy Vertleyb (vertleyb_at_[hidden])
Date: 2005-10-07 15:32:29


"Calum Grant" <calum_at_[hidden]> wrote

> Arkadiy Vertleyb wrote
> > Here is a simple example that illustrates this. The attached
> > program does the following:
> >
> > 1) creates table of one column;
> > 2) self-joins the table
> > 3) selects records where columns are equal.
> >
> > This by itself doesn't make any sence, but it closely
> > approximates the task of finding close cities, stars,
> > billiard balls, monsters and soldiers, etc.
>
> Here is the equivalent in RML. Forget 3000 rows, what about adding
> 1000000 rows, then adding another 1000000 to it? What time does the
> equivalent RTL take?

Calum, I think you are missing my point. The above example was provided to
illustrate my two statements:

1) RTL can index any (arbitrary complicated) expression, not just table;
2) RTL can _incrementally_ update any such index when the underlying table
has been changed.

Did I use the most efficient expression for this particular task? Of course
not. I used cross-product where I would use a range (or merge) join
otherwise. I did it _intentionally_, because I wanted to provide an analogy
of a complicated condition, such condition for which optimization is simply
not possible.

I am pretty sure your library figured out how to do this query efficiently,
without considering every possible combination (it didn't consider 10^12
combinations, did it?). However, what it would do if the condition was more
complicated? Like calculating the distance, matching people by certain
characteristics, etc.? RTL can build an index on even slow expression,
after which subsequent queries become a few orders of magnitude faster. And
the indexes are not broken when the tables are changed.

Having said all this, RTL doesn't enforce this kind of usage. If the query
needs to be executed only once, or if one doesn't want to consum memory for
indexes, any appropriate relational expression can be built, with any number
of indexes in different nodes, to achieve needed tradeof between speed,
initial response, memory consumption, etc. My firm belief is that this kind
of flexibility is simply impossible with SQL-like approach.

> I don't think self-joins come up that often in real life however.

I think they are. I just gave examples where they could be (and are) used.
This, however, is not about a self-join. This is about any query which is
slightly more complicated than just a simple index-based selection.

Once again, I don't believe this is a right time for any performance
contests. They will not prove anything. I think RML will be faster in some
cases, and RTL -- in the others. And a couple of days spent on optimization
can dramatically change the picture.

What I would like to be discussing instead is the _design principles_ behind
the libraries.

Regards,
Arkadiy


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk