Boost logo

Boost :

From: Arkadiy Vertleyb (vertleyb_at_[hidden])
Date: 2005-10-21 15:08:10


"Calum Grant" <calum_at_[hidden]> wrote

> > Calum, is it possible to see the latest code of your example?
>
> Here it is.

I see...

A couple of questions:

1) You modify counter all the time -- does RML allow to do this with the
fields on which the table is sorted?

2) When you say :

select(m_locations,
    m_locations.x>=loc.x-m_cutoff &&
    m_locations.x<=loc.x+m_cutoff &&
    m_locations.y>=loc.y-m_cutoff &&
    m_locations.y<=loc.y+m_cutoff &&
    m_locations.z>=loc.z-m_cutoff &&
    m_locations.z<=loc.z+m_cutoff)

Do you use any kind of special index for this?

In any case this seems to prove that RML does a good job when searching on
multiple indexes -- you do quite a bit of that. Also, inserts/updates are
probably very efficient.

But, as I said before, the overall efficiency of this example is based on
the fact that this is a very smart, custom, hand-coded solution (see how you
maintain the counter). No library would be able to beat its performance
using a higher-level abstraction (If RTL had a count() operator in addition
to generic groupby(), we could be much smarter in maintaining it during
updates).

OTOH, RTL doesn't seem to have much visible overhead. What's happenning
during the request is that for every record a range query is executed on the
index sorted by lattitude to cut off most of the records (about 90%). This
range query is then walked and the proper selection condition (based on
distance) is applied, and the counter is calculated. This all gets indexed
on the (counter, city, name). Everything except indexing is done lazily.

The major performance overhead is introduced by indexing of the selection
(about 2,000,000 records), but this also provides a major gain with updates,
since selection is (at least at the moment) the most expensive part of this.
Still, I can't do as good as you can, since I don't have a notion of
"counter", so groupby has to be run over and over again (even if on a faster
argument).

I should be able to find some major inefficiencies in selection, and improve
the performance. I don't think I'll be able to beat your result though (by
the reasons I described above).

But again, I don't believe this all proves RML is faster than RTL, since the
coding approaches were so different. I don't even know if it's possible to
compare the libraries at this point, since they have their strong points in
different areas. RTL's strong point in the ability to build and run
SQL-like views. This I tried to show in my example. RML seems to be very
efficient at maintaining and using multiple indexes, but, despite its more
SQL-like interface, can mimique only very simple SQL queries.

Regards,
Arkadiy


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk