Boost :
From: Calum Grant (calum_at_[hidden])
Date: 2005-10-18 14:52:19
> > The problem with this data is that it contains a lot of duplicates.
> > If I cluster the cities into 5103 clusters, I get 47ms. On the other
> > hand if I don't cluster them, then I get 4.1s. The expensive part is
> > building the index of indexing on distances. The results are rather
> > odd - the 500 locations I get have 991 neighbours.
>
> This sounds like a correct result -- can I see the first 50 items?
The output is at the bottom.
I've managed to get my times down a bit by reimplementing the solution
in a much more straightforward way. I can insert all 21421 items,
calculate the number of neighbours, and display the top 500 in 235ms.
Then I can remove 1982 items and redisplay the results in 47
milliseconds. Times this low make me suspicious that I'm doing
something wrong...
The solution I have is not very elegant from RML's point of view,
since, as I said before, RML is designed for logical queries rather than
numerical computation. Nor was it a particularly interesting problem for
RML, because it needed only a single table with two indexes.
Regards,
Calum
===============================================
17882 ALTEC-AS 991
16918 BILIM-AS 991
16917 IXEUROPE-FR-ASN 991
8734 WISH-NOKNOK 991
16915 DHMS-NET 991
17700 IOMART-AS 991
17494 RAMSATCOM 991
17670 GETIT 991
17493 AS-IKSYS 991
16913 UNSPECIFIED 991
16912 ASN-MPLS 991
17491 ABB 991
17490 UNSPECIFIED 991
17698 MEGAPROVIDER-AS 991
18013 CITCO-AS 991
16911 YACAST-AS 991
17803 KHODA 991
....
17278 AS-PETERSTAR 991
6399 Novaxess 991
17277 UNSPECIFIED 991
17868 RTK-Primorye 991
17276 UNSPECIFIED 991
17275 AS-SYNCHROLINE 991
11605 UNSPECIFIED 991
18167 RIPE 991
17274 AS-SUNET2000 991
11599 Swisscom-NA 991
11598 HPPOLAND-AS 991
17273 STARTVAS 991
There are 21421 locations
17219 ASN-CEDECRA 990
17587 UNSPECIFIED 990
17778 KIWWI-HU-AS 990
17777 Actimage 990
17433 EBS-Europe 990
...
17280 LiberCom-AS 990
18207 RIPE 990
18206 RIPE 990
17279 UNIFFM-NET 990
17493 AS-IKSYS 990
There are 19439 locations
Insert time = 234 milliseconds
Delete and redisplay = 47 milliseconds
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk