From: Jose (jmalv04_at_[hidden])
Date: 2005-10-17 18:53:57
Given the problems with the datasets, I would change the initial query to
one that clusters the cities as you do and shows only the city, its
(lat, long) and the number of neighbours in the query results. Some cities
will show the right name and others will not, because the data doesn't map
every set of coordinates to a single city name.
The RTL result is also 991 neighbours. The problem is that the Amsterdam
area has the 991 neighbours (most likely all AS with identical
coordinates), so it is better to group the results by city, i.e.:
Amsterdam (lata, lonb) 991
city B (latc, lond) xyz
city C (late, lonf) abc
With these results we can compare both queries: although the city names
may differ, the numerical values should not.
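To make the comparison concrete, here is a minimal Python sketch of the grouping I have in mind. It is not RTL and not Calum's code. The function names, the row layout (name, lat, lon, neighbours) and all sample values except the 991 are assumptions made up for illustration. It collapses rows with identical coordinates into one entry, then compares two result sets by coordinates only, ignoring the names.

```python
from collections import Counter


def summarise(rows):
    """Collapse rows with identical (lat, lon) into one entry per coordinate.

    rows: iterable of (city_name, lat, lon, neighbour_count) tuples.
    Returns {(lat, lon): (most_common_name, sorted_distinct_counts)}.
    """
    names = {}
    counts = {}
    for name, lat, lon, neighbours in rows:
        key = (lat, lon)  # exact match only: assumes duplicates share identical coordinates
        names.setdefault(key, Counter())[name] += 1
        counts.setdefault(key, set()).add(neighbours)
    return {
        key: (names[key].most_common(1)[0][0], sorted(counts[key]))
        for key in names
    }


def differing_coordinates(a, b):
    """Coordinates whose neighbour counts differ between two summaries.

    City names are deliberately ignored; a coordinate missing from one
    summary is reported as a difference.
    """
    missing = (None, None)
    return sorted(
        key for key in set(a) | set(b)
        if a.get(key, missing)[1] != b.get(key, missing)[1]
    )


# Hypothetical sample rows: coordinates and the "city B" count are invented.
query_1 = [
    ("Amsterdam", 52.37, 4.89, 991),
    ("Amstelveen", 52.37, 4.89, 991),  # same coordinates, different name
    ("city B", 48.85, 2.35, 12),
]
query_2 = [
    ("AMS", 52.37, 4.89, 991),
    ("city B", 48.85, 2.35, 12),
]

for (lat, lon), (name, neighbours) in summarise(query_1).items():
    print(name, (lat, lon), neighbours)
print("differences:", differing_coordinates(summarise(query_1), summarise(query_2)))
```

Keeping the distinct counts as a list rather than a single number means that any coordinate reporting more than one neighbour count shows up immediately as a data problem.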
On 10/18/05, Calum Grant <calum_at_[hidden]> wrote:
> > "Calum Grant" <calum_at_[hidden]> wrote
> > > > Is it also possible to see the result?
> > >
> > > Attached, Calum
> > I don't believe this is the required result, is it?
> The problem with this data is that it contains a lot of duplicates. If
> I cluster the cities into 5103 clusters, I get 47ms. On the other hand
> if I don't cluster them, then I get 4.1s. The expensive part is
> building the index of indexing on distances. The results are rather odd
> - the 500 locations I get have 991 neighbours.
> Regards, Calum
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk