From: Tinko Bartels (tinkobartels_at_[hidden])
Date: 2020-06-16 23:14:46
I am happy to hear that the mask works.
With regard to hashing: As long as two linestrings share the exact same
points and differ in no way other than maybe having reverse order, I think
the idea of normalizing the order of points should work. I adapted your
sample code to put together an example in godbolt, you can observe the
output on the right side. shash is the sorting hash and it is the same for
both linestrings in this example: https://godbolt.org/z/QRbTnR .
The crucial point here is "exactly the same", because hashing the points
like this is in essence similar to comparing doubles or floats like "a ==
b" which one usually wants to avoid. Usually one checks something like
"std::abs(a - b) < 0.0001". Of course your hash only looks at individual
linestrings so I don't see how that could be done here. If two linestrings
are generated from different calculations they may differ, even if you
would expect the results to be the same if the computer could use real
numbers. Unfortunately there are subtle ways in which floating-point (i.e.
float, double or long double) arithmetics can mess with your results. For
example, it is well-known that a + b + c can be different from a + c + b
and but I have also seen a compiler optimization that can cause a * b + c
* d to be different from c * d + a * b.
There may be ways around this with floating point types (such as double,
float, long double) for your hashing functions but I can't think of any
right now, maybe somebody else has an idea.
Here are two ideas for alternatives to hashing geometries with
1) Consider the use of integer coordinates (i.e. int, long or long long).
That is the solution that was chosen in some projects that I participated
in. Of course, this is only an option if you can work with the limited
range of integers and if integers are suitable for all calculations that
you do in your application and there is no other source of roundoff error.
2) Consider the use of a map/set instead of an unordered_map/unordered_set.
Of course that would change the time complexity of some operations from
O(1) to O(log(n)) but it would allow you to consider two linestrings at the
same time and you could use some tolerance threshold, such as one usually
does with floating point types. Here is a quickly put together illustration
(with std::set) of what I mean by that: https://godbolt.org/z/DCkXdY .
There are probably pathological cases that can break this as well, but
maybe this works with your intended application.
I don't know how one sets tolerances with the rescaling in Boost.Geometry.
But I have seen examples in which it was helpful to just disable it by
defining BOOST_GEOMETRY_NO_ROBUSTNESS, so you could try that and see
whether that helps with your application.
Geometry list run by mateusz at loskot.net