Subject: Re: [boost] new library (space partitioning)
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2010-07-29 22:31:04
Adam Wulkiewicz wrote:
> You're right I should adopt their nD point interface and box probably.
> But, if boost::geometry::point you have in mind, coordinates are
> in compile-time only(or maby I'm wrong?) and this can't be used by
> space partitioning data structures.
If they don't provide an operator or get(axis) interface and only get<>() then that is a problem and you probably should influence them to fix that so that before they release their library. It isn't too late.
>> By the way, the iterators over nD data structures problem is an age
>> old debate in C++ and there are people on this list who can provide
>> you with valuable insight on what to do and what not to do based on
>> fifteen years of experience with trying and sometimes failing to
>> define a good iterator based interface for nD containers. [...]
> I will appreciate any help.
You might want to see if you can dovetail into ongoing work on linear algebra and matrix library development in boost. We have ublas, multiarray and several libraries under construction in this area.
>> Your interface seems a little off to me, not what I was expecting.
> Yes, it's not fully designed and rather low-level. It is the
> implmented interface, not the target.
Ok, that makes more sense.
>> If you don't find a point within 100.0f does it return a null
> Yes. Btw, 100.0f is a square distance (it should be a distance).
That actually makes sense for an implementation interface since you want to avoid the sqrt.
>> Why not use a boost::optional or pass point_object by reference as
>> first argument and return bool? Either is better by a long shot.
> The thing is, there aren't copies of point_objects in kd-tree. There
> are pointers. If there were copies, the 2nd option will be the best.
> Btw, it's another issue I'm not sure about. Should copies or
> references/pointers be kept in kd-tree?
It depends to some degree on how many dimensions are stored in a point. If the point is not absurdly large then copies should always be better because you avoid pointer chasing and get better cache behavior. Just try sorting a vector of points and compare to sorting a vector of pointers to points and you'll quickly see that local copies wins.
>> Are you dynamically allocating the point_object you return?
> No, this is a pointer to one of the point_object passed in the
One reason to do this sort of thing is if the point type disallowed copy construction and assignment, but I doubt you had that in mind. We lean pretty heavily on NRVO and inlining to make return by value fast.
>> Why does point_object exist, why not just use point_type directly?
> Because there may be some complex point object which shouldn't been
> point's interface implementation because it isn't a point. Fe. in
> Photon Mapping algorithm photons are used. They are point objects but
> points, they have some position but may have some other nD attributes
> direction, color, etc.
> But, I'm open to suggestion.
You can use adaptor traits similar to what boost geometry does to avoid the wrapper class.
> So, I think the searching interface might look more or less like this:
> (I assume that copies are used)
> // find one closest point, max distance = 10
> point_object p;
> bool found = kdt.find(p, point_type(...), 10);
> // find max 5 closest points, initial max distance = 10
> // output: number of pointsfound and distance to the furthest point
> std::vector<point_object> fv;
> std::pair<size_t, value_type> found_data =
> kdt.find(std::back_inserter(fv), point_type(...), 10, 5);
> // or
> value_type max_distance;
> size_t found_points = kdt.find(std::back_inserter(fv), max_distance,
> point_type(...), 10, 5);
I wouldn't bother returning the max distance found. Let the user derive it from the points returned if they want it. It complicates the interface and I'm not sure how often it is useful.
> // find all points in area, initial max distance = 10
> std::vector<point_object> fv2;
> size_t found_num = kdt.find(std::back_inserter(fv2), point_type(...),
This one looks nice and clean.
> Eventually, rebuilding might be 'inside' add method.
> Rebuild might be implemented in build.
> There might be methods for removing points from given area of kd-tree.
> I used rebuild(), not (rebalance()) because there are some splitting
> rules producing nice, not balanced kd-trees (see:
What happens if the user queries before after they have added but before they have built? Throw an exception? Return nothing? Automatically rebuild? I wouldn't provide build, rebuild or rebalance interfaces. It should be like the red-black tree, we never tell a std::set to rebalance itself. If the user wants to rebuild the tree they can construct a new one with all the points.
What about if you provided lower_bound and upper_bound:
kdtree::iterator itr = kdt.lower_bound(point, 0.0f);
kdtree::iterator itr2 = kdt.upper_bound(point, 0.0f);
std::vector<point> pts(itr, itr2);
You may also sometimes want all the points in the tree and would just use begin() and end(). This style of interface has the advantage that it is maximally intuitive to people who are used to the stl. It may have the drawback of being challenging to implement and perhaps execute slower than back inserter style. There are iterator tricks used and facilitated by boost to make it easier and more efficient to provide such an interface as a wrapper around the back inserter interface, for example.
Now, let's imagine that instead of euclidean the user wants L distance metric. You can provide a version of lower_bound and upper_bound that take a distance metric functor as a third argument.
Lots to think about.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk