|
Boost : |
Subject: Re: [boost] new library (space partitioning)
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2010-07-30 08:02:33
Simonson, Lucanus J wrote:
> Adam Wulkiewicz wrote:
>> You're right I should adopt their nD point interface and box probably.
>>
>> But, if boost::geometry::point you have in mind, coordinates are
>> chosen
>> in compile-time only(or maby I'm wrong?) and this can't be used by
>> space partitioning data structures.
>
> If they don't provide an operator[] or get(axis) interface and only get<>() then that is a problem and you probably should influence them to fix that so that before they release their library. It isn't too late.
>
To clarify this. It is possible to create some data structures (kd-tree
with sequential choosing of splitting plane, regular tree, etc.) but in
more complex kd-tree splitting algorithm there must be this kind of
interface used. I'd like to develop data structures with 'switchable'
internal structure/building algorithm. If the tree structure is better,
the searching is faster.
>>> Why not use a boost::optional or pass point_object by reference as
>>> first argument and return bool? Either is better by a long shot.
>>
>> The thing is, there aren't copies of point_objects in kd-tree. There
>> are pointers. If there were copies, the 2nd option will be the best.
>>
>> Btw, it's another issue I'm not sure about. Should copies or
>> references/pointers be kept in kd-tree?
>
> It depends to some degree on how many dimensions are stored in a point. If the point is not absurdly large then copies should always be better because you avoid pointer chasing and get better cache behavior. Just try sorting a vector of points and compare to sorting a vector of pointers to points and you'll quickly see that local copies wins.
>
>>> Are you dynamically allocating the point_object you return?
>>
>> No, this is a pointer to one of the point_object passed in the
>> constructor.
>
> One reason to do this sort of thing is if the point type disallowed copy construction and assignment, but I doubt you had that in mind. We lean pretty heavily on NRVO and inlining to make return by value fast.
>
>>> Why does point_object exist, why not just use point_type directly?
>>
>> Because there may be some complex point object which shouldn't been
>> point's interface implementation because it isn't a point. Fe. in
>> Photon Mapping algorithm photons are used. They are point objects but
>> aren't
>> points, they have some position but may have some other nD attributes
>> direction, color, etc.
>>
>> But, I'm open to suggestion.
>
> You can use adaptor traits similar to what boost geometry does to avoid the wrapper class.
>
With points it's ok, since there will be only one instance of a point in
the tree. User may still want to store them in different container and
use kdt just for searching purposes but he may just use some wrapper.
With big points (small photons will have around 28B) he may use a
wrapper too.
The problem is when you take volume objects into consideration. Volume
object may be 'present' in some number of kd-tree nodes. So you must
have pointers/references in there.
And the 'global' question is, what to treat kd-tree like? As a container
which purpose is to store points/volumes and provide fast searching or
maby as an additional searching accelerator.
If it should be the the 'main' container to store objects, there should
be some container internally in the volume kd-tree and the tree
structure should contain pointers/references/indexes. So I thought that
this container should be managed by the user and the tree should be just
a search accelerator.
The main danger I see is that if container is changed, user must
remember to create new kd-tree.
Another thing is that volume objects may be a lot bigger than points,
fe. it may be 3D meshes.
>> // find one closest point, max distance = 10
>> point_object p;
>> bool found = kdt.find(p, point_type(...), 10);
>>
>> // find max 5 closest points, initial max distance = 10
>> // output: number of pointsfound and distance to the furthest point
>> std::vector<point_object> fv;
>> fv.reserve(5);
>> std::pair<size_t, value_type> found_data =
>> kdt.find(std::back_inserter(fv), point_type(...), 10, 5);
>> // or
>> value_type max_distance;
>> size_t found_points = kdt.find(std::back_inserter(fv), max_distance,
>> point_type(...), 10, 5);
>
> I wouldn't bother returning the max distance found. Let the user derive it from the points returned if they want it. It complicates the interface and I'm not sure how often it is useful.
Yes it complicates but it would be nice to return it somehow since it's
calculated 'inside' allready and may be used in algorithms where the
area of found points is used. Maby just implement 2 versions of this method?
Btw, there might be fe. a search area functor. This will make possible
to specify the size and the shape of the search area.
>> I used rebuild(), not (rebalance()) because there are some splitting
>> rules producing nice, not balanced kd-trees (see:
>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.8380).
>
> What happens if the user queries before after they have added but before they have built? Throw an exception? Return nothing? Automatically rebuild? I wouldn't provide build, rebuild or rebalance interfaces. It should be like the red-black tree, we never tell a std::set to rebalance itself. If the user wants to rebuild the tree they can construct a new one with all the points.
It makes it even simpler.
> What about if you provided lower_bound and upper_bound:
>
> kdtree::iterator itr = kdt.lower_bound(point, 0.0f);
> kdtree::iterator itr2 = kdt.upper_bound(point, 0.0f);
> std::vector<point> pts(itr, itr2);
> kdt.erase(itr, itr2);
>
> You may also sometimes want all the points in the tree and would just use begin() and end(). This style of interface has the advantage that it is maximally intuitive to people who are used to the stl. It may have the drawback of being challenging to implement and perhaps execute slower than back inserter style. There are iterator tricks used and facilitated by boost to make it easier and more efficient to provide such an interface as a wrapper around the back inserter interface, for example.
>
> Now, let's imagine that instead of euclidean the user wants L distance metric. You can provide a version of lower_bound and upper_bound that take a distance metric functor as a third argument.
Yes, I like it.
Regards,
Adam
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk