Boost logo

Boost :

Subject: Re: [boost] [library submission] smoothsort
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2008-12-14 15:48:31


Edouard wrote:
>I just finished a C++ implementation of the smoothsort algorithm (a
>derivative of the heapsort algorithm by Dijkstra that, in his words, "is of
>order N log N in the worst case, but of order N in the best case, with a
>smooth transition between the two").

Wouldn't something like the following be more practical?

(note: incomplete and untested)

template <typename iT>
void optimistic_sort(iT b, iT e) {
  iT low_end_of_unsorted_range = find_first_out_of_order_element(b, e);
  iT high_end_of_unsorted_range =
        find_last_out_of_order_element(low_end_of_unsorted_range, e);
  iT low_sorting_range = binary_search(b, low_end_of_unsorted_range,
                min_element(low_end_of_unsorted_range,
                                high_end_of_unsorted_range));
  iT high_sorting_range = binary_search(b, low_end_of_unsorted_range,
                max_element(low_end_of_unsorted_range,
                                high_end_of_unsorted_range));
  std::sort(low_sorting_range, high_sorting_range);
}

By implementing a function that finds the high end of the unsorted range as well as the min and max element in that range in a single pass you would have a more optimal implementation of optimistic sort than what I sketched out here. In the best case optimistic sort would be linear and generally less than 2X std::sort runtime in the worst case because its overhead is linear. It would provide a smooth transition from linear to O(n log n) as the size of the unsorted range increases. Optimistically, people who have a mostly sorted range have a small unsorted-subrange.

Everyone wants a better sorting algorithm. Everyone wants a better binary tree.

One thing I think would be useful is a bucketed binary tree that is more efficient with memory consumption and keeps its elements in contiguous order in the buckets. The tree would rebalance the elements to keep its bucket utilization high, but insertion and removal into the tree would invalidate its iterators. You get better cache performance and lookup/iteration speed than the std tree at the cost of iterators being invalidated by modifying the data structure.

This is a pretty simple data structure to implement using a class encapsulated array as the elements of a std::set. Alternately, you can use a linked list of class encapsulated arrays with iterators into the list stored in a std::set along with the lower bound of each bucket to avoid loading buckets into cache if they aren't the one you are looking for.

I may end up submitting a bucketed tree data structure along with my geometry algorithms if it turns out that it helps their performance meaningfully. Because it is a general data structure, it might not belong in the guts of a geometry library, and ought to be factored out.

Regards,
Luke


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk