Boost logo

Boost :

Subject: Re: [boost] [GSoC] Some new ideas for this year's GSoC
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-04-06 14:42:04


Stefan wrote:

> * Optimize Boost String Algorithm finders to use an efficient substring
> matching algorithm as opposed to the naive algorithm. Most likely
> because of the fact that boost has implemented the finders for strings
> as generic algorithms, they haven't used efficient algorithms such as
> KMP, Boyer-Moore, Rabin-Karp etc.
> \see boost/algorithm/string/detail/finder.hpp
> The idea is to implement efficient algorithms such as the ones
> mentioned above to replace the current naive implementation. Note: it
> may be difficult to find nth matching substring position using some of
> these algorithms.
> Difficulty: easy-medium
>
> * Implement a string algorithm for generating the suffix array. The
> suffix array, after being precomputed on a certain string, allows for
> very efficient substring searches in the future. This is useful when the
> string doesn't change too often but there are plenty of substring to
> search against.

Why are those two points separate? It seems to be the same thing:
provides alternative and potentially better string searching algorithms.
That seems fairly easy to get right and get included into boost.

> * Space partitioning data structures? kd-trees? quadtrees? octtrees?
> collision detectors? closest neighbors?

There were already some projects about that in the past years, but I
don't think they were very successful.
You might want to take a look at them.

Integrating with Boost.Geometry would definitely be good. (it already
provides collision detection and closest neighbour I believe)

> * In-place radix sort? Radix sort is a very efficient algorithm which
> performs better than std::sort (my implementation) (also asymptotically
> better) for some particular types such as: uint8_t, uint16_t, uint32_t,
> unsigned char, unsigned char[2], unsigned char[4] etc. Radix sorting
> takes linear time, but unfortunately, linear memory. It is very useful
> for sorting very large amounts of numbers tho (or genetic codes and
> maybe some other stuff).

I think there is a sorting library in the review queue that provides
that kind of thing.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk