Subject: Re: [boost] [GSoC] Some new ideas for this year's GSoC
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-04-06 14:42:04
> * Optimize Boost String Algorithm finders to use an efficient substring
> matching algorithm as opposed to the naive algorithm. Most likely
> because of the fact that boost has implemented the finders for strings
> as generic algorithms, they haven't used efficient algorithms such as
> KMP, Boyer-Moore, Rabin-Karp etc.
> \see boost/algorithm/string/detail/finder.hpp
> The idea is to implement efficient algorithms such as the ones
> mentioned above to replace the current naive implementation. Note: it
> may be difficult to find nth matching substring position using some of
> these algorithms.
> Difficulty: easy-medium
> * Implement a string algorithm for generating the suffix array. The
> suffix array, after being precomputed on a certain string, allows for
> very efficient substring searches in the future. This is useful when the
> string doesn't change too often but there are plenty of substring to
> search against.
Why are those two points separate? It seems to be the same thing:
provides alternative and potentially better string searching algorithms.
That seems fairly easy to get right and get included into boost.
> * Space partitioning data structures? kd-trees? quadtrees? octtrees?
> collision detectors? closest neighbors?
There were already some projects about that in the past years, but I
don't think they were very successful.
You might want to take a look at them.
Integrating with Boost.Geometry would definitely be good. (it already
provides collision detection and closest neighbour I believe)
> * In-place radix sort? Radix sort is a very efficient algorithm which
> performs better than std::sort (my implementation) (also asymptotically
> better) for some particular types such as: uint8_t, uint16_t, uint32_t,
> unsigned char, unsigned char, unsigned char etc. Radix sorting
> takes linear time, but unfortunately, linear memory. It is very useful
> for sorting very large amounts of numbers tho (or genetic codes and
> maybe some other stuff).
I think there is a sorting library in the review queue that provides
that kind of thing.
Boost list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk