Boost :

Date view	Thread view	Subject view	Author view

Subject: Re: [boost] [GSoC] Some new ideas for this year's GSoC
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2010-04-07 19:13:44

Next message: Simonson, Lucanus J: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"
Previous message: Daniel James: "Re: [boost] [Review Results] Boost.Polygon library accepted into boost"
In reply to: Stefan: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"
Next in thread: Simonson, Lucanus J: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"
Reply: Simonson, Lucanus J: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"
Reply: Thomas Klimpel: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"

Stefan wrote:
>>> * Optimize Boost String Algorithm finders to use an efficient
>>> substring matching algorithm as opposed to the naive algorithm. Most
>>> likely because of the fact that boost has implemented the finders for
>>> strings as generic algorithms, they haven't used efficient algorithms
>>> such as KMP, Boyer-Moore, Rabin-Karp etc.
>>> \see boost/algorithm/string/detail/finder.hpp
>>> The idea is to implement efficient algorithms such as the ones
>>> mentioned above to replace the current naive implementation. Note: it
>>> may be difficult to find nth matching substring position using some of
>>> these algorithms.

> I don't have an application proving it's inefficient yet, but a O(nm)
> algorithm would almost always perform worse than a O(n+m) one.

So n = size of input and m = size of search pattern.

If m is typically small, e.g. 2, then O(n+m) could easily beat O(nm) if
the constant factors are in its favour.

Even if m is large, then there will often be a dominant case where the
potential match can be rejected after looking at just the first element
(i.e. if I'm looking for xyz in abcdefgxyzlmnop, then a "worse case
O(nm)" algorithm will actually take O(n)).

I think that having a test data set in mind in advance would make this
much more concrete to reason about. Does anyone have any suggestions?

> I've just looked at the posted thread, but the algorithm is_any_of only
> looks for a character in a string, not for a substring in a string.

I guess my point was that even the simpler problem of finding a single
character has lots of potential for interesting work.

Regards, Phil.

Next message: Simonson, Lucanus J: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"
Previous message: Daniel James: "Re: [boost] [Review Results] Boost.Polygon library accepted into boost"
In reply to: Stefan: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"
Next in thread: Simonson, Lucanus J: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"
Reply: Simonson, Lucanus J: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"
Reply: Thomas Klimpel: "Re: [boost] [GSoC] Some new ideas for this year's GSoC"

Date view	Thread view	Subject view	Author view

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk