Subject: Re: [boost] [Review:Algorithms] Comments about search algorithms
Date: 2011-09-28 13:05:30
Oh, and something important which I forgot to mention: Bad-character shifting requires the presence of random-access iterators in order to work efficiently. Consequently, since Boyer-Moore and Boyer-Moore-Horspool are using bad-character shifting, they are only efficient for random-access iterators. Hence KMP will be still useful, even if it is much slower in the most common cases.
And a question for Marshall: I have noticed that inside the search.hpp file you have a requirement for random-access iterators regarding the Knuth-Pratt-Morris searching algorithm as well. Is this for both the pattern and the corpus?
Phil Endecott wrote:
> How efficiently could the current algorithms be used with DNA data?
> Does anyone have any experience with a packed base-4 representation?
May I propose an easy way: Why not creating random sequences, using only four char values (ie "abcd") This will easily give you a very good idea about the KMP performance. For a more elaborate aproach, you can always use the tests of Musser & Nishanov: http://www.cs.rpi.edu/~musser/gp/gensearch/index.html
> Phil Endecott wrote:
> > However, in my opinion the most interesting and useful property of
> > the KMP algorithm is not its performance, but its ability to help in
> > building very generic sequence searching algorithms. In fact KMP has
> > helped me to build an efficient generic sequence searching algorithm,
> > which actually works with ranges accessed by single-pass (input)
> > iterators. (See http://www.codeproject.com/KB/stl/single_pass_search.aspx
> > and also in boost-vault/Algorithms) Needless to say that I would be
> > more than happy to submit this work to boost, whenever there is an
> > interest.
> I am reminded that Spirit has some sort of adaptor that allows an input
> iterator to be used with a grammar that requires backtracking by
> storing recent inputs in a buffer. Presumably something like this
> could be used here. How much does your algorithm gain over a
> combination of such a buffer and one of Marshall's algorithms?
Very good idea! I will try to compare my algorithm against this adaptor. A very interesting benchmark for me thank you Phil.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk