Subject: Re: [boost] [Review:Algorithms] Comments about search algorithms
Date: 2011-09-28 16:45:44
> > And a question for Marshall: I have noticed that inside the search.hpp file you have a requirement for random-access iterators regarding the Knuth-Pratt-Morris searching algorithm as well. Is this for both the pattern and the corpus?
> Yes, and the here's the reason for that.
> There are basically two steps to the KMP algorithm (and frankly, the other two as well), and those two steps are coded in the do_search method.
> Given a potential match point (i.e position in the pattern and the corpus):
> 1) test and advance until you find a mismatch or come to the end (and report success)
> 2) Find the next potential match point.
> Step #1 can be done with ForwardIterators (or Bidirectional, in the case of Boyer-Moore), but in the second step, you have to be able to advance to arbitrary location in both the pattern and the corpus:
> // Figure out where to start searching again
> 544: match_start += idx - skip_ [ idx ];
> 545: idx = skip_ [ idx ] >= 0 ? skip_ [ idx ] : 0;
> In this case, idx is the position in the pattern.
> Hm.. if the skip table stored iterators instead of indexes, you could just use ForwardIterators.....
Using iterators instead of indexes is not a bad idea, but even with indexes, what is the reason for needing random-access iterators for the corpus? Typically in good-suffix shift we are shifting the pattern and not the corpus! I will try to study your code asap in order to become more specific.
I ensure you that requiring no more than forward iterators for both the pattern and the corpus is both feasible and important. And it is particularly important, since in the random-access iterators case BM will almost always be faster, hence the usefulness of a KMP implementation, which requires random-access iterators, is rather limited.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk