Boost logo

Boost :

Subject: Re: [boost] [Review:Algorithms] Comments about search algorithms
From: Marshall Clow (mclow.lists_at_[hidden])
Date: 2011-09-28 13:23:06

On Sep 28, 2011, at 10:05 AM, dpxguard-boost_at_[hidden] wrote:

> Oh, and something important which I forgot to mention: Bad-character shifting requires the presence of random-access iterators in order to work efficiently. Consequently, since Boyer-Moore and Boyer-Moore-Horspool are using bad-character shifting, they are only efficient for random-access iterators. Hence KMP will be still useful, even if it is much slower in the most common cases.
> And a question for Marshall: I have noticed that inside the search.hpp file you have a requirement for random-access iterators regarding the Knuth-Pratt-Morris searching algorithm as well. Is this for both the pattern and the corpus?

Yes, and the here's the reason for that.

There are basically two steps to the KMP algorithm (and frankly, the other two as well), and those two steps are coded in the do_search method.

Given a potential match point (i.e position in the pattern and the corpus):
1) test and advance until you find a mismatch or come to the end (and report success)
2) Find the next potential match point.

Step #1 can be done with ForwardIterators (or Bidirectional, in the case of Boyer-Moore), but in the second step, you have to be able to advance to arbitrary location in both the pattern and the corpus:

           // Figure out where to start searching again
544: match_start += idx - skip_ [ idx ];
545: idx = skip_ [ idx ] >= 0 ? skip_ [ idx ] : 0;

In this case, idx is the position in the pattern.

Hm.. if the skip table stored iterators instead of indexes, you could just use ForwardIterators.....

-- Marshall

Marshall Clow Idio Software <mailto:mclow.lists_at_[hidden]>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
        -- Yu Suzuki

Boost list run by bdawes at, gregod at, cpdaniel at, john at