Boost logo

Boost :

Subject: Re: [boost] Multiple-pattern string search algorithms
From: meghana madhyastha (meghana.madhyastha_at_[hidden])
Date: 2015-05-12 00:41:27


Hi,

The Rabin Karp algorithm performs better than KMP and Boyer Moore during
multiple pattern searching. The interface that I have suggested is given
below. Please let me know your opinions and if there is a better approach
than the one I am suggesting. Any feedback/criticism/guidance will be
appreciated and helpful.

An overview of the Rabin Karp Algorithm:
The Rabin Karp algorithm assigns hash values to the text. The pattern is
made to 'slide' along the text to be searched. The hash value of the text
is computed and is compared to the hash value of the pattern. A brute force
search of each element in the pattern with the current window of text is
performed if and only if the hash value of the text matches the hash value
of the pattern (and this hash value can be calculated in O(1) time).

There can be two interfaces that can be implemented for this a procedural
and an object oriented interface (similar to the interfaces for the Boyer
Moore Algorithm). There should be two iterator types. One to iterate
through the text called 'textIter' and another to iterate through the set
of patterns (multiple patterns can be stored as part of a set) called
'setIter'.

In the object oriented interface, the hash value of the pattern can be
precomputed in the constructor. The text can be searched to match the
pattern using the () operator (similar to the implementation of Boyer
Moore). In the procedural interface, the search as well as hash computation
of the pattern string all happens in the single function
'rabin_karp_search'.

template <typename patIter>
class rabin_karp {
public:
     rabin_karp ( patIter first, patIter last );
    ~rabin_karp ();

    template <typename textIter>
    template <typename setIter>
    textIter operator () ( textIter text_first, textIter text_last, setIter
set_first, setIter set_last);
    };

template <typename patIter, typename textIter>
textIter rabin_karp_search (
        textIter text_first, textIter text_last,
        setIter set_first, setIter set_last );

Thanks and Regards,
Meghana


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk