Boost logo

Boost :

From: Eric Niebler (eric_at_[hidden])
Date: 2005-01-12 12:46:14


John Maddock wrote:
>> And, of course, there are additional speed reserves in KMP for search
>> problems.
>
>
> Actually, I think I may remove that code as a dis-optimisation, a simple
> scan for first character is probably faster in the *average* case, and
> of course if you can do a Boyer-search that's even better.
>
> Like you, more research is needed!
>

I have experimented a bit in this area, and you're right. I've had best
results with a three-pronged approach to searches:

(1) Boyer-Moore is a *huge* win if you can do it. xpressive and GRETA
both use BM to great effect when the pattern begins with a string literal.

(2) As a fall-back, use the first character if you can. This handles
patterns like (a...|b.|c..), where I will scan forward for one of {abc}.

(3) Fall back on brute force search at every position.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk