Boost logo

Boost :

From: Eric Niebler (eric_at_[hidden])
Date: 2007-04-12 00:49:38


Jeff Garland wrote:
> Hi All -
>
> I'm happy to announce that Boost will be mentoring 9 Google SoC projects this
> year. They are:
<snip>

>
> Application to extend Boost regex with recursive matching, named
> sub-expressions, and automata-based matching
> Hugh Wimberly John Maddock
<snip>

A DFA or Thompson NFA regex backend would be very nice, but I'm a bit
dubious of the claim that it can be faster than the backtracking engine
in any but the most pathological cases. I've toyed with various
approaches, and communicated with Russ Cox after his excellent article
on the topic: http://swtch.com/~rsc/regexp/regexp1.html. The trouble is
that getting ECMAScript semantics (captures, leftmost biased) saddles
these approaches with bookkeeping that makes the Thompson NFA
considerably slower in the common case than the
dumber-than-a-bag-of-hammers backtracking NFA. I ported Russ' Thompson
NFA-with-captures-and-leftmost-bias to C++/Boost, wrote a Spirit-based
regex parser, separated the mutable state for thread-safety and did an
apples-to-apples shootout against xpressive's backtracking NFA
implementation. The result: for common regexes, xpressive was about 30x
faster. But it's also true that pathological regexes exist that expose
the weaknesses of the backtracking approach. <shrug>

I would be happy to make the Thompson NFA engine publicly available. I'm
sure it can be improved, but I tend to doubt a 30x speedup is possible.

My advise: implement a simple DFA and use it only for patterns without
captures (regex_constants::nosubs) or fancy assertions (eg. look-ahead).
  If you need captures and leftmost-biased semantics, just use the
backtracking NFA.

Oh, and make the DFA so that I can use it in xpressive, too! :-)

> And, if you aren't coming to BoostCon, well, you should ;-)
>
> http://www.boostcon.com/registration

You bet! See you there!

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk