Subject: Re: [boost] [regex, xpressive] interesting(?) perf benchmark
From: Jeroen Habraken (vexocide_at_[hidden])
Date: 2010-06-08 06:09:43
On 7 June 2010 05:58, Eric Niebler <eric_at_[hidden]> wrote:
> I'm always suspect of benchmarks, but I figured I'd post this anyway.
> It seems someone wrote a generic automata library and a regex engine on
> top of it, and compared its performance to boost.regex, xpressive, pcre,
> and greta. Of course, his library wins handily. (The purpose of
> benchmarking is to make yourself look good, right? ;-) He doesn't say
> that (a) his engine is a DFA, and he's comparing to NFAs; and (b) he's
> only implemented a tiny subset of regex features people would expect;
> e.g., no captures even, let alone backreferences: things that can be
> tricky or impossible to implement with DFAs. He also doesn't say what
> version of Boost he's using (tests run Nov. 2009) or whether he's using
> static or dynamic xpressive (probably dynamic; static can be ~15% faster).
> Anyway, it's not my intention to kick off yet another costly escalation
> in the regex war. God help me, I don't have the time. I post it here
> because I haven't come across a comparison of these four libraries by a
> third party before. Some quick observations:
> - I'm pleased that on the balance, xpressive fares well here relative to
> the other NFAs. In the end, though, not much separates best from worst.
> - All the NFA libraries seem to have large overhead that becomes very
> noticeable when matching short strings. I don't know why.
> - I'm gobsmacked that my VERY OLD library greta handily beats pcre,
> boost.regex /and/ xpressive on short matches. Failing to beat yourself
> in any perf benchmark is kind of embarrassing.
> - I'm tickled that for long matches, both xpressive and boost.regex
> (NFAs) beat his library (DFA). How'd that happen?
> - I find it interesting that both xpressive and boost.regex have large
> discontinuities in the chart at the bottom; the others don't. Huh.
> When more std library vendors start shipping regex libraries, I'll be
> interested in seeing this perf matrix expanded.
> Eric Niebler
> BoostPro Computing
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Just as an FYI, there's a set of relatively new libraries from Google,
re2: http://code.google.com/p/re2/, and Irregexp:
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk