Subject: [boost] [regex, xpressive] interesting(?) perf benchmark
From: Eric Niebler (eric_at_[hidden])
Date: 2010-06-06 23:58:14
I'm always suspect of benchmarks, but I figured I'd post this anyway.
It seems someone wrote a generic automata library and a regex engine on
top of it, and compared its performance to boost.regex, xpressive, pcre,
and greta. Of course, his library wins handily. (The purpose of
benchmarking is to make yourself look good, right? ;-) He doesn't say
that (a) his engine is a DFA, and he's comparing to NFAs; and (b) he's
only implemented a tiny subset of regex features people would expect;
e.g., no captures even, let alone backreferences: things that can be
tricky or impossible to implement with DFAs. He also doesn't say what
version of Boost he's using (tests run Nov. 2009) or whether he's using
static or dynamic xpressive (probably dynamic; static can be ~15% faster).
Anyway, it's not my intention to kick off yet another costly escalation
in the regex war. God help me, I don't have the time. I post it here
because I haven't come across a comparison of these four libraries by a
third party before. Some quick observations:
- I'm pleased that on the balance, xpressive fares well here relative to
the other NFAs. In the end, though, not much separates best from worst.
- All the NFA libraries seem to have large overhead that becomes very
noticeable when matching short strings. I don't know why.
- I'm gobsmacked that my VERY OLD library greta handily beats pcre,
boost.regex /and/ xpressive on short matches. Failing to beat yourself
in any perf benchmark is kind of embarrassing.
- I'm tickled that for long matches, both xpressive and boost.regex
(NFAs) beat his library (DFA). How'd that happen?
- I find it interesting that both xpressive and boost.regex have large
discontinuities in the chart at the bottom; the others don't. Huh.
When more std library vendors start shipping regex libraries, I'll be
interested in seeing this perf matrix expanded.
-- Eric Niebler BoostPro Computing http://www.boostpro.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk