Boost logo

Boost :

Subject: Re: [boost] [regex, xpressive] interesting(?) perf benchmark
From: John Maddock (boost.regex_at_[hidden])
Date: 2010-06-07 04:26:00


> I'm always suspect of benchmarks, but I figured I'd post this anyway.
>
> http://astl.sourceforge.net/bench.7.html.

LOL, nice to see my own benchmark suite used against me!

> Anyway, it's not my intention to kick off yet another costly escalation
> in the regex war. God help me, I don't have the time. I post it here
> because I haven't come across a comparison of these four libraries by a
> third party before. Some quick observations:
>
> - I'm pleased that on the balance, xpressive fares well here relative to
> the other NFAs. In the end, though, not much separates best from worst.
>
> - All the NFA libraries seem to have large overhead that becomes very
> noticeable when matching short strings. I don't know why.

I spent some time trying mostly in vain to hammer this down lower with
current Boost.Regex - it turns out that most of it is down to initialisation
of state - Perl-style re's typically have a lot of state to initialise. The
thing to remember is for that short matches against short texts the cost of
a match is basically the cost of a function call for DFA's. Or to put it
another way, even if the NFA's are 10x slower on these short tests, 10x
almost nothing is still almost nothing (though I accept there are certain
outlyer-use-cases where this overhead becomes an issue).

> - I'm gobsmacked that my VERY OLD library greta handily beats pcre,
> boost.regex /and/ xpressive on short matches. Failing to beat yourself
> in any perf benchmark is kind of embarrassing.

Probably more features in xpressive to set up?

> - I'm tickled that for long matches, both xpressive and boost.regex
> (NFAs) beat his library (DFA). How'd that happen?

Maybe reflects the preocupations of their authors?

> - I find it interesting that both xpressive and boost.regex have large
> discontinuities in the chart at the bottom; the others don't. Huh.

Weird indeed, can't really imagine why that would happen, unless there's
some kind of cache miss effect going on?

Cheers, John.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk