From: Eric Niebler (eric_at_[hidden])
Date: 2005-09-16 18:46:17


Darren Cook wrote:
>
>>>http://boost-sandbox.sf.net/libs/xpressive/doc/html/xpressive/perf.html
>>>
>>>In short, xpressive comes out consistently ahead of Boost.Regex on
>>>short matches, and roughly on par for longer matches (with wide
>>>variation).
>
>
> Interesting. This left me with two questions:
> 1. Why is dynamic quicker than static xpressive on some expressions?

It's only that way for gcc. On VC7.1, static xpressive is always faster.
I can only guess that gcc's optimizer is at fault here.

> 2. Why is boost::regex quicker on longer strings? Something to do with
> buffering or dynamic memory usage?

I haven't fully investigated this, but I suspect that for some of those
patterns, Boost.Regex is finding a clever optimization. I have noticed
that if you change the pattern:

Tom|Sawyer|Huckleberry|Finn

to:

Tom|Sawyer|.uckleberry|Finn
            ^

then xpressive is considerably faster than Boost.Regex at finding all
matches. Clearly, I need to be testing more patterns to make sure the
results are representative.

> I thought "Huck[[:alpha:]]+" (xpressive twice as quick) vs.
> "[[:alpha:]]+ing" (boost::regex twice as quick) was very curious. Is
> this due to some design decision, or just something waiting to be optimized?

This is a case where xpressive is finding a clever optimization that
Boost.Regex is missing. When a pattern begins with a string literal,
xpressive uses Boyer-Moore. It's a huge win.
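
To illustrate the idea (this is not xpressive's actual implementation):
a leading literal such as "Huck" lets an engine jump straight to
candidate positions with a Boyer-Moore search and run the rest of the
pattern only there. The sketch below uses C++17's
std::boyer_moore_searcher as a stand-in. literal_candidates is a
hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Return the offset of every occurrence of `literal` in `text`.
// A regex engine would attempt the remainder of the pattern at each offset.
std::vector<std::size_t> literal_candidates(const std::string& text,
                                            const std::string& literal)
{
    std::vector<std::size_t> offsets;
    if (literal.empty())
        return offsets;  // nothing to search for

    // Preprocess the literal once; the searcher can then skip ahead
    // by more than one character on a mismatch.
    const std::boyer_moore_searcher<std::string::const_iterator>
        searcher(literal.begin(), literal.end());

    auto pos = text.begin();
    while (true) {
        pos = std::search(pos, text.end(), searcher);
        if (pos == text.end())
            break;
        offsets.push_back(static_cast<std::size_t>(pos - text.begin()));
        ++pos;  // resume one past this hit so overlapping hits are found too
    }
    return offsets;
}
```

The benefit grows with the length of the literal, since longer literals
allow larger skips. A pattern that starts with a character class, such
as "[[:alpha:]]+ing", has no leading literal to search for in this way.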

I have no idea why Boost.Regex is faster at matching "[[:alpha:]]+ing".
It's worth looking into.

>>>Agreed. FYI, "_" matches any one character. ~_n matches any character
>>>that is not '\n'. I also need to describe _ln which matches a logical
>>>newline (e.g., "\n" or "\r" or "\r\n" or other line separators) and
>>>~_ln which matches any one character that is not a line separator.
>
>
> _ln sounds useful. Is that in perl/PCRE ?

I don't recall where I got that idea. Perhaps from Perl 6.

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
