Boost logo

Boost Users :

Subject: Re: [Boost-users] How efficient is the boost::regex library?
From: Eric Niebler (eric_at_[hidden])
Date: 2011-10-29 01:23:51


On 10/28/2011 10:05 AM, John Maddock wrote:
>>> I am looking for the most efficient open-source C++ regex library.
>>>>
>>>> Reading this article:
>>>> http://swtch.com/~rsc/regexp/**regexp1.html>-
>>>> It
>>>> seems that GNU awk is the best overall:
>>>>
http://pdos.csail.mit.edu/~**rsc/regexp-img/grep1p.png>
>>>>
>>>>
>>>
>>> This is all true, but also completely irrelevant. DFA's have good worst
>>> case behaviour, but can be many times slower for common cases.
>>
>>
>> I'd be very interested in seeing the data that show this.
>
> I confess I don't have any myself, but I recall Eric Niebler mentioning
> that he ran some experiments on this, but you know... lies damn lies
> and benchmarks and all that...

I once coded up a Thompson NFA and put it in the old Boost File Vault. I
have no idea whatever became of the file vault, but with a quick google
search, I found the code here:

https://github.com/boost-vault/Strings-Text-Processing/blob/master/thompson-nfa-perl-regex.cpp

This is basically Russ Cox's much-touted-by-him regex implementation,
Boost-ified. Any interested party can confirm for themselves that it is,
in fact, slower than a snail on quaaludes. I asked Russ about that. He
never bothered to reply.

FWIW, in the Alioth language shootout for the regex-dna test, the
fastest submitted C++ program uses xpressive:

http://shootout.alioth.debian.org/u32/program.php?test=regexdna&lang=gpp&id=4

It came in 3rd behind Tcl and Javascript V8, but it uses a fraction of
the memory of those two. This is with gcc. I'm guessing it could have
done better with MSVC or Intel. (But not 4x better, alas.)

DISCLAIMER: This is a very limited test. It would be seriously unwise to
draw any significant conclusions from this.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net