
Boost Users :

From: Lynn Allan (l_d_allan_at_[hidden])
Date: 2006-04-19 12:37:56


> regex = "(Sunday|Sun)|(Monday|Mon) etc.(Saturday|Sat)";
>
> By-hand parser: Elapsed for 10000 loops Ms: 110.583
> By-hand parser: Elapsed for 100000 loops Ms: 1107.81
>
> re2c generator Elapsed for 10000 loops Ms: 69.4683
> re2c generator Elapsed for 100000 loops Ms: 700.546
>
> Boost::xpressive-static-iterator: 10000 loops Ms: 410.492
> Boost::xpressive-static-iterator: 100000 loops Ms: 4164.45

Eric Niebler wrote:
> Interesting!

I ran some HiResTimer comparisons against strstr and wonder whether the
results are credible: the re2c numbers are much closer to strstr than I
expected. (These figures also reflect some "tweaking" since the previous
email, which recognized DayOfWeek rather than ZipCode, and the numbers
below are about 40% faster than the previous ones.)

const char* pzStrToScan_Re2cSearch =
    "12345 at pos=0 and "
    "another zip-code 12345-6789 at pos=36 and "
    "another 98765-4321 at pos=69 and "
    "another at end=113 11223-3445";

const char* pzStrToScan_strstr =
    "12345 at pos=0 and "
    "another zip-code 12345-6789 at pos=36 and "
    "another 12345-4321 at pos=69 and "
    "another at end=113 12345-3445";

WinXp-Sp2 vc7.1 on AMD-3700

10,000 loops through the strings above, finding 40,000 matches:
strstr just looking for 12345: 3.2 ms
Re2cSearch looking for [0-9]{5}(-[0-9]{4})? : 5.3 ms

100,000 loops through the strings above, finding 400,000 matches
(mostly to verify that the optimizer isn't distorting the results):
strstr just looking for 12345: 31.8 ms
Re2cSearch looking for [0-9]{5}(-[0-9]{4})? : 53.8 ms
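
For anyone who wants to sanity-check timings like these without the project files, here is a rough sketch of a loop-timing harness. It is not the HiResTimer code used for the numbers above: it assumes the standard clock() function, which is usually much coarser than a high-resolution timer, and the names scan_fn, count_12345 and time_loops are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback type: scans one string, returns the match count. */
typedef size_t (*scan_fn)(const char *text);

/* Counts non-overlapping occurrences of "12345" using strstr. */
static size_t count_12345(const char *text)
{
    size_t count = 0;
    const char *p = text;

    while ((p = strstr(p, "12345")) != NULL) {
        ++count;
        p += 5; /* resume just past this occurrence */
    }
    return count;
}

/* Runs scan over text `loops` times, stores the elapsed time in
 * milliseconds in *elapsed_ms, and returns the total match count.
 * Assumption: clock() resolution is good enough for the loop count used. */
static size_t time_loops(scan_fn scan, const char *text,
                         unsigned long loops, double *elapsed_ms)
{
    size_t total = 0;
    unsigned long i;
    clock_t start, stop;

    assert(scan != NULL && text != NULL && elapsed_ms != NULL);

    start = clock();
    for (i = 0; i < loops; ++i)
        total += scan(text);
    stop = clock();

    *elapsed_ms = 1000.0 * (double)(stop - start) / (double)CLOCKS_PER_SEC;
    return total;
}
```

Calling time_loops(count_12345, pzStrToScan_strstr, 10000, &ms) should return 40,000, in line with the match totals quoted above. Returning the total gives the loop an observable result, which makes it harder (though not impossible) for an optimizer to discard the work; checking that the total scales by exactly ten between the two loop counts is a cheap extra safeguard.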

The re2c generated code also passes a relatively extensive
cppunit-like test.
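
To make the pattern concrete, below is a minimal hand-written sketch of a scanner for [0-9]{5}(-[0-9]{4})?. This is not the code re2c generates and not the code in the .zip; zip_match_len and find_zip_codes are hypothetical names, and the scanner assumes an unanchored search that resumes right after each match (so a run of six digits would still yield a five-digit match).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* If a zip code matching [0-9]{5}(-[0-9]{4})? starts at s, returns its
 * length (5 or 10); otherwise returns 0. Never reads past the NUL. */
static size_t zip_match_len(const char *s)
{
    size_t i;

    assert(s != NULL);

    for (i = 0; i < 5; ++i)
        if (!isdigit((unsigned char)s[i]))
            return 0;           /* fewer than five leading digits */

    if (s[5] != '-')
        return 5;               /* plain five-digit form */

    for (i = 6; i < 10; ++i)
        if (!isdigit((unsigned char)s[i]))
            return 5;           /* optional "-dddd" part is incomplete */

    return 10;                  /* full ddddd-dddd form */
}

/* Scans text left to right and returns the number of matches. The first
 * max_positions match offsets are stored in positions (may be NULL). */
static size_t find_zip_codes(const char *text, size_t *positions,
                             size_t max_positions)
{
    size_t count = 0;
    const char *p = text;

    assert(text != NULL);

    while (*p != '\0') {
        size_t len = zip_match_len(p);
        if (len > 0) {
            if (positions != NULL && count < max_positions)
                positions[count] = (size_t)(p - text);
            ++count;
            p += len;           /* continue after the match */
        } else {
            ++p;                /* no match here, try the next character */
        }
    }
    return count;
}
```

Run over pzStrToScan_Re2cSearch, this sketch finds four matches, at offsets 0, 36, 69 and 113, which agrees with the positions written into the test string itself.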

> Also, do you think you could send around the code that re2c
> is generating for this expression?

Here is a link to a .zip with the vc6 and vc7.1 projects (vc8 to
follow):
http://cleanspeech.sf.net/misc/re2c_ZipCodeRe_060419.zip

Feedback appreciated, especially if I've messed up and the numbers are
flawed (not unlikely).

(Also, there is some "hold your nose" code in this C prototype. The
intent is to eventually be able to use the relatively simple ZipCodeRe
as a "getting up to speed" example for re2c newbies (like myself), and
perhaps as a template/clone for other 'recognizers'.)

