
Boost Users :

From: Lynn Allan (l_d_allan_at_[hidden])
Date: 2006-04-12 13:48:24


<alert comment="boost newbie">

I've been impressed by the functionality provided by the regex-related
libraries in boost that I've looked at so far. However, before
trekking to far-distant "grok-land" off in the mists, I wanted to get
some idea whether the performance tradeoffs are negligible, minor, or
major.

I've seen comparisons between regex, spirit, and xpressive, but they
were done by the library developers several years ago and are probably
obsolete. I'm wondering how these would compare to a "hand-tuned"
state-machine routine, and to a state machine generated automatically
by an FSM utility (these tend to be bloated but can be fast).

My interest is specialized to finding which pattern in a "group" of
patterns was detected, and the offset within the testStr. To
illustrate, the regex would be something like detecting the full or
abbreviated Day-Of-Week:

((Sunday|Sun)|(Monday|Mon)|(Tuesday|Tue)
         |(Wednesday|Wed)|(Thursday|Thu)
         |(Friday|Fri)|(Saturday|Sat))

The testStr is something like:
std::string testStr =
  "Alternate days of the week are Tue and Thursday and Sat and Monday. "
  "And then Monday and Wed and Friday and Sun. "
  "Near misses are WeD TuE ThU SuN SaT MoN FrI ";

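A minimal sketch of the "which pattern, at what offset" lookup described
above. It assumes std::regex as a stand-in for Boost.Regex (the
interfaces are very similar), and the names DowHit and find_days are
hypothetical. The outer group from the pattern is dropped so that
capture groups 1..7 map directly onto Sunday..Saturday.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One hit: which day (0 = Sunday ... 6 = Saturday), where, and the text.
struct DowHit {
    int day;
    std::size_t offset;
    std::string text;
};

// Scan s and report every full or abbreviated day name found.
std::vector<DowHit> find_days(const std::string& s) {
    // One capture group per day; the longer spelling is listed first so
    // "Sunday" is preferred over "Sun" at the same position.
    static const std::regex re(
        "(Sunday|Sun)|(Monday|Mon)|(Tuesday|Tue)|(Wednesday|Wed)"
        "|(Thursday|Thu)|(Friday|Fri)|(Saturday|Sat)");

    std::vector<DowHit> hits;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        // Exactly one of groups 1..7 participates in each match.
        for (int g = 1; g <= 7; ++g) {
            if (m[g].matched) {
                hits.push_back({g - 1,
                                static_cast<std::size_t>(m.position(0)),
                                m.str(0)});
                break;
            }
        }
    }
    return hits;
}
```

Called on the testStr above, this yields 8 hits; the first is day 2
("Tue") at offset 31, and none of the mixed-case near misses match.
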
The real application is inputting a batch of 2 MB files and generating
SGML-like output with embedded tags (e.g. Tue becomes <dow=2>Tue</dow>
and Thursday becomes <dow=4>Thursday</dow>).

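The tagging step could be sketched roughly as follows. Again this
assumes std::regex in place of Boost.Regex, numbers the days from
Sunday = 0 to agree with the <dow=2>Tue</dow> example, and uses a
hypothetical function name, tag_days.

```cpp
#include <regex>
#include <string>

// Copy `in` to the result, wrapping each day name in <dow=N>...</dow>.
std::string tag_days(const std::string& in) {
    static const std::regex re(
        "(Sunday|Sun)|(Monday|Mon)|(Tuesday|Tue)|(Wednesday|Wed)"
        "|(Thursday|Thu)|(Friday|Fri)|(Saturday|Sat)");

    std::string out;
    out.reserve(in.size() + in.size() / 8);  // rough guess at tag overhead

    std::string::const_iterator last = in.begin();
    for (std::sregex_iterator it(in.begin(), in.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;

        // Find which of the seven groups matched; that is the day number.
        int day = 0;
        for (int g = 1; g <= 7; ++g) {
            if (m[g].matched) { day = g - 1; break; }
        }

        out.append(last, m[0].first);            // text before the match
        out += "<dow=" + std::to_string(day) + ">";
        out.append(m[0].first, m[0].second);     // the day name itself
        out += "</dow>";
        last = m[0].second;
    }
    out.append(last, in.end());                  // trailing text
    return out;
}
```

For example, tag_days("Tue and Thursday.") returns
"<dow=2>Tue</dow> and <dow=4>Thursday</dow>.".
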
The above seems like the kind of task for which regex libraries would
be appropriate, would be beyond strstr, but wouldn't be excessively
difficult to accomplish "by hand". The unknown is whether using a
regex library helps or hurts performance, and whether the difference
is minor or major.
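
For comparison, a "by hand" version might look something like the sketch
below. It is not a tuned state machine, just a first-character filter
followed by string compares, and the names match_day_at and
tag_days_by_hand are hypothetical. It is meant as the kind of baseline a
regex library could be timed against.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// If a day name starts at s[pos], return its index (0 = Sunday) and set
// len to the matched length; otherwise return -1. Requires pos < s.size().
int match_day_at(const std::string& s, std::size_t pos, std::size_t& len) {
    static const char* const names[7] = {
        "Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"};
    for (int d = 0; d < 7; ++d) {
        const std::size_t full = std::strlen(names[d]);
        // Try the full name first, then the three-letter abbreviation.
        if (s.compare(pos, full, names[d]) == 0) { len = full; return d; }
        if (s.compare(pos, 3, names[d], 3) == 0) { len = 3; return d; }
    }
    return -1;
}

// Same output as a regex-based tagger, without any regex machinery.
std::string tag_days_by_hand(const std::string& in) {
    std::string out;
    out.reserve(in.size() + in.size() / 8);
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        std::size_t len = 0;
        int d = -1;
        // Cheap filter: every day name starts with one of these letters.
        if (c == 'S' || c == 'M' || c == 'T' || c == 'W' || c == 'F') {
            d = match_day_at(in, i, len);
        }
        if (d >= 0) {
            out += "<dow=";
            out += static_cast<char>('0' + d);
            out += ">";
            out.append(in, i, len);
            out += "</dow>";
            i += len;
        } else {
            out += c;
            ++i;
        }
    }
    return out;
}
```

Most characters fail the first-letter test and are copied straight
through, which is where a hand-written scanner would be expected to gain
on a general-purpose matcher, if it gains at all.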

I've started some preliminary timings with a vc7.1 release /O2 build,
using a HiResTimer based on QueryPerformanceCounter and running in
ABOVE_NORMAL_PRIORITY_CLASS.
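
A portable sketch of such a timing harness, assuming
std::chrono::steady_clock as a substitute for the Windows
high-resolution counter. The name best_seconds is hypothetical, and
taking the best of several runs is only one of several reasonable ways
to reduce scheduling noise.

```cpp
#include <chrono>
#include <cstddef>
#include <limits>

// Run `work` `repeats` times and return the fastest wall-clock time in
// seconds. Returns +infinity if repeats is 0.
template <typename F>
double best_seconds(F&& work, std::size_t repeats) {
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < repeats; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        if (s < best) best = s;
    }
    return best;
}
```

Each candidate (regex library, hand-written scanner, generated FSM) can
be wrapped in a lambda and passed to the same harness, with the output
consumed somehow so the optimizer cannot discard the work.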

Before proceeding much further, this newbie thought it would be good
to check if people with real boost experience have done this kind of
benchmarking. A search for "benchmark" and "timings" in boost-user and
boost-devel didn't turn up much.

Preview: so far the preliminary results look VERY GOOD, but "consider
the source".

</alert>


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net