From: Beman Dawes (beman_at_[hidden])
Date: 2000-08-15 10:46:04

Jens Maurer wrote:

>This is to announce that John Maddock's regular expression
>library is under boost review starting today.

I've been looking at regex by remembering past applications where it would
have been advantageous to have had a regular expression library, and then
analyzing regex to see if it would have met the need.

For the first two challenges I thought of, regex solved the problem directly. In
the case of searching HTML (which is a bit tricky because matches can span
lines) there was even a sample program (snip9.cpp) which demonstrated how.

The third challenge is this:

Given short input strings that contain postal addresses ("123 Main St",
"PO Box 123", "123 Highway 12") and a bunch of patterns described as
regular expressions, find which pattern (if any) is the "best"
match. Ignore what "best" means for this discussion.

A regex user could presumably fill a container with reg_exp objects, then
for each input address iterate over the container applying regex_match()
with the reg_exp object. Not particularly difficult to program, AFAIK.
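
A minimal sketch of that straightforward loop, for illustration only. It
assumes std::regex from the C++17 standard library as a stand-in for the
library under review, whose reg_exp class and exact regex_match() signature
are not shown in this message. The names first_match and sample_patterns and
the three patterns are hypothetical, and "first match" stands in for "best".

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Return the index of the first pattern that matches the whole address,
// or std::nullopt if none does. "First" is a placeholder for "best".
std::optional<std::size_t> first_match(const std::vector<std::regex>& patterns,
                                       const std::string& address) {
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        // regex_match succeeds only if the entire string matches.
        if (std::regex_match(address, patterns[i])) {
            return i;
        }
    }
    return std::nullopt;
}

// Hypothetical sample patterns, invented for this sketch.
std::vector<std::regex> sample_patterns() {
    return {
        std::regex(R"(PO Box \d+)"),
        std::regex(R"(\d+ Highway \d+)"),
        std::regex(R"(\d+ [A-Za-z]+ St)"),
    };
}
```

Every input address costs up to one regex_match() call per pattern in the
container, which is the cost the efficiency question is about.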

But what if efficiency rears its ugly head? Industrial code I have seen in
effect re-ordered the regular expressions into groups according to common
initial sub-expressions, determined which sub-expression applied, and then
only made the pattern-by-pattern match attempts on patterns that stood a
chance of matching. The logical extreme of this approach would be a tree
search that did the minimum number of regex_match() calls.
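
One level of that grouping could be sketched roughly as follows, again
assuming std::regex (C++17) as a stand-in rather than the library under
review. PatternGroup, GroupedResult, grouped_match, sample_groups, and the
patterns are all hypothetical. The common initial sub-expressions are supplied
by hand here; deriving them automatically from the patterns is the hard part
and is not attempted.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// A shared initial sub-expression plus the full patterns that begin with it.
struct PatternGroup {
    std::regex prefix;
    std::vector<std::regex> patterns;
};

struct GroupedResult {
    std::optional<std::size_t> group;    // index of the matching group, if any
    std::optional<std::size_t> pattern;  // index of the pattern within that group
    std::size_t full_attempts = 0;       // number of regex_match() calls made
};

GroupedResult grouped_match(const std::vector<PatternGroup>& groups,
                            const std::string& address) {
    GroupedResult result;
    for (std::size_t g = 0; g < groups.size(); ++g) {
        // match_continuous requires the prefix to match at the start of the input.
        if (!std::regex_search(address, groups[g].prefix,
                               std::regex_constants::match_continuous)) {
            continue;  // no pattern in this group can match; skip them all
        }
        for (std::size_t p = 0; p < groups[g].patterns.size(); ++p) {
            ++result.full_attempts;
            if (std::regex_match(address, groups[g].patterns[p])) {
                result.group = g;
                result.pattern = p;
                return result;
            }
        }
    }
    return result;
}

// Hypothetical groups, invented for this sketch.
std::vector<PatternGroup> sample_groups() {
    return {
        PatternGroup{std::regex(R"(PO Box )"),
                     {std::regex(R"(PO Box \d+)")}},
        PatternGroup{std::regex(R"(\d+ )"),
                     {std::regex(R"(\d+ Highway \d+)"),
                      std::regex(R"(\d+ [A-Za-z]+ St)")}},
    };
}
```

With these sample groups, "123 Main St" skips the "PO Box" group after one
prefix test and makes two full match attempts in the second group. The saving
is trivial at this size; any real benefit would depend on having many patterns
per group, and nesting groups within groups would approach the tree search
described above.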

I don't hold it against regex if that isn't supported directly, but I would
like to hear John Maddock (or anyone else) speculate on the relative ease
of implementing this challenge fairly efficiently on top of regex.


