

From: Beman Dawes (beman_at_[hidden])
Date: 2000-08-15 10:46:04


Jens Maurer wrote:

>This is to announce that John Maddock's regular expression
>library is under boost review starting today.

I've been looking at regex by remembering past applications where it would
have been advantageous to have had a regular expression library, and then
analyzing regex to see if it would have met the need.

For the first two challenges I thought of, regex solved the problem directly. In
the case of searching HTML (which is a bit tricky because matches can span
lines) there was even a sample program (snip9.cpp) that demonstrated the
usage.

The third challenge is this:

Given short input strings containing postal addresses ("123 Main St",
"PO Box 123", "123 Highway 12") and a bunch of patterns described as
regular expressions, find which pattern (if any) is the "best"
match. Ignore what "best" means for this discussion.

A regex user could presumably fill a container with reg_exp objects, then
for each input address iterate over the container applying regex_match()
with the reg_exp object. Not particularly difficult to program, AFAIK.
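A minimal sketch of that brute-force loop, for concreteness. It uses C++11 std::regex (the standardized descendant of this library) rather than the reg_exp class, whose exact interface isn't shown here, and the three patterns are hypothetical stand-ins for the address shapes above. It returns every matching pattern index and leaves choosing the "best" one to the caller.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical patterns for the three address shapes in the message.
std::vector<std::regex> example_patterns() {
    return {
        std::regex(R"(\d+ [A-Za-z]+ (St|Ave|Rd))"),  // e.g. "123 Main St"
        std::regex(R"(PO Box \d+)"),                  // e.g. "PO Box 123"
        std::regex(R"(\d+ Highway \d+)"),             // e.g. "123 Highway 12"
    };
}

// Brute force: try every pattern against the whole input string and return
// the indices of all that match. Picking the "best" one is left to the caller.
std::vector<std::size_t> matching_patterns(const std::vector<std::regex>& patterns,
                                           const std::string& address) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        // regex_match succeeds only if the pattern covers the entire string.
        if (std::regex_match(address, patterns[i])) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

Every input costs one regex_match call per pattern, which is exactly the cost the next paragraph worries about.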

But what if efficiency rears its ugly head? Industrial code I have seen in
effect re-ordered the regular expressions into groups according to common
initial sub-expressions, determined which sub-expression applied, and then
only made pattern-by-pattern match attempts on the patterns that stood a
chance of matching. The logical extreme of this approach would be a tree
search that did the minimum number of regex_match() calls.
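One way the two-level version of that idea might look, again sketched with std::regex and hypothetical patterns. Each group pairs a cheap "prefix" regex for the shared leading sub-expression with its full member patterns; the grouping is assumed to be built by hand, and nothing here derives it automatically or builds the full tree the paragraph calls the logical extreme. The prefix test uses regex_search with std::regex_constants::match_continuous so it only succeeds at the start of the input, and the code counts how many full regex_match calls were actually made.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical grouping: a prefix regex for the shared leading
// sub-expression, plus the full patterns that share it.
struct PatternGroup {
    std::regex prefix;                // tried once per input
    std::vector<std::regex> members;  // tried only if the prefix matched
};

struct GroupedResult {
    std::vector<std::pair<std::size_t, std::size_t>> hits;  // (group, member)
    std::size_t full_match_calls = 0;  // regex_match calls actually made
};

GroupedResult grouped_matches(const std::vector<PatternGroup>& groups,
                              const std::string& input) {
    GroupedResult result;
    for (std::size_t g = 0; g < groups.size(); ++g) {
        // match_continuous anchors the search at the start of the input,
        // so this asks "does the input begin with the shared prefix?".
        if (!std::regex_search(input, groups[g].prefix,
                               std::regex_constants::match_continuous)) {
            continue;  // prune every member of this group at once
        }
        for (std::size_t m = 0; m < groups[g].members.size(); ++m) {
            ++result.full_match_calls;
            if (std::regex_match(input, groups[g].members[m])) {
                result.hits.emplace_back(g, m);
            }
        }
    }
    return result;
}

std::vector<PatternGroup> example_groups() {
    std::vector<PatternGroup> groups;
    // Group 0: addresses that begin with a house number.
    groups.push_back(PatternGroup{std::regex(R"(\d+ )"),
                                  {std::regex(R"(\d+ [A-Za-z]+ (St|Ave|Rd))"),
                                   std::regex(R"(\d+ Highway \d+)")}});
    // Group 1: post-office boxes.
    groups.push_back(PatternGroup{std::regex("PO Box "),
                                  {std::regex(R"(PO Box \d+)")}});
    return groups;
}
```

With these groups, "PO Box 123" costs one full regex_match call instead of three, at the price of the prefix searches and of each member re-scanning its prefix. Whether that pays off depends on how many patterns actually share leading sub-expressions; nesting groups recursively would move toward the tree search described above.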

I don't hold it against regex if that isn't supported directly, but I would
like to hear John Maddock (or anyone else) speculate on the relative ease
of implementing this challenge fairly efficiently on top of regex.

Thoughts?

--Beman

