Boost logo

Boost :

From: Greg Colvin (gcolvin_at_[hidden])
Date: 2000-08-15 11:57:47


From: "Beman Dawes" <beman_at_[hidden]>
> ...
> The third challenge is this:
>
> Given short input strings which contains postal addresses ("123 Main St",
> "PO Box 123", "123 Highway 12") and a bunch of patterns described as
> regular expressions, find which pattern (if any) is the "best"
> match. Ignore what "best" means for this discussion.
>
> A regex user could presumably fill a container with reg_exp objects, then
> for each input address iterate over the container applying regex_match()
> with the reg_exp object. Not particularly difficult to program, AKAIK.
>
> But what if efficiency rears its ugly head? Industrial code I have seen in
> effect re-ordered the regular expressions into groups according to common
> initial sub-expressions, determined which sub-expression applied, and then
> only made the pattern by pattern match attempts on patterns that stood a
> chance of matching. The logical extreme of this approach would be a tree
> search that did the minimum number of regex_match() calls.

The logical extreme is how I have done this in the past.
In effect you combine all the regular expressions into one
big expression and intrduce some sort of "best fit" metric.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk