Boost Users :
From: Eric Niebler (eric_at_[hidden])
Date: 2007-10-30 15:08:53
Jacques-Olivier Goussard wrote:
> Thanks for taking the time to answer, John.
>
> > How desperate are you? Are there really that many regexes that loading
> > them from their string representation is a problem?
>
> Well, not *that* desperate - I'm looking at the existing options for now.
> The number of regexps is roughly 4000 currently, but unconstrained a priori,
> and a lot of them are quite huge. Furthermore, the match is done for all
> those regexps on a list that can contain around 1 million strings
> (so recompiling each regexp every time is likely to be a problem).
>
> The problem is the following: I'm supposed to implement matching
> on general tokens, i.e. being able to write regexps that
> contain tokens like:
> (CITY)
> where CITY is defined elsewhere as a list of possible cities.
> The only way I see to do this with Boost.Regex is to translate those
> pseudo-regexps into ones containing
> (boston|chicago|.....)
> i.e. replace all references to generic tokens with their expanded values.
> I'm afraid that will use too much memory (if all are loaded) or too much
> time (if recompiled each time), unless there is a way to refer to another
> regexp from within a regexp?
I'll second the suggestion to look into xpressive. With xpressive, you
can refer to one regex from another. And with the latest version (in the
Boost Sandbox) you can put your list of cities into a symbol table and
get very fast look-up -- much better than just a bunch of alternates.
You can read about xpressive's symbol tables here:
http://boost-sandbox.sourceforge.net/libs/xpressive/doc/html/boost_xpressive/user_s_guide/symbol_tables_and_attributes.html
HTH,
--
Eric Niebler
Boost Consulting
www.boost-consulting.com