Boost Users :

From: Jacques-Olivier Goussard (jogoussard_at_[hidden])
Date: 2007-10-30 13:46:15


Thanks for taking the time to answer, John.

> How desperate are you? Are there really that many regexes that loading
> them from their string representation is a problem?

Well, not *that* desperate - I'm looking at the existing options for now.
The number of regexps is roughly 4000 currently, but unconstrained a priori,
and a lot of them are quite huge. Furthermore, every one of those regexps is
matched against a list that can contain around 1 million strings
(so recompiling each regexp every time is likely to be a problem).

The problem is the following: I'm supposed to implement a match
on general tokens, i.e. being able to code regexps that would
contain tokens like:
(CITY)
where CITY is defined elsewhere as a list of possible cities.
The only way I see to do this with Boost.Regex is to translate those
pseudo-regexps into ones containing
(boston|chicago|.....)
i.e. replace all references to generic tokens with their expanded values. I'm
afraid that will use too much memory (if all are loaded) or too much time (if
recompiled each time), unless there is a way to refer to another regexp from
within a regexp?
Note that I'm just foreseeing problems here - if you tell me there's no
solution then I'll implement the expansion and see what the performance looks
like. Just trying to code it right the 1st time :)
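
To make the expansion idea concrete, here is a rough sketch of what I have in
mind. It assumes the token alternatives are plain literals (no regex
metacharacters), and the names TokenTable, expand_tokens and compile_all are
made up for illustration. I use std::regex so the sketch builds without Boost;
I'd expect boost::regex to slot in the same way, but I haven't measured
anything yet.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical token table: token name -> list of literal alternatives.
using TokenTable = std::map<std::string, std::vector<std::string>>;

// Replace every "(NAME)" whose NAME is in the table with "(alt1|alt2|...)".
// Assumes the alternatives are plain literals (no regex metacharacters).
std::string expand_tokens(const std::string& pattern, const TokenTable& tokens) {
    std::string out = pattern;
    for (const auto& entry : tokens) {
        assert(!entry.second.empty());  // an empty token would yield "()"
        const std::string needle = "(" + entry.first + ")";
        std::string group = "(";
        for (std::size_t i = 0; i < entry.second.size(); ++i) {
            if (i != 0) group += '|';
            group += entry.second[i];
        }
        group += ')';
        // Skip past each inserted group so it is never rescanned.
        for (std::size_t pos = out.find(needle); pos != std::string::npos;
             pos = out.find(needle, pos + group.size())) {
            out.replace(pos, needle.size(), group);
        }
    }
    return out;
}

// Expand and compile every pattern once, so the compiled objects can be
// reused across the whole list of input strings.
std::vector<std::regex> compile_all(const std::vector<std::string>& patterns,
                                    const TokenTable& tokens) {
    std::vector<std::regex> compiled;
    compiled.reserve(patterns.size());
    for (const auto& p : patterns) {
        compiled.emplace_back(expand_tokens(p, tokens));
    }
    return compiled;
}
```

The point of compile_all is that the cost of compiling is paid once per
regexp rather than once per (regexp, string) pair; the memory cost of keeping
all the expanded regexps loaded is exactly the part I'm worried about.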

On a side note:
> regex under one locale and read back in under another, sadly bad things
> will very likely happen :-(

Not if you save the locale in the compiled version and throw if a mismatch
occurs, but anyway I understand serializing is not an easy thing to do.
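
Just to illustrate the "save the locale and throw on mismatch" idea, something
along these lines. This is only a sketch: SavedRegex, save_regex and
checked_pattern are hypothetical names, nothing like this exists in
Boost.Regex as far as I know, and it stores the pattern string rather than a
real compiled form.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical saved record: the pattern plus the name of the locale
// that was active when it was saved.
struct SavedRegex {
    std::string locale_name;
    std::string pattern;
};

SavedRegex save_regex(const std::string& pattern, const std::locale& loc) {
    return SavedRegex{loc.name(), pattern};
}

// Return the pattern if the locale names agree; throw otherwise.
// Caveat: unnamed locales all report "*", so comparing names is only
// a rough check.
std::string checked_pattern(const SavedRegex& saved, const std::locale& current) {
    if (saved.locale_name != current.name()) {
        throw std::runtime_error("regex saved under locale '" + saved.locale_name +
                                 "' but loaded under '" + current.name() + "'");
    }
    return saved.pattern;
}
```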

Cheers
      /jog


