Subject: Re: [Boost-users] best tool in Boost for (massive) string replacement?
From: Anthony Foiani (tkil_at_[hidden])
Date: 2010-09-25 20:13:07
alfC <alfredo.correa_at_[hidden]> writes:
> My only approach so far is Regex and the implementation is very crude.
> I read the file line by line and do a loop over the replacement keys
> for each line. It is not even exploiting the fact that I have a map of
> replacements (compared to an array of replacements). It seems very
> slow.
Depending on where you want to spend your runtime (setup cost vs.
per-line cost) and how much memory you have available, it might be
faster to build a single regex that has all your targets as
alternates, then use the match data to look up the correct
replacement. That way each line is scanned once, rather than once per
replacement key.
In Perl, it'd go something like this:
# establish mapping from target to replacement.
my %reps = ( '\alpha' => 'a',
             '\beta'  => 'b',
             '\gamma' => 'g' );

# create a regular expression consisting of all targets, using
# alternation.  longer targets go first so that a target which is a
# prefix of another one cannot shadow it:
my $re = join '|', map { quotemeta $_ }
         sort { length($b) <=> length($a) } keys %reps;

# now loop over the data:
while ( my $line = <STDIN> )
{
    # every time the regex matches, capture what matched into $1 and
    # then replace it by looking up the target in the %reps map.
    $line =~ s/($re)/$reps{$1}/g;
    print $line;
}
A rough translation into Boost can be found here:
http://scrye.com/~tkil/boost/regex/multi-rep.cpp
That translation will still fail if any of your target strings
contain a literal "\E" (the sequence that ends a \Q...\E quoted
section in Perl-style regex syntax); I couldn't find any obvious
"quotemeta" replacement in boost::regex.
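For what it's worth, a quotemeta stand-in is short enough to write by
hand. The sketch below is only an illustration, and it uses the
standard library's std::regex (ECMAScript syntax) rather than
boost::regex, so the list of special characters is an assumption that
would need checking against the syntax you actually compile with. The
names quote_meta and build_alternation are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> ssmap;

// Hypothetical helper: backslash-escape every character that is
// special in ECMAScript regex syntax, so the result matches `s`
// literally.
std::string quote_meta(const std::string& s)
{
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    out.reserve(s.size() * 2);
    for (char c : s) {
        if (special.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: join all (escaped) keys of `dict` with '|'.
// Longer keys come first so that a key which is a prefix of another
// key cannot shadow it (alternation picks the first alternative that
// matches).
std::string build_alternation(const ssmap& dict)
{
    assert(!dict.empty());  // an empty pattern would match everywhere

    std::vector<std::string> keys;
    for (const auto& kv : dict)
        keys.push_back(kv.first);
    std::stable_sort(keys.begin(), keys.end(),
                     [](const std::string& a, const std::string& b)
                     { return a.size() > b.size(); });

    std::string pattern;
    bool first = true;
    for (const auto& k : keys) {
        if (!first)
            pattern += '|';
        pattern += quote_meta(k);
        first = false;
    }
    return pattern;
}
```

Because every special character is escaped individually, no \Q...\E
section is involved, and a target containing "\E" is just another
literal string.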
There are ways to get fancier with it, but I started running into
version incompatibilities. In particular, recent versions of
boost::regex (but not older ones) allow the replacement formatter to
be an arbitrary functor, and with that my subroutine 'replace' turns
into this:
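// ssmap is assumed here to be std::map< std::string, std::string >.
struct find_replacement
{
    const ssmap & dict_;
    find_replacement( const ssmap & dict ) : dict_( dict ) {}
    // boost::regex calls a unary formatter with the match_results
    // object, not with the matched string.
    const std::string & operator()( const boost::smatch & m ) const
    { return dict_.at( m.str() ); }  // throws if the key is missing
};

const std::string
replace( const std::string & input,
         const ssmap & dict,
         const boost::regex & re )
{
    find_replacement fr( dict );
    return boost::regex_replace( input, re, fr );
}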
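If your boost::regex is too old for functor formatters, the same
lookup-per-match idea can be written as an explicit loop over the
matches. The sketch below uses the standard library's std::regex
instead of Boost (std::regex_replace does not accept a functor, hence
the loop); replace_with and lookup_replacement are names invented for
this example, and ssmap is again assumed to be a map from string to
string.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

typedef std::map<std::string, std::string> ssmap;

// Formatter that maps the matched text to its replacement.
struct lookup_replacement
{
    const ssmap& dict_;
    explicit lookup_replacement(const ssmap& dict) : dict_(dict) {}

    std::string operator()(const std::smatch& m) const
    {
        ssmap::const_iterator it = dict_.find(m.str());
        // The regex is expected to be built from the keys of dict_,
        // so every match should be present in the map.
        assert(it != dict_.end());
        // With assertions disabled, leave unknown matches unchanged.
        return it != dict_.end() ? it->second : m.str();
    }
};

// Replace every match of `re` in `input` with fmt(match).
template <typename Formatter>
std::string replace_with(const std::string& input,
                         const std::regex& re,
                         Formatter fmt)
{
    std::string out;
    std::string::const_iterator last = input.begin();
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        out.append(last, m[0].first);  // text between matches
        out += fmt(m);                 // replacement for this match
        last = m[0].second;
    }
    out.append(last, input.end());     // text after the last match
    return out;
}
```

With a map holding the three targets from the Perl example and a regex
built from its keys, replace_with( line, re, lookup_replacement( dict ) )
plays the same role as the s///g line in the Perl loop.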
Happy hacking,
t.