Subject: Re: [Boost-users] best tool in Boost for (massive) string replacement?
From: Anthony Foiani (tkil_at_[hidden])
Date: 2010-09-25 20:13:07


alfC <alfredo.correa_at_[hidden]> writes:

> My only approach so far is Regex and the implementation is very crude.
> I read the file line by line and do a loop over the replacement keys
> for each line. It is not even exploiting the fact that I have a map of
> replacements (compared to an array of replacements). It seems very
> slow.

Depending on where you want to spend your runtime (setup cost
vs. per-line cost) and how much memory you have available, it might
be faster to build a single regex that has all your targets as
alternates, then use the match data to look up the correct replacement.

In Perl, it'd go something like this:

  # establish mapping from target to replacement.
  my %reps = ( '\alpha' => 'a',
               '\beta' => 'b',
               '\gamma' => 'g' );

  # create a regular expression consisting of all targets, using
  # alternation:
  my $re = join '|', map { quotemeta $_ } keys %reps;

  # now loop over the data:
  while ( my $line = <STDIN> )
  {
      # every time the regex matches, capture what matched into $1 and
      # then replace it by looking up the target in the %reps map.
      $line =~ s/($re)/$reps{$1}/g;
      print $line;
  }

A rough translation into Boost can be found here:

   http://scrye.com/~tkil/boost/regex/multi-rep.cpp

It will still fail if any of your target strings contain "\E"
literally in them; I couldn't find any obvious "quotemeta" replacement
in boost::regex.
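
For what it's worth, here is a minimal sketch of the same idea with a
hand-rolled quotemeta stand-in. To keep it self-contained it uses
std::regex (C++11) rather than boost::regex, and it walks the matches
with std::sregex_iterator. The helper names escape_regex, build_regex
and replace are hypothetical (not taken from multi-rep.cpp), and ssmap
is assumed to be a std::map of strings. Treat it as an illustration,
not a tuned implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Assumption: the dictionary is a plain map from target to replacement.
typedef std::map<std::string, std::string> ssmap;

// Hypothetical quotemeta stand-in: backslash-escape every character that
// is special in the ECMAScript regex syntax used by std::regex.
std::string escape_regex( const std::string & s )
{
    static const std::string meta = "\\^$.|?*+()[]{}";
    std::string out;
    for ( char c : s )
    {
        if ( meta.find( c ) != std::string::npos )
            out += '\\';
        out += c;
    }
    return out;
}

// Build one regex with all targets as alternates.  Longer targets go
// first so that a target which is a prefix of another cannot shadow it.
std::regex build_regex( const ssmap & dict )
{
    std::vector<std::string> keys;
    for ( const auto & kv : dict )
    {
        assert( !kv.first.empty() );  // an empty target would match everywhere
        keys.push_back( kv.first );
    }
    std::sort( keys.begin(), keys.end(),
               []( const std::string & a, const std::string & b )
               { return a.size() > b.size(); } );

    std::string pattern;
    for ( const auto & k : keys )
    {
        if ( !pattern.empty() )
            pattern += '|';
        pattern += escape_regex( k );
    }
    return std::regex( pattern );
}

// Copy the input, substituting each match with its dictionary entry.
std::string replace( const std::string & input,
                     const ssmap & dict,
                     const std::regex & re )
{
    if ( dict.empty() )
        return input;  // nothing to replace

    std::string out;
    auto last = input.cbegin();
    for ( std::sregex_iterator it( input.cbegin(), input.cend(), re ), end;
          it != end; ++it )
    {
        out.append( last, (*it)[0].first );  // text before this match
        out += dict.at( it->str() );         // replacement for the match
        last = (*it)[0].second;
    }
    out.append( last, input.cend() );        // text after the last match
    return out;
}
```

To use it, build the regex once with build_regex( reps ) and then call
replace( line, reps, re ) for each line read from the input. Sorting
the keys longest-first matters because alternation takes the first
alternative that matches, so a short key could otherwise shadow a
longer one that starts with it.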

There are ways to get fancier with it, but I started running into
version incompatibilities. In particular, recent versions of
boost::regex accept a functor as the replacement formatter, which is
called once per match; with that, my subroutine 'replace' turns into
this:

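  // ssmap is assumed here to be std::map<std::string, std::string>.
  struct find_replacement
  {
      const ssmap & dict_;
      find_replacement( const ssmap & dict ) : dict_( dict ) {}
      // boost::regex_replace calls the formatter with the match results,
      // not with a string; m.str() is the text that matched.
      const std::string & operator()( const boost::smatch & m ) const
        { return dict_.at( m.str() ); }
  };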

  const std::string
  replace( const std::string & input,
           const ssmap & dict,
           const boost::regex & re )
  {
      find_replacement fr( dict );
      return boost::regex_replace( input, re, fr );
  }

Happy hacking,
t.

