Boost Users :

Subject: Re: [Boost-users] [Regex] Emulate awk
From: Wilde, Donald S (donald.s.wilde_at_[hidden])
Date: 2011-06-24 10:41:42


-----Original Message-----
From: boost-users-bounces_at_[hidden] [mailto:boost-users-bounces_at_[hidden]] On Behalf Of Alessandro Candini
Sent: Friday, June 24, 2011 1:50 AM
To: boost-users_at_[hidden]
Subject: [Boost-users] [Regex] Emulate awk

> I have a string like the following:
>
> string myStr = "Monthly Ecosystem NPP = 0.360591 tDM/ha month";
>
> How can I use boost regex to isolate 0.360591, as I would do in bash
> with a echo $myStr | awk '{ print $5 }' ?
[snip]

Alessandro,

The first shift to make is that a RegEx recognizes classes of characters, not delimited fields as awk does. You could still do it the awk way, by matching the four spaces and skipping over whatever lies between them.

  boost::regex e("^[^\\s]+\\s[^\\s]+\\s[^\\s]+\\s[^\\s]+\\s([^\\s]+)");

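... which will create a regex pattern e that starts at the beginning of the string, skips four batches of non-space characters, each followed by a single whitespace character, then captures the fifth non-space batch, which is hopefully always your number. Note that awk treats a run of spaces as one separator; to get the same behavior, write \\s+ in place of each \\s.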
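
To make the field-skipping idea concrete, here is a minimal sketch that generalizes the pattern to awk's $n. It is only an illustration under some assumptions: nth_field is a hypothetical helper name, and the code uses the standard library's std::regex (C++11) as a stand-in for boost::regex, whose interface it closely mirrors. \\S+ is shorthand for [^\\s]+, and \\s+ is used so that a run of whitespace counts as one separator.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: return the n-th whitespace-delimited field (1-based,
// like awk's $n), or an empty string if the line has fewer than n fields.
// std::regex is used here as a stand-in for boost::regex.
std::string nth_field(const std::string& line, unsigned n) {
    if (n == 0) {
        return std::string();
    }
    // Skip optional leading whitespace and n-1 "field + separator" groups,
    // then capture the next run of non-space characters.
    const std::regex re("^\\s*(?:\\S+\\s+){" + std::to_string(n - 1) + "}(\\S+)");
    std::smatch what;
    if (std::regex_search(line, what, re)) {
        return what[1].str();
    }
    return std::string();
}
```

With the string from the question, nth_field(myStr, 5) should return the text 0.360591, which you can then convert with std::atof or std::stod. Swapping std:: for boost:: and including <boost/regex.hpp> should give the Boost version.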

This is easier, just recognizing the batch of numbers with a decimal:

  boost::regex exp("([0-9.]+)");

... will create a pattern called exp that matches any run of digits and periods (so it would also accept something like "1.2.3" or a lone "."). If you can be sure that there will only be one non-negative floating-point number in the string, this is all you need. You can also combine the two approaches.

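// code
#include <cstdlib>
#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace boost;

// Named number_re rather than exp to avoid clashing with ::exp from <math.h>.
// The parentheses create capture group 1, which is read below.
regex number_re("([0-9.]+)");

// Returns the first number found in aString, or -1 if there is none.
double process(const std::string& aString) {
  smatch what;  // smatch, not cmatch, because we are searching a std::string
  // regex_search finds a match anywhere in the string;
  // regex_match would require the whole string to match the pattern.
  if (regex_search(aString, what, number_re)) {
    return std::atof(what[1].str().c_str());
  }
  return -1;
}
// end code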
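
Combining the two approaches, as mentioned above, might look like the following sketch: skip four fields, then insist that the fifth one is a plain decimal number. This is an assumption-laden illustration rather than a definitive version: fifth_field_number is a hypothetical name, the -1.0 failure value simply follows the convention of the code above, and std::regex (C++11) again stands in for boost::regex.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: read the fifth whitespace-delimited field as a
// non-negative decimal number. Returns -1.0 when the line does not fit.
// std::regex is used here as a stand-in for boost::regex.
double fifth_field_number(const std::string& line) {
    // Four "field + separator" groups, then digits with an optional
    // fractional part, followed by whitespace or the end of the string.
    static const std::regex re(
        "^\\s*(?:\\S+\\s+){4}([0-9]+(?:\\.[0-9]+)?)(?:\\s|$)");
    std::smatch what;
    if (std::regex_search(line, what, re)) {
        return std::stod(what[1].str());
    }
    return -1.0;
}
```

Unlike the bare [0-9.]+ pattern, this rejects a fifth field such as "1.2.3" or "n/a", at the cost of being tied to the field position.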

There are many books and web tutorials on Perl-style regular expressions, which is the default syntax in Boost.Regex. Once you get the hang of the C++ wrappings, you should be ready to study the RegEx syntax and apply it.

