Boost Users :

From: Eric Niebler (eric_at_[hidden])
Date: 2008-03-24 17:53:01


Dilts, Daniel D. wrote:
> I'm trying to tokenize lines of a file using the included static regex. I only care about the tokens indicated by s* = ... When I use the sregex_token_iterator to parse the lines, I only get the last match for s2 and s3.
>
> How should I change things so that I can get every match for s2 and s3 rather than just the last match?
>
> sregex whitespace_regex = *_s;
> sregex line_regex =
> whitespace_regex >>
> (s1 = +_d) >> whitespace_regex >>
> (
> +(
> '\"' >> (s2 = *~as_xpr('\"')) >> '\"' >>
> whitespace_regex >> ':' >> whitespace_regex >>
> '\"' >> (s3 = *~as_xpr('\"')) >> '\"' >>
> whitespace_regex
> )
> |
> +(
> '\"' >> (s2 = *~as_xpr('\"')) >> '\"' >>
> whitespace_regex
> )
> );

Hi, Daniel. If you are using the latest version of xpressive from the
Boost File Vault, I would solve the problem this way:

#include <string>
#include <vector>
#include <iostream>
#include <boost/foreach.hpp>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
using namespace boost;
using namespace xpressive;

int main()
{
   local<std::vector<ssub_match> > strings;

   sregex line_regex =
     skip(_s) // skip whitespace
     (
       (s1 = +_d) >>
       ( // group the alternation: >> binds tighter than |
         +(
            '\"' >> (s2 = *~as_xpr('\"'))[push_back(strings, s2)] >> '\"' >>
            ':' >>
            '\"' >> (s3 = *~as_xpr('\"'))[push_back(strings, s3)] >> '\"'
          )
          |
         +(
            '\"' >> (s2 = *~as_xpr('\"'))[push_back(strings, s2)] >> '\"'
          )
       )
     )
   ;

   std::string input(" 42 \"The answer to\" : \"Life\" \"The Universe\" : \"And Everything!\" ");
   if(regex_match(input, line_regex))
   {
     BOOST_FOREACH(ssub_match s, strings.get())
     {
       std::cout << s << std::endl;
     }
   }
}

The above uses semantic actions (the parts in []) to push sub-matches
into a vector for reference later. (It also uses skip(_s) to skip
whitespace.)

If you are using xpressive 1.0, which is part of Boost 1.34.1, it is a
little trickier: there is no skip() and there are no semantic actions.
In that case, you can define a nested regex,
sregex quoted_string = *~as_xpr('\"');
and use it in your line_regex. Every quoted string that matches then
adds a nested result to your match_results. See below:

   sregex quoted_string = *~as_xpr('\"');
   sregex line_regex =
     keep(*_s) >>
     (s1 = +_d) >> keep(*_s) >>
     (
      +(
         '\"' >> quoted_string >> '\"' >>
         keep(*_s) >> ':' >> keep(*_s) >>
         '\"' >> quoted_string >> '\"' >>
         keep(*_s)
       )
       |
      +(
         '\"' >> quoted_string >> '\"' >>
         keep(*_s)
       )
     );

     std::string input(" 42 \"The answer to\" : \"Life\" \"The Universe\" : \"And Everything!\" ");
     smatch what;
     if(regex_match(input, what, line_regex))
     {
         BOOST_FOREACH(smatch const &str, what.nested_results())
         {
             std::cout << str[0] << std::endl;
         }
     }

This is less efficient, but gets the job done.

HTH,

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
