Boost Users :
From: Eric Niebler (eric_at_[hidden])
Date: 2008-03-24 17:53:01
Dilts, Daniel D. wrote:
> I'm trying to tokenize lines of a file using the included static regex. I only care about the tokens indicated by s* = ... When I use the sregex_token_iterator to parse the lines, I only get the last match for s2 and s3.
>
> How should I change things so that I can get every match for s2 and s3 rather than just the last match?
>
> sregex whitespace_regex = *_s;
> sregex line_regex =
>     whitespace_regex >>
>     (s1 = +_d) >> whitespace_regex >>
>     (
>         +(
>             '\"' >> (s2 = *~as_xpr('\"')) >> '\"' >>
>             whitespace_regex >> ':' >> whitespace_regex >>
>             '\"' >> (s3 = *~as_xpr('\"')) >> '\"' >>
>             whitespace_regex
>         )
>       |
>         +(
>             '\"' >> (s2 = *~as_xpr('\"')) >> '\"' >>
>             whitespace_regex
>         )
>     );
Hi, Daniel. If you are using the latest version of xpressive from the
Boost File Vault, I would solve the problem this way:
#include <string>
#include <vector>
#include <iostream>
#include <boost/foreach.hpp>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
using namespace boost;
using namespace xpressive;

int main()
{
    // Collects every sub-match pushed by the semantic actions below
    local<std::vector<ssub_match> > strings;

    sregex line_regex =
        skip(_s) // skip whitespace
        (
            (s1 = +_d) >>
            (   // grouped so that s1 is required before either alternative
                +(
                    '\"' >> (s2 = *~as_xpr('\"'))[push_back(strings, s2)] >> '\"' >>
                    ':' >>
                    '\"' >> (s3 = *~as_xpr('\"'))[push_back(strings, s3)] >> '\"'
                )
              |
                +(
                    '\"' >> (s2 = *~as_xpr('\"'))[push_back(strings, s2)] >> '\"'
                )
            )
        )
    ;

    std::string input(
        " 42 \"The answer to\" : \"Life\" "
        "\"The Universe\" : \"And Everything!\" ");

    if(regex_match(input, line_regex))
    {
        BOOST_FOREACH(ssub_match s, strings.get())
        {
            std::cout << s << std::endl;
        }
    }
}
The above uses semantic actions (the parts in []) to push sub-matches
into a vector for reference later. (It also uses skip(_s) to skip
whitespace.)
If you are using xpressive 1.0, which is part of Boost 1.34.1, it would
be a little trickier. There is no skip(), and no semantic actions. If
that's the case, you can define a nested sregex
quoted_string=*~as_xpr('\"');, and use that in your line_regex. Then
every quoted string that matches will cause a nested result to be added
to your match_results. See below:
sregex quoted_string = *~as_xpr('\"');
sregex line_regex =
    keep(*_s) >>
    (s1 = +_d) >> keep(*_s) >>
    (
        +(
            '\"' >> quoted_string >> '\"' >>
            keep(*_s) >> ':' >> keep(*_s) >>
            '\"' >> quoted_string >> '\"' >>
            keep(*_s)
        )
      |
        +(
            '\"' >> quoted_string >> '\"' >>
            keep(*_s)
        )
    );

std::string input(
    " 42 \"The answer to\" : \"Life\" "
    "\"The Universe\" : \"And Everything!\" ");

smatch what;
if(regex_match(input, what, line_regex))
{
    BOOST_FOREACH(smatch const &str, what.nested_results())
    {
        std::cout << str[0] << std::endl;
    }
}
This is less efficient, but gets the job done.
HTH,
--
Eric Niebler
Boost Consulting
www.boost-consulting.com