Subject: Re: [Boost-users] Regex problem: cannot parse terms containing OR
From: OvermindDL1 (overminddl1_at_[hidden])
Date: 2009-09-26 22:40:08


On Sat, Sep 26, 2009 at 5:04 PM, Ramon F Herrera <ramon_at_[hidden]> wrote:
>
> I have been able to parse stuff more complicated than this, but now I am
> stuck with something seemingly simpler.
>
> The expression being parsed is the common sequence:
>
>  1,2-5,7,8-11
>
> Question 1: Notice my approach. I first match the whole expression, with
> "regex_match", to make sure that it is valid (that works great). Next, I use
> "regex_iterator" to break down the parts. Is that good practice? Am I being
> inefficient/redundant?
>
> Question 2: My code below only extracts "range terms" ("x-y"), for some
> reason I cannot extract "number terms".
>
> As a workaround, I can always feed my data like this:
>
> 1-1,2-5,7-7,8-11
>
> but, after a lot of tries, would love to learn how to do this properly.
>
> TIA,
>
> -Ramon
>
> -----------------------------------------------------
>
> #include <algorithm>   // for_each
> #include <cstring>     // strcpy
> #include <iostream>
> #include <string>
> #include <boost/regex.hpp>
> using namespace std;
>
> bool
> term_callback(const boost::match_results<std::string::const_iterator>& what)
> {
>    for (unsigned int i = 0; i < what.size(); i++) {
>        cout << "what[" << i << "]: " <<  what[i].str() << endl;
>        cout << "---------" << endl;
>    }
>    return true;
> }
>
> int
> main(int argc, char *argv[])
> {
>    const char hyphen      = '-';
>    const char left_paren  = '(';
>    const char right_paren = ')';
>    const char bar         = '|';
>    const char comma       = ',';
>    const char star        = '*';
>
>    const string number    = "[0-9]+";
>    const string range     = number + hyphen + number;
>    const string term      = left_paren + number + bar + range + right_paren;
>    const string sequence  = term + bar + left_paren + term + comma +
>                             right_paren + star + term;
>
>    boost::regex expression(sequence);
>    boost::regex piece(range);
>    boost::cmatch matches;
>
>    char argument[1024];
>    strcpy(argument, argv[1]);
>
>    if (!boost::regex_match(argument, matches, expression)) {
>        cerr << "There is no match" << endl;
>        return 1;
>    }
>
>    string text = argument;
>
>    boost::sregex_iterator m1(text.begin(), text.end(), piece);
>    boost::sregex_iterator m2;
>    for_each(m1, m2, &term_callback);
>
>    return 0;
> }

Note that if you want to do something with your numbers, such as
converting them to integers and operating on them, there is a much
easier way to do this if you use Boost.Spirit2.1 instead of
Boost.Regex. Your problem is more of a parsing problem than a
matching problem: regex is nice for matching, and Spirit2.1 is
better for parsing. If you are interested, I or someone else could
whip up some code that does the same thing in Spirit2.1, but will
run a whole lot faster and be a lot easier to use.
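
As for Question 2 in the quoted message: the iterator there is built from
"range" alone, so it can only ever find "x-y" terms. Below is a minimal
sketch of iterating with a pattern that accepts both kinds of term and
converts them to integers. It is not the Spirit version mentioned above. To
stay self-contained it uses std::regex from the C++11 standard library,
which has the same regex_match / sregex_iterator interface as Boost.Regex.
The helper name parse_sequence, and the choice to return a single number n
as the pair (n, n), are assumptions made for illustration.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): split "1,2-5,7,8-11"
// into (first, last) pairs. A single number n is returned as (n, n).
// An empty vector means the input was not a valid sequence.
std::vector<std::pair<int, int>> parse_sequence(const std::string& text)
{
    // Whole-string validation: term (',' term)*
    static const std::regex whole(
        "[0-9]+(?:-[0-9]+)?(?:,[0-9]+(?:-[0-9]+)?)*");
    // One term: a number, optionally followed by "-number".
    // The optional group avoids an alternation in which the shorter
    // "number" branch could win before the "range" branch is tried.
    static const std::regex term("([0-9]+)(?:-([0-9]+))?");

    std::vector<std::pair<int, int>> result;
    if (!std::regex_match(text, whole))
        return result;

    for (std::sregex_iterator it(text.begin(), text.end(), term), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        // std::stoi throws std::out_of_range for numbers too large for int.
        const int first = std::stoi(m[1].str());
        const int last  = m[2].matched ? std::stoi(m[2].str()) : first;
        result.emplace_back(first, last);
    }
    return result;
}
```

Calling parse_sequence("1,2-5,7,8-11") yields the four pairs (1,1), (2,5),
(7,7) and (8,11), so the "1-1,2-5,7-7,8-11" workaround is not needed.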