
Boost Users :

From: Christoph Duelli (duelli_at_[hidden])
Date: 2008-05-29 13:03:01


[I am using Boost 1.35, on Linux, running gcc 4.1.2]

I want to parse a string like the following

---
ResA { opt1=>val1, opt2 => val2}, ResB
ResC {opt3=>  valllll  }
   ResD
---
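
To make the goal concrete, here is a small sketch of the structure the sample above is meant to produce, built by hand with the standard library only. The type names mirror the typedefs in the program further down; the function name `expected_sample` is made up for illustration. It assumes that a resource without a `{...}` block simply gets an empty option map.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, std::string> options_t;   // option name -> value
typedef std::map<std::string, options_t>  resources_t;  // resource name -> options

// Hypothetical helper: builds by hand what parsing the sample should yield.
resources_t expected_sample()
{
    resources_t r;
    r["ResA"]["opt1"] = "val1";
    r["ResA"]["opt2"] = "val2";
    r["ResB"];                       // operator[] creates an empty option map
    r["ResC"]["opt3"] = "valllll";   // surrounding whitespace is not part of the value
    r["ResD"];
    return r;
}
```

Comparing a parser's output against such a hand-built map is a cheap way to check the grammar.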
I thought that maybe it is easier to do this using xpressive rather than
Spirit (which I have used for more complicated stuff so far).
I am happy to say that, with the very helpful docs, I was able to create an
sregex that parses the above. Also, I found the solution is quite short,
so I am basically happy with xpressive.
As this was my first try, there is certainly (lots of) room for improvement:
I have attached my program, and would welcome comments (and suggestions for
enhancements.)

#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
#include <boost/foreach.hpp>
#include <iostream>
#include <map>
#include <string>
using namespace boost::xpressive;
using namespace std;
int main(int, char **)
{
   string input =
      "ResA { opt1=>val1, opt2 => val2}, ResB\n"
      " ResC {opt3=>  valllll   }\n\n\n"
      "ResD";
   typedef map<string, string> options_t;
   typedef map<string, options_t> resources_t;
   resources_t result;
   // The name of the resource we are parsing now.
   string res_name;
   // Pointer to options (map) of the resource we are parsing now.
   // It is set by the action in rx_res before any rx_opt action runs.
   options_t *optsref = 0;
   // Match an option(name) and its value. Both strings,
   // separated by =>, and then stuff the result into the
   // options-map of the resource we are parsing now.
   sregex rx_opt = ( (s1= +_w) >> *_s >> "=>" >> *_s >> (s2= +_w) )
      [ (*ref(optsref)) [s1] = s2 ];
   // A resource has a name and, enclosed in {}, option=>value
   // pairs. We store the name of the resource and the address
   // of its options map.
   sregex rx_res =
         *_s
      >> (s1= +_w)[ref(res_name)=s1, ref(optsref)=&(ref(result)[s1])]
      >> *_s
      >> optional(   '{' >> *_s >> rx_opt
                   >> *(*_s >> ',' >> *_s >> rx_opt)
                   >> *_s >> '}' >> *_s);
   // A line may contain comma-separated resource definitions.
   sregex rx_line = rx_res >> * (*_s >> ',' >> *_s >> rx_res);
   // A file consists of multiple lines.
   sregex rx_file = rx_line >> *(*_n >> rx_line) >> *_n;
   if(regex_match(input, rx_file))
   {
      // output the parsed structure.
      cerr << "resname="<<res_name<<endl;
      BOOST_FOREACH(const resources_t::value_type &p, result)
      {
         cerr << p.first << " : " << endl;
         BOOST_FOREACH(const options_t::value_type &o, p.second)
            cerr << "    " << o.first << " => " << o.second << endl;
      }
   }
   else
      cerr << "NO MATCH!" << endl;
   return 0;
}

In particular, I'd like to know:
1) Is it possible to avoid my (ugly) use of "*optsref"?
   I would have liked to write something like
 sregex rx_opt = ( (s1= +_w) >> *_s >> "=>" >> *_s >> (s2= +_w) )
      [ (ref(result[ref(res_name)])) [s1] = s2 ];
   i.e. nest the maps directly. The way I tried it, it would not compile.
2) Can I tell xpressive to allow arbitrary whitespace around ">>"?
   I'd rather avoid cluttering the regexes with all that ">> *_s".
3) If the string does *not* match the sregex, can I find out where the
   failure occurred (i.e. what is the length of the longest prefix that could
   still have been completed to a successful match)?
Thank you and best regards, keep up the good work,
Christoph
