
Boost Users :

From: Sandor (jmzjlgcl_at_[hidden])
Date: 2005-04-20 13:47:10


Hi,

I am using boost::regex_iterator to list every match of a user-configured
regexp in the contents of a file. The file is read into a single string
and passed to the constructor of regex_iterator. Take the example regexp
"a|\Ab", which should mean that every 'a' is a match, and that a 'b' is a
match only if it is the first character of the file. But the algorithm
does not work like this! A 'b' is matched every time it follows an 'a',
because '\A' does not mean the beginning of the original buffer supplied
by the client code: after the first match, '\A' only means the end of the
previous match. I can imagine situations where a metacharacter with that
meaning is needed, but I need a different behaviour. Diving into the
Boost source code:

template <snip>
class regex_iterator_implementation
{
   <snip>

    bool next()
    {
       if(what.prefix().first != what[0].second)
          flags |= match_prev_avail;
       BidirectionalIterator next_start = what[0].second;
       match_flag_type f(flags);
       if(!what.length())
          f |= regex_constants::match_not_initial_null;
       bool result = regex_search(next_start, end, what, *pre, f);
       if(result)
          what.set_base(base);
       return result;
    }
    <snip>
}

What I would think logical is (note the two new lines):

template <snip>
class regex_iterator_implementation
{
   <snip>

    bool next()
    {
       if(what.prefix().first != what[0].second)
          flags |= match_prev_avail;
       BidirectionalIterator next_start = what[0].second;
       match_flag_type f(flags);
       if(!what.length())
          f |= regex_constants::match_not_initial_null;
       if(base != next_start)
          f |= regex_constants::match_not_bob;
       bool result = regex_search(next_start, end, what, *pre, f);
       if(result)
          what.set_base(base);
       return result;
    }
    <snip>
}

Or at least provide a new flag to enable this behaviour. It is quite
tedious to reimplement regex_iterator in client code just to add this
small feature.
What do you think?

- Sandor

