Boost Users :

From: David McKelvie (david_at_[hidden])
Date: 2005-03-22 17:21:09


Could anyone explain the following behaviour of regex_search
(in /usr/include/boost/regex/v3/regex_match.hpp, so version 3)?

Using regex_search acting on a pair of iterators, with input
      "aaa bbb ccc\naaaaaaX"
and regexp
      "(aaa)|(bbb)|(.)|(\n)"
and the match_continuous flag set.

It seems that regex_search iterates to the end of the sequence
using ++iterator rather than stopping once it has found the first match
or even the longest match.

Why is this?
    I guess it is an optimisation to speed up subsequent matches.

Is there a way of avoiding this?

I have an application in mind where I may not have the entire sequence yet,
or rather where I'd like the match before the complete sequence is available.

   thanks, David

--- details --

Why does regex_search iterate to the end of the sequence
in the following situation, with match_continuous set?
It seems to me that it could stop as soon as it finds 'aaa'
and not iterate any further.

  expression.assign("(aaa)|(bbb)|(.)|(\n)"); // Regular expression
  const char * input = "aaa bbb ccc\naaaaaaX"; // Input string
  MyCharIter start = my_begin(input),
             end = my_end(input);
  boost::match_results<MyCharIter> what;
  boost::regex::flag_type flags =
     boost::match_default |
     boost::match_not_dot_newline |
     boost::match_continuous ;

  regex_search(start,end, what, expression, flags);

MyCharIter is just a test iterator over the string, with this prefix increment:

MyCharIter& MyCharIter::operator++(){ // prefix ++X
  printf("NT: ++X called bp=%d c= '%c'\n",bp,s[bp]);
  ++bp;
  return *this;
}

The output is

NT: ++X called bp=0 c= 'a'
NT: ++X called bp=1 c= 'a'
NT: ++X called bp=2 c= 'a'
NT: ++X called bp=0 c= 'a'
NT: ++X called bp=1 c= 'a'
NT: ++X called bp=2 c= 'a'
NT: ++X called bp=3 c= ' '
NT: ++X called bp=4 c= 'b'
NT: ++X called bp=5 c= 'b'
NT: ++X called bp=6 c= 'b'
NT: ++X called bp=7 c= ' '
NT: ++X called bp=8 c= 'c'
NT: ++X called bp=9 c= 'c'
NT: ++X called bp=10 c= 'c'
NT: ++X called bp=11 c= '
'
NT: ++X called bp=12 c= 'a'
NT: ++X called bp=13 c= 'a'
NT: ++X called bp=14 c= 'a'
NT: ++X called bp=15 c= 'a'
NT: ++X called bp=16 c= 'a'
NT: ++X called bp=17 c= 'a'
NT: ++X called bp=18 c= 'X'
NT: ++X called bp=0 c= 'a'
NT: ++X called bp=0 c= 'a'
NT: ++X called bp=1 c= 'a'
NT: ++X called bp=2 c= 'a'
NT: ++X called bp=0 c= 'a'
NT: ++X called bp=0 c= 'a'
NT: ++X called bp=1 c= 'a'
NT: ++X called bp=2 c= 'a'
NT: ++X called bp=0 c= 'a'
NT: ++X called bp=1 c= 'a'
NT: ++X called bp=2 c= 'a'
******** N = 0 Result = aaa
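
For anyone wanting to reproduce this kind of measurement without the original MyCharIter, here is a minimal sketch of a counting iterator. It is not the code from the test above: it uses std::regex from the C++11 standard library rather than Boost.Regex v3, and CountingIter, SearchReport and run_search are names invented for illustration. The number of increments it reports depends on the standard library implementation and should not be expected to match the trace above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical stand-in for MyCharIter: a bidirectional iterator over a
// C string that counts every prefix increment made through any copy.
struct CountingIter {
    using iterator_category = std::bidirectional_iterator_tag;
    using value_type        = char;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char*;
    using reference         = const char&;

    const char* p = nullptr;
    static std::size_t increments;  // shared by all copies of the iterator

    CountingIter() = default;
    explicit CountingIter(const char* q) : p(q) {}

    reference operator*() const { return *p; }
    pointer operator->() const { return p; }
    CountingIter& operator++() { ++increments; ++p; return *this; }
    CountingIter operator++(int) { CountingIter t = *this; ++*this; return t; }
    CountingIter& operator--() { --p; return *this; }
    CountingIter operator--(int) { CountingIter t = *this; --*this; return t; }
    friend bool operator==(const CountingIter& a, const CountingIter& b) { return a.p == b.p; }
    friend bool operator!=(const CountingIter& a, const CountingIter& b) { return a.p != b.p; }
};
std::size_t CountingIter::increments = 0;

// Result of one search: whether it matched, the matched text, and how many
// times operator++ was called while regex_search was running.
struct SearchReport {
    bool matched;
    std::string text;
    std::size_t increments;
};

SearchReport run_search(const char* input) {
    assert(input != nullptr);
    // Same pattern as in the post; "\n" is a literal newline in the regex.
    static const std::regex expression("(aaa)|(bbb)|(.)|(\n)");

    CountingIter first(input), last(input + std::strlen(input));
    std::match_results<CountingIter> what;

    CountingIter::increments = 0;
    bool ok = std::regex_search(first, last, what, expression,
                                std::regex_constants::match_continuous);

    // Read the counter before building the string: copying the sub-match
    // out also walks the iterator and would inflate the count.
    std::size_t n = CountingIter::increments;
    std::string text = ok ? what[0].str() : std::string();
    return {ok, text, n};
}
```

Calling run_search("aaa bbb ccc\naaaaaaX") should report a match of "aaa". Comparing the reported increment count against the length of the input shows whether a given implementation walks past the end of the match.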

