Boost Users :

From: Phil Hystad (phystad_at_[hidden])
Date: 2008-03-12 12:43:01


This is a follow-up to the message and response quoted below. Boost
regex works fine on Mac OS X and on our Linux platforms, but on 32-bit
Windows we see the behaviour described here. This message is on the
long side because it includes a short program and its output on both
Windows and Linux.

The short program below illustrates the problem; the results are from
a Linux machine and a 32-bit Windows machine. The right offset is 9,
the start of "Great". On Windows, the POSIX API returns it only with
boost::REG_PERL or boost::REG_PERLEX, and the boost::regex class
returns it only with the normal and perl flags. On Linux, every flag
gives the right offset.

Program
----------
#include <boost/regex.hpp>
#include <boost/regex.h>
#include <string>
#include <iostream>

using namespace std;

static const char* szPattern="[A-Z][a-z]*";
static const char* szString="small is Great for the Big and Tall";

void f1_(boost::regex::flag_type flag, const char* flag_str)
{
   cout << "\nUsing boost::regex, flag=" << flag << " (" << flag_str
        << ")" << endl;
   std::string s = szString;
   boost::regex re(szPattern, flag);
   boost::match_results<std::string::const_iterator> what;
   boost::regex_search(s, what, re);
   std::cout << "pos=" << what.position() << " len=" << what.length()
             << std::endl;
}

void f2_(int flag, const char* flag_str)
{
   cout << "\nUsing Posix, flag=" << flag << " (" << flag_str << ")"
        << endl;
   regex_t pattern;
   int x = regcomp(&pattern, szPattern, flag);
   if ( x != 0 ) { std::cout << "regcomp - error" << std::endl; return; }
   regmatch_t matches[5];
   x = regexec(&pattern, szString, 5, matches, 0);
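   if ( x != 0 ) {
      std::cout << "regexec - error" << std::endl;
      regfree(&pattern); // release the compiled pattern before returning
      return;
   }
   std::cout << "matches[0].rm_so=" << matches[0].rm_so << std::endl;
   std::cout << "matches[0].rm_eo=" << matches[0].rm_eo << std::endl;
   regfree(&pattern); // release the compiled pattern
}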

#define f1(x) f1_(x, #x)
#define f2(x) f2_(x, #x)

int main()
{
   cout << "Regex=" << szPattern << endl;
   cout << "Input=" << szString << endl;

   f1(boost::regex::normal);
   f1(boost::regex::basic);
   f1(boost::regex::extended);
   f1(boost::regex::awk);
   f1(boost::regex::grep);
   f1(boost::regex::egrep);
   f1(boost::regex::sed);
   f1(boost::regex::perl);
   f2(0); // default
   f2(boost::REG_EXTENDED);
   f2(boost::REG_BASIC);
   f2(boost::REG_PERL);
   f2(boost::REG_AWK);
   f2(boost::REG_GREP);
   f2(boost::REG_EGREP);
   f2(boost::REG_PERLEX);
   return 0;
}

Output From Windows
--------------------------
Regex=[A-Z][a-z]*
Input=small is Great for the Big and Tall

Using boost::regex, flag=0 (boost::regex::normal)
pos=9 len=5

Using boost::regex, flag=2162689 (boost::regex::basic)
pos=0 len=5

Using boost::regex, flag=2163456 (boost::regex::extended)
pos=0 len=5

Using boost::regex, flag=2097920 (boost::regex::awk)
pos=0 len=5

Using boost::regex, flag=2293761 (boost::regex::grep)
pos=0 len=5

Using boost::regex, flag=2294528 (boost::regex::egrep)
pos=0 len=5

Using boost::regex, flag=2162689 (boost::regex::sed)
pos=0 len=5

Using boost::regex, flag=0 (boost::regex::perl)
pos=9 len=5

Using Posix, flag=0 (0)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=1 (boost::REG_EXTENDED)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=0 (boost::REG_BASIC)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=2817 (boost::REG_PERL)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=513 (boost::REG_AWK)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=1024 (boost::REG_GREP)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=1025 (boost::REG_EGREP)
matches[0].rm_so=0
matches[0].rm_eo=5

Using Posix, flag=2048 (boost::REG_PERLEX)
matches[0].rm_so=9
matches[0].rm_eo=14

Output From Linux (Red Hat)
---------------------------
Regex=[A-Z][a-z]*
Input=small is Great for the Big and Tall

Using boost::regex, flag=0 (boost::regex::normal)
pos=9 len=5

Using boost::regex, flag=2162689 (boost::regex::basic)
pos=9 len=5

Using boost::regex, flag=2163456 (boost::regex::extended)
pos=9 len=5

Using boost::regex, flag=2097920 (boost::regex::awk)
pos=9 len=5

Using boost::regex, flag=2293761 (boost::regex::grep)
pos=9 len=5

Using boost::regex, flag=2294528 (boost::regex::egrep)
pos=9 len=5

Using boost::regex, flag=2162689 (boost::regex::sed)
pos=9 len=5

Using boost::regex, flag=0 (boost::regex::perl)
pos=9 len=5

Using Posix, flag=0 (0)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=1 (boost::REG_EXTENDED)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=0 (boost::REG_BASIC)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=2817 (boost::REG_PERL)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=513 (boost::REG_AWK)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=1024 (boost::REG_GREP)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=1025 (boost::REG_EGREP)
matches[0].rm_so=9
matches[0].rm_eo=14

Using Posix, flag=2048 (boost::REG_PERLEX)
matches[0].rm_so=9
matches[0].rm_eo=14

> Message: 4
> Date: Mon, 10 Mar 2008 18:08:16 -0000
> From: "John Maddock" <john_at_[hidden]>
> Subject: Re: [Boost-users] REG_PERLEX
> To: <boost-users_at_[hidden]>
> Message-ID: <00a201c882d9$bab38360$83d56b51_at_fuji>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> reply-type=original
>
> Phil Hystad wrote:
>> Does anyone know the definition of REG_PERLEX?
>>
>> I am using the regex/regcomp traditional unix/posix API supported by
>> Boost Regular Expression library. On a Windows 32 bit platform we are
>> forced to use REG_PERLEX on the regcomp flags argument whereas for the
>> same application we get by using a zero flag value on regcomp on
>> platforms: Mac OS X and Linux.
>
> REG_PERLEX allows the engine to accept Perl style regular expressions -
> what kind of expressions are you using, and what differences do you
> observe on the different platforms - there shouldn't really be any
> difference in behaviour.
>
> John.


