Boost Users :

Subject: Re: [Boost-users] [Regex] Is it possible to match any "linebreak"
From: Christoph Duelli (duelli_at_[hidden])
Date: 2012-03-22 06:44:35


John Maddock wrote:

>> Is there a way to have a regex like
>> [[:digit:]]{3}([^\n]+)\n?
>> have \n match any line breaking character?
>> (The processed files might have Unix or DOS line endings.)
>
> Use \R, see:
>
> http://www.boost.org/doc/libs/1_49_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.matching_line_endings
>
> HTH, John.

Thank you John.
Somehow I must have overlooked \R in the docs.

However, I still have one issue with \R.
In the following code everything works fine if I use r2.
With r1, however, the captures still contain the line breaks.

In short: ([^\\R]+) does not seem to capture only non-linebreak characters.
Should it? Or is this just a misunderstanding on my part?

#include <boost/test/auto_unit_test.hpp>
#include <boost/test/test_tools.hpp>

#include <boost/regex.hpp>

#include <string>

BOOST_AUTO_TEST_CASE(test_boost_regexp)
{
   // r1: the capture still contains the line breaks
   boost::regex r1("\\d\\d\\d([^\\R]+)\\R*");
   // r2: works fine
   // boost::regex r2("\\d\\d\\d([^\\r\\n]+)\\R*");
   boost::smatch what;

   std::string input("123hallo welt\n\r");
   BOOST_CHECK(boost::regex_match(input, what, r1));
   // ok with r2, but fails with r1: the capture does contain the newline
   BOOST_CHECK_EQUAL(what[1], "hallo welt");
   BOOST_CHECK_EQUAL(what[1], "hallo welt");

   input="123hallo welt\n";
   BOOST_CHECK(boost::regex_match(input, what, r1));
   BOOST_CHECK_EQUAL(what[1], "hallo welt");

   input="123hallo welt\r";
   BOOST_CHECK(boost::regex_match(input, what, r1));
   BOOST_CHECK_EQUAL(what[1], "hallo welt");
}
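
For comparison, here is a minimal sketch of the r2 approach outside the test framework. It rests on one assumption: the Boost.Regex documentation describes \R as matching a line-ending sequence, so it may simply not act as a set member inside [...]. Listing the characters explicitly avoids the question. The sketch uses std::regex instead of Boost.Regex so that it builds without extra libraries; std::regex has no \R, so an explicit alternation stands in for it and covers only CR and LF. The helper name capture_payload is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original post): returns the text that
// follows three leading digits, without any trailing line breaks.
// Returns an empty string if the input does not match.
std::string capture_payload(const std::string& input)
{
    // [^\r\n]+ names the line-break characters explicitly, as r2 does.
    // (?:\r\n|\r|\n)* is a stand-in for \R*, which std::regex lacks;
    // it handles only CR and LF, not the other characters \R covers.
    static const std::regex re("\\d\\d\\d([^\\r\\n]+)(?:\\r\\n|\\r|\\n)*");

    std::smatch what;
    if (!std::regex_match(input, what, re))
        return std::string();
    return what[1].str();
}
```

With this pattern the capture stops at the first CR or LF, so the three inputs from the test above would all yield "hallo welt".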

Best regards
Christoph

