[Regex] \r matching \n
Hi,

In this regex the \n matches \r. How is this defined / controlled? Is there a way to only match \n?

    std::string s = "[*] A\rB\n";
    s = boost::regex_replace(s, boost::regex(R"(\[\*\](.+?)(\n|$))"), "<li>\\1</li>");
    std::println("{}", js_encode(s)); // <li> A</li>\rB\n

Regards,
-- Olaf
On Fri, Mar 20, 2026, at 8:29 AM, Olaf van der Spek via Boost wrote:
> Hi,
>
> In this regex the \n matches \r. How is this defined / controlled? Is there a way to only match \n?
>
>     std::string s = "[*] A\rB\n";
>     s = boost::regex_replace(s, boost::regex(R"(\[\*\](.+?)(\n|$))"), "<li>\\1</li>");
>     std::println("{}", js_encode(s)); // <li> A</li>\rB\n
>
> Regards,
To me it appears as if `$` might be behaving like that: https://godbolt.org/z/s1zaMqT1q

    #include <boost/regex.hpp>
    #include <fmt/ranges.h>
    #include <span>
    #include <string>

    int main() {
        std::string s = "A\rB\n";
        s = boost::regex_replace(s, boost::regex(R"((.+?)\n)"), "<\\1>");
        fmt::print("{}\n", std::span(s));
        fmt::print("{::#x}\n", std::span(s));
    }

Which prints, on my Linux box:

    ['<', 'A', '\r', 'B', '>']
    [0x3c, 0x41, 0xd, 0x42, 0x3e]

Of course, there might be some Windows quirk involved.
On 20/03/2026 07:29, Olaf van der Spek via Boost wrote:
> Hi,
>
> In this regex the \n matches \r. How is this defined / controlled? Is there a way to only match \n?
>
>     std::string s = "[*] A\rB\n";
>     s = boost::regex_replace(s, boost::regex(R"(\[\*\](.+?)(\n|$))"), "<li>\\1</li>");
>     std::println("{}", js_encode(s)); // <li> A</li>\rB\n
\n is a regular expression meaning "Match any newline character", which happens to include \r and a few other things as well. If you want to match a literal '\n' then you need to escape it and the appropriate regular expression string is therefore R"\\n"

HTH, John.
On 3/20/26 20:33, John Maddock via Boost wrote:
> On 20/03/2026 07:29, Olaf van der Spek via Boost wrote:
>> Hi,
>>
>> In this regex the \n matches \r. How is this defined / controlled? Is there a way to only match \n?
>>
>>     std::string s = "[*] A\rB\n";
>>     s = boost::regex_replace(s, boost::regex(R"(\[\*\](.+?)(\n|$))"), "<li>\\1</li>");
>>     std::println("{}", js_encode(s)); // <li> A</li>\rB\n
>
> \n is a regular expression meaning "Match any newline character", which happens to include \r and a few other things as well. If you want to match a literal '\n' then you need to escape it and the appropriate regular expression string is therefore R"\\n"
He's trying to match the newline character, not '\' followed by 'n'. Which in a raw string would be R"(
)", with the actual line break in the raw string.

-- Rainer Deyke - rainerd@eldwood.com
On 21/03/2026 15:31, Rainer Deyke via Boost wrote:
> On 3/20/26 20:33, John Maddock via Boost wrote:
>> On 20/03/2026 07:29, Olaf van der Spek via Boost wrote:
>>> Hi,
>>>
>>> In this regex the \n matches \r. How is this defined / controlled? Is there a way to only match \n?
>>>
>>>     std::string s = "[*] A\rB\n";
>>>     s = boost::regex_replace(s, boost::regex(R"(\[\*\](.+?)(\n|$))"), "<li>\\1</li>");
>>>     std::println("{}", js_encode(s)); // <li> A</li>\rB\n
>>
>> \n is a regular expression meaning "Match any newline character", which happens to include \r and a few other things as well. If you want to match a literal '\n' then you need to escape it and the appropriate regular expression string is therefore R"\\n"
>
> He's trying to match the newline character, not '\' followed by 'n'. Which in a raw string would be R"(
> )", with the actual line break in the raw string.
Yes exactly, the regular expression which matches a literal \n is \\n; otherwise \n on its own is a regular expression operator and not a literal.

John.
On Fri, Mar 20, 2026 at 8:34 PM John Maddock via Boost <boost@lists.boost.org> wrote:
> On 20/03/2026 07:29, Olaf van der Spek via Boost wrote:
>> In this regex the \n matches \r. How is this defined / controlled? Is there a way to only match \n?
>>
>>     std::string s = "[*] A\rB\n";
>>     s = boost::regex_replace(s, boost::regex(R"(\[\*\](.+?)(\n|$))"), "<li>\\1</li>");
>>     std::println("{}", js_encode(s)); // <li> A</li>\rB\n
>
> \n is a regular expression meaning "Match any newline character", which happens to include \r and a few other things as well. If you want to match a literal '\n' then you need to escape it and the appropriate regular expression string is therefore R"\\n"
Hi John,

Did you mean R"(\\n)" ? Or "\\n" ?

Now I'm confused, I can't get \r to match at all.

    ++ g++ test.cpp -std=c++26
    ++ ./a.out
    ['.', '\r', '\n', '.'] ['\n'] ['.', '\r', 'X', '.']
    ['.', '\r', '\n', '.'] ['\\', 'n'] ['.', '\r', 'X', '.']
    ['.', '\r', '\n', '.'] ['\\', '\n'] ['.', '\r', 'X', '.']
    ['.', '\r', '\n', '.'] ['\\', '\\', 'n'] ['.', '\r', '\n', '.']
    ['.', '\r', '\n', '.'] ['\\', '\\', '\n'] ['.', '\r', '\n', '.']

    #include <boost/regex.hpp>
    #include <print>
    #include <span>
    #include <string>

    // Prints the subject, the pattern as the regex parser sees it, and the result.
    void test(std::string s, std::string re) {
        std::string a = boost::regex_replace(s, boost::regex(re), "X");
        std::println("{} {:18} {}", std::span(s), std::span(re), std::span(a));
    }

    int main() {
        std::string s = ".\r\n.";
        test(s, "\n");
        test(s, "\\n");
        test(s, "\\\n");
        test(s, "\\\\n");
        test(s, "\\\\\n");
    }

-- Olaf
On 3/25/26 10:22, Olaf van der Spek via Boost wrote:
> On Fri, Mar 20, 2026 at 8:34 PM John Maddock via Boost <boost@lists.boost.org> wrote:
>> On 20/03/2026 07:29, Olaf van der Spek via Boost wrote:
>>> In this regex the \n matches \r. How is this defined / controlled? Is there a way to only match \n?
>>>
>>>     std::string s = "[*] A\rB\n";
>>>     s = boost::regex_replace(s, boost::regex(R"(\[\*\](.+?)(\n|$))"), "<li>\\1</li>");
>>>     std::println("{}", js_encode(s)); // <li> A</li>\rB\n
>>
>> \n is a regular expression meaning "Match any newline character", which happens to include \r and a few other things as well. If you want to match a literal '\n' then you need to escape it and the appropriate regular expression string is therefore R"\\n"
>
> Hi John,
>
> Did you mean R"(\\n)" ? Or "\\n" ?
According to documentation (https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax...):

R"(
)" = "\n" should match itself, like all characters not in .[{}()\*+?|^$

R"(\n)" = "\\n" should match exactly '\n' and not '\r', just like "\n". The documentation is explicit about this.

R"(\r)" = "\\r" should match '\r', as should "\r", for what it's worth.

R"(\\n)" = "\\n" should match "\\n", i.e. a backslash followed by 'n'.

'$' should match the end of a line, including embedded newlines in the text. It is not clear what qualifies as a newline in this sense, but I'm guessing '\r' might qualify.

Any other behavior is either an error in the documentation or a bug in the code.

-- Rainer Deyke - rainerd@eldwood.com
On Wed, Mar 25, 2026 at 5:14 PM Rainer Deyke via Boost <boost@lists.boost.org> wrote:
> According to documentation (https://www.boost.org/doc/libs/latest/libs/regex/doc/html/boost_regex/syntax...):
>
> R"(
> )" = "\n" should match itself, like all characters not in .[{}()\*+?|^$
>
> R"(\n)" = "\\n" should match exactly '\n' and not '\r', just like "\n". The documentation is explicit about this.
>
> R"(\r)" = "\\r" should match '\r', as should "\r", for what it's worth.
>
> R"(\\n)" = "\\n" should match "\\n", i.e. a backslash followed by 'n'.
This line isn't correct, is it? Two slashes in the raw string should be four slashes in a normal string.
> '$' should match the end of a line, including embedded newlines in the text. It is not clear what qualifies as a newline in this sense, but I'm guessing '\r' might qualify.
Yeah, this makes sense. Also depends on the m modifier:
    "Normally Boost.Regex behaves as if the Perl m-modifier is on: so the assertions ^ and $ match after and before embedded newlines respectively, setting this flags is equivalent to prefixing the expression with (?-m)."

I thought ^ and $ would match only the beginning and end of input by default, but it's the other way around.

-- Olaf
On 3/25/26 17:25, Olaf van der Spek via Boost wrote:
>> R"(\\n)" = "\\n" should match "\\n", i.e. a backslash followed by 'n'.
>
> This line isn't correct, is it? Two slashes in the raw string should be four slashes in a normal string.
You are of course correct. R"(\\n)" = "\\\\n", which matches R"(\n)" = "\\n". -- Rainer Deyke - rainerd@eldwood.com
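Since the escaping levels are easy to mix up, here is a small compile-time check of the literal equivalences discussed above. It only shows which characters the C++ compiler hands to the regex parser, not what Boost.Regex then does with them. The constant names are made up for illustration.

```cpp
#include <string_view>

// What the regex parser receives for each spelling (names are illustrative).
constexpr std::string_view raw_newline_escape = R"(\n)";        // backslash, 'n'
constexpr std::string_view raw_literal_backslash_n = R"(\\n)";  // backslash, backslash, 'n'
constexpr std::string_view actual_newline = "\n";               // a single LF character

// An ordinary literal needs twice as many backslashes as the raw one.
static_assert(raw_newline_escape == "\\n");
static_assert(raw_literal_backslash_n == "\\\\n");

// Character counts as seen by the regex parser.
static_assert(raw_newline_escape.size() == 2);
static_assert(raw_literal_backslash_n.size() == 3);
static_assert(actual_newline.size() == 1);
```

The file compiles only if every equivalence holds, so it can double as a quick sanity check before blaming the regex engine.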
participants (4)
- John Maddock
- Olaf van der Spek
- Rainer Deyke
- Seth