From: Eric Niebler (eric_at_[hidden])
Date: 2004-11-01 03:24:39


Vladimir Pozdyayev wrote:
> some of my algorithms'
> calculations have to be performed directly on the expression tree,
> the compiled state machine won't help. Is it possible to restore
> the tree from the information provided by the library? That is,
> given the regex "((?:a|b)*?)(b+)", end up with an object like
>
> new cat(
>     new match(
>         new kleene_lazy(
>             new alt(
>                 new charset( "a" ),
>                 new charset( "b" )
>             )
>         )
>     ),
>     new match(
>         new repeat( new charset( "b" ) )
>     )
> )
>

FYI, xpressive builds such a parse tree (see
http://boost-sandbox.sf.net/libs/xpressive). And xpressive's
compile-time design allows for multiple back-ends (NFA, backtracking
NFA, DFA) without paying for ones that aren't used. However, xpressive
is hard-coded to use a backtracking NFA for now, and there is no clean
interface for plugging in a different back-end. It is also not as mature
or as widely used as Boost.Regex. Feel free to contact me offline if you
want more information.

> And now for something completely different.
>
> The following program outputs ' aa', where the first char is \0.
> If we replace <smatch> by <cmatch>, the output is ok. That holds
> for the regex5 as well as the regex library in boost 1.31.0. Am I
> missing something? (MSVC 7.0)
>
> #include <boost/regex.hpp>
> #include <iostream>
> main() {
>     boost::smatch m;
>     boost::regex_match( "aaa", m, boost::regex( ".*" ) );
>     std::cout << m[ 0 ] << "\n";
> }
>

I think two overloads of the regex_match algorithm come into play
here. They are:

bool regex_match( std::string const &, smatch &, regex const & );
bool regex_match( char const *, cmatch &, regex const & );

When you pass an smatch, the first overload is chosen, so the string
literal is converted into a std::string temporary, which is destroyed at
the end of the full expression, right after regex_match returns. That
leaves the smatch object holding iterators into a string that no longer
exists, which is why you are seeing garbage.

When you pass a cmatch, the second overload is chosen and no temporary
string object is created -- the cmatch object holds pointers into the
string literal, which lives for the whole program.
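
One way around this is to make sure the subject string outlives the
match object. The following is only a minimal sketch, not tested against
MSVC 7.0 or Boost 1.31.0. It uses the standard library's <regex> so that
it builds without Boost; the assumption is that the same pattern carries
over to boost::smatch, boost::cmatch and boost::regex. The helper names
whole_match_smatch and whole_match_cmatch are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: match against a std::string that is still alive
// while the smatch is read. The result is copied out with str() before
// the subject can go away, so no dangling iterators escape.
std::string whole_match_smatch(const std::string& subject) {
    std::smatch m;  // holds std::string::const_iterator pairs into subject
    if (!std::regex_match(subject, m, std::regex(".*")))
        return std::string();
    return m[0].str();
}

// Hypothetical helper: match against a C string with cmatch, which
// stores plain const char* pointers and needs no std::string at all.
std::string whole_match_cmatch(const char* subject) {
    std::cmatch m;  // holds const char* pairs into subject
    if (!std::regex_match(subject, m, std::regex(".*")))
        return std::string();
    return m[0].str();
}
```

With these, whole_match_smatch("aaa") and whole_match_cmatch("aaa")
should both give "aaa". In the first case the temporary std::string is
bound to the reference parameter and stays alive until the call has
returned, by which time the match has already been copied out.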

HTH

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
