Boost:
From: John Maddock (john_at_[hidden])
Date: 2004-11-01 05:59:27
> I've studied the interfaces you suggested, and here are my
> observations. Please correct me if I'm wrong.
>
> <perl_matcher> is a collection of algorithms which sort of hack
> into the underlying <basic_regex>'s internal data structures and
> use them to perform matching or whatever they're up to.
Yes, it's responsible for the actual matching.
> <basic_regex_creator> is a "syntax features to internals"
> converter which is called directly by the parser. In theory,
> implementing it should be the better way to initialize customized
> structures. However, the way it is used is somewhat tricky: it
> fills not its own data structures, but structures of the class
> that called the parser itself---the <basic_regex_implementation>.
> So this one would need reimplementing, too.
>
> Now for the real problem. Both <perl_matcher> and
> <basic_regex_creator> deal with the already compiled state machine
> or its elements. The first one works directly with the regex
> internals, the second one gets <append_state> and similar calls
> from the parser. The trouble is, some of my algorithms'
> calculations have to be performed directly on the expression tree;
> the compiled state machine won't help. Is it possible to restore
> the tree from the information provided by the library? That is,
> given the regex "((?:a|b)*?)(b+)", end up with an object like
>
> new cat(
>     new match(
>         new kleene_lazy(
>             new alt(
>                 new charset( "a" ),
>                 new charset( "b" )
>             )
>         )
>     ),
>     new match(
>         new repeat( new charset( "b" ) )
>     )
> )
I see what you mean. No, there is never a parse tree like that: it's never
been necessary (until now, obviously).
> And now for something completely different.
>
> The following program outputs ' aa', where the first char is \0.
> If we replace <smatch> by <cmatch>, the output is ok. That holds
> for the regex5 as well as the regex library in boost 1.31.0. Am I
> missing something? (MSVC 7.0)
>
> #include <boost/regex.hpp>
> #include <iostream>
> int main() {
>     boost::smatch m;
>     boost::regex_match( "aaa", m, boost::regex( ".*" ) );
>     std::cout << m[ 0 ] << "\n";
> }
Well, smatch is the wrong type to use in this situation:
std::string::const_iterator and const char* are not the same type. You
should be using match_results<const char*> here, which is the same
type as the typedef cmatch (I hope that makes sense).
With conforming compilers, the code you posted does not compile (which is the
correct behaviour). However, some workarounds for bugs present in VC6 and VC7
cause a temporary string to be created in this case, and cause the call to go
through an overload that ideally would not have been found, so the
iterators you get back are iterators into a string that has already been
destroyed.
John.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk