Boost Users
From: John Maddock (john_maddock_at_[hidden])
Date: 2003-03-20 07:48:01
> I've just started using Regex++ (from Boost 1.29.0)
> and I'm experiencing some strangeness that doesn't seem to be mentioned in the
> FAQ.
>
> Firstly, I found that [-A-Za-z]+ unexpectedly matched spaces and punctuation
> characters, rather than only plain alphabetic characters and hyphens as desired.
> Reading the documentation, I altered this to [-:alpha:] and [-:upper::lower]
> with no effect. So I decided to experiment with adding ^[:space:].
> When I finally reached the expression below, I got a core dump where the
> expression was declared.
> The intention of this expression was to strip and keep leading and trailing
> punctuation and spaces, as well as extracting a word from the middle.
>
> static const boost::regex
>     Word_expression("([:punct::space:]*)([-:upper::lower:^[:punct::space:]]+)([:punct::space:]*)");
>
> Is it right that 'bad' expressions should coredump?
boost::regex will throw an exception if you pass it an invalid
expression; you need to catch it, or else, yes, your program will core dump.
It's an invalid expression because:
[:punct::space:]* should be [[:punct:][:space:]]*
and in
[-:upper::lower:^[:punct::space:]] you can't nest character classes like
that (in any regular expression language that I know of).
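A minimal sketch of the point above, assuming a C++11 compiler: it uses std::regex (which was standardized from Boost.Regex) as a stand-in for boost::regex, since both report a bad pattern by throwing from the constructor, here as std::regex_error. The helper compiles() and the two pattern variables are hypothetical names, and the middle group of the corrected pattern assumes the intent was "letters and hyphens only".

```cpp
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` is a valid expression.
// std::regex stands in for boost::regex; both throw from the constructor
// on a bad pattern, so construction is wrapped in try/catch. Left uncaught,
// the exception reaches std::terminate (typically a core dump).
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);  // throws std::regex_error if invalid
        return true;
    } catch (const std::regex_error& e) {
        std::cerr << "invalid expression \"" << pattern << "\": "
                  << e.what() << '\n';
        return false;
    }
}

// The expression from the original post.
const std::string original_expression =
    "([:punct::space:]*)([-:upper::lower:^[:punct::space:]]+)([:punct::space:]*)";

// Classes combined inside a single bracket expression, e.g.
// [[:punct:][:space:]]*. The middle group assumes the intent was
// "letters and hyphens"; the hyphen is escaped to keep it literal.
const std::string corrected_expression =
    "([[:punct:][:space:]]*)([[:alpha:]\\-]+)([[:punct:][:space:]]*)";
```

Checked this way, the expression from the original post should be rejected while the corrected one compiles; without the try/catch, the same exception would propagate out of main and end the program.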
> And if so, in what way is the above expression bad?
> (As an aside, maybe we could catch bad ones better by replacing regex strings
> with overloaded operators, the way streams have superseded printf.)
>
>
> I found I still get rogue matches on punctuation and spaces when I use the
> manually expanded form below:
You are using the member first of a sub-match in boost::match_results as a
null-terminated string. It is *not* a copy of the string matched, nor a
null-terminated string; it is an iterator into your text. Either use the
sequence [first, second), or call match_results::str() to get a std::string
object.
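The following sketch, again with std::regex and std::smatch standing in for boost::regex and boost::match_results, copies a sub-match out of the text through its [first, second) range. The function middle_word() is a hypothetical name, and it reuses the bracket-expression form suggested earlier, assuming the goal is the letters-and-hyphens word in the middle.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the letters-and-hyphens word in the middle
// of `text`, or an empty string if `text` does not have that shape.
std::string middle_word(const std::string& text) {
    static const std::regex word_expression(
        "([[:punct:][:space:]]*)([[:alpha:]\\-]+)([[:punct:][:space:]]*)");

    std::smatch what;  // plays the role of boost::match_results
    if (!std::regex_match(text, what, word_expression)) {
        return std::string();
    }

    // what[2].first and what[2].second are iterators into `text`; they
    // mark the word without copying it. Reading what[2].first as a
    // null-terminated string would run on to the end of `text`.

    // Copy the half-open range [first, second) into a std::string.
    // what.str(2) returns the same characters.
    return std::string(what[2].first, what[2].second);
}
```

For the input `  "well-known," ` this returns just `well-known`. Printing &*what[2].first as a C string would instead show everything from the start of the word to the end of the text, trailing punctuation and spaces included, which may account for the "rogue matches" described in the question.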
John.
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net