Boost Users :

From: Bruce Adams [TSP Sunbury] (bruadams_at_[hidden])
Date: 2003-03-21 09:30:21


>-----Original Message-----
>From: John Maddock [mailto:john_maddock_at_[hidden]]
>Sent: 20 March 2003 12:48
>To: Boost-Users_at_[hidden]
>Subject: Re: [Boost-Users] Regex++ newbie problems
>
>
>> I've just started using Regex++ (from boost 1.29.0)
>> and I'm experiencing some strangeness that don't seem to be
>> mentioned in the faq.
[snip]
>> Word_expression("([:punct::space:]*)([-:upper::lower:^[:punct::space:]]+)([:punct::space:]*)");
>>
>> Is it right that 'bad' expressions should coredump?
>
>boost::regex will throw an exception if you pass it an
>invalid expression - you need to catch it or else yes, your
>program will core dump.
>
Hi,
  Thanks. I knew it was a newbie error. I guess I was working
too late and reading only every other line in the manual. I can tell
from my bad grammar. I'll try to save face by attempting to make
a useful contribution. :-)
  Is there a debug variant or third party program that can be used
to generate a useful syntax error? Note: I haven't looked at the
exception generated yet but I would assume (incorrectly?)
that it does not give much detail about the error.
Though if you write and use expressions cleanly, it should be trivial
to find the errors by inspection.
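
To make the question concrete, here is a minimal sketch of catching the
exception and printing whatever detail it carries. It uses std::regex as a
stand-in, on the assumption that its interface is close enough to
boost::regex for illustration; Boost.Regex has its own exception type, which
should be checked against the documentation for the release in use. The
names try_compile and demo are made up for this example.

```cpp
#include <cassert>
#include <iostream>
#include <regex>
#include <string>

// Hypothetical helper: try to compile `pattern`. On failure, return false
// and store the library's description of the problem in `error`.
bool try_compile(const std::string& pattern, std::string& error)
{
    try {
        std::regex re(pattern);   // throws std::regex_error if invalid
        (void)re;
        error.clear();
        return true;
    } catch (const std::regex_error& e) {
        // what() is implementation-defined text; e.code() gives a
        // std::regex_constants::error_type for programmatic checks.
        error = e.what();
        return false;
    }
}

// Small demonstration: an unbalanced parenthesis is rejected.
void demo()
{
    std::string error;
    const bool ok = try_compile("(abc", error);
    assert(!ok);
    std::cout << "rejected: " << error << '\n';
}
```

How much detail the message gives depends on the library; the point is only
that the exception can be caught and inspected instead of ending in a core
dump.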
   
>It's an invalid expression because:
>
>[:punct::space:]* should be [[:punct:][:space:]]*
>
>and
>
>[-:upper::lower:^[:punct::space:]] you can't nest character
>classes like that (in any regular expression language that I know of).
>
Bah. Newbie error.
I only tried it in "straw clutching mode" because of my earlier error.
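
For the record, a sketch of what the expression might look like with the
bracket expressions written out as John describes. This is my reading of the
intent, not a tested replacement: it assumes the nested negation was only
meant to keep punctuation and spaces out of the word, which the word class
already does apart from the hyphen. It uses std::regex as a stand-in for
boost::regex, and extract_word is a made-up name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: return the "word" part of `token` (capture group 2),
// or an empty string if the token does not match at all.
std::string extract_word(const std::string& token)
{
    static const std::regex word_expression(
        "([[:punct:][:space:]]*)"     // leading punctuation and spaces
        "([-[:upper:][:lower:]]+)"    // the word: letters and hyphens
        "([[:punct:][:space:]]*)");   // trailing punctuation and spaces
    std::smatch m;
    if (!std::regex_match(token, m, word_expression)) {
        return "";
    }
    assert(m.size() == 4);            // whole match plus three groups
    return m[2].str();
}
```

Each named class sits inside its own [: :] pair, and those pairs sit inside
one enclosing bracket expression, so nothing is nested.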

>
>> (as an aside maybe we could catch bad ones better by replacing regex
>> strings with overloaded operators the way streams have superseded printf)
>>
Apologies for being slightly off topic for the users group.
Has there been any work in this direction? We want to compile the
expression for efficiency reasons. Building expressions up using operator<<
might sacrifice this without an additional "reduction" phase to compile
down to the most efficient automaton. I guess this would be the regexp
equivalent of endl.
Still, I like the idea as a debugging tool, and I don't think the efficiency
lost would be prohibitive.
Since regular expressions are more like trees than streams, the << syntax
might get a bit ugly with all the brackets required.
E.g.

char_class Punct = regex::PUNCTUATION + regex::SPACE;

reg_exp WordExpression = Kleene_closure(Punct) +
                         Positive_closure(char_class("-") + regex::ALPHA) +
                         Kleene_closure(Punct);

I guess that is pretty ugly compared to the conventional syntax, despite
the improved checkability.
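
A rough sketch of how such an interface could be put together, purely as an
illustration. Nothing here is an existing library: the types and functions
take their names from the example above, except that the namespace is called
rx to avoid clashing with std::regex. The sketch simply generates a
conventional pattern string and hands it to std::regex, so the "reduction"
phase is whatever the underlying library does when it compiles that string.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>

// A set of characters; `body` is the inside of a bracket expression.
struct char_class {
    std::string body;
    explicit char_class(std::string b) : body(std::move(b)) {}
};

// Union of two character classes. A literal '-' is only supported at the
// very front, where bracket expressions treat it as an ordinary character.
char_class operator+(const char_class& a, const char_class& b)
{
    assert(a.body.empty() || b.body.empty() || b.body.front() != '-');
    return char_class(a.body + b.body);
}

// A regular expression fragment, kept as pattern text.
struct reg_exp {
    std::string pattern;
    reg_exp(const char_class& c) : pattern("[" + c.body + "]") {}
    explicit reg_exp(std::string p) : pattern(std::move(p)) {}
};

// Concatenation of two fragments.
reg_exp operator+(const reg_exp& a, const reg_exp& b)
{
    return reg_exp(a.pattern + b.pattern);
}

// Zero or more repetitions.
reg_exp Kleene_closure(const reg_exp& r)
{
    return reg_exp("(?:" + r.pattern + ")*");
}

// One or more repetitions.
reg_exp Positive_closure(const reg_exp& r)
{
    return reg_exp("(?:" + r.pattern + ")+");
}

namespace rx {
const char_class PUNCTUATION("[:punct:]");
const char_class SPACE("[:space:]");
const char_class ALPHA("[:alpha:]");
}

// The word expression from the example, built with the operators.
reg_exp make_word_expression()
{
    const char_class Punct = rx::PUNCTUATION + rx::SPACE;
    return Kleene_closure(Punct) +
           Positive_closure(char_class("-") + rx::ALPHA) +
           Kleene_closure(Punct);
}

// Hand the generated text to the real regex engine.
std::regex compile(const reg_exp& r)
{
    return std::regex(r.pattern);
}
```

With this arrangement a malformed bracket expression cannot be written by
hand, because the brackets are only ever added by the reg_exp constructor.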

I meant to write something like that years ago but never found the time
and did not have a working regexp library around.

How about the equivalent using some hidden template metaprogramming
(for use when the expression is fixed at compile time)?
I have a feeling that the added complexity would outweigh the benefits,
given that setting up the expression at start-up is only a minor
inconvenience.

Still I would be interested to read about research in this area
(i.e. tree syntax & compile time compilation of regular expressions
      in C++ or other languages).

With even the most elegant design there's usually a way it can be
improved if you look hard enough.
(Perhaps that "improved" should be in quotes :-) I feel quite sincere
 about both interpretations.)

>>
>> I found I still get rogue matches on punctuation and spaces when I use
>> the manually expanded form below:
>
>You are using the member first of boost::match_results as a
>null-terminated string. It is *not* a copy of the matched string,
>nor a null-terminated string; it is an iterator into your text.
>Either use the sequence [first, second), or call
>match_results::str() to get a std::string object.
>
>John.
>
Bah, I even remember reading that (and doing it the first time).
Sorry for wasting your time and thanks again.
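
For anyone else who trips over the same thing, a small sketch of the
difference John describes, again using std::regex as a stand-in for
boost::regex. The function names and the pattern are made up for the
example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical pattern: a run of letters.
const std::regex& word_pattern()
{
    static const std::regex re("[[:alpha:]]+");
    return re;
}

// Wrong: treats the `first` iterator as a C string, so the result runs on
// to the end of `text` and drags in trailing punctuation and spaces.
std::string word_via_first_only(const char* text)
{
    std::cmatch m;
    if (!std::regex_search(text, m, word_pattern())) {
        return "";
    }
    return std::string(m[0].first);
}

// Right: use the iterator pair [first, second), or equivalently str().
std::string word_via_range(const char* text)
{
    std::cmatch m;
    if (!std::regex_search(text, m, word_pattern())) {
        return "";
    }
    const std::string from_range(m[0].first, m[0].second);
    assert(from_range == m.str());   // both give the matched text only
    return from_range;
}
```

The first function is what produces the apparent rogue matches: the match
itself is fine, but reading from first without stopping at second picks up
everything that follows it in the input.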

Regards,
          Bruce A.


