
Boost Users :

From: Renato Golin (renato_at_[hidden])
Date: 2008-02-12 04:41:24


Eugene M. Kim wrote:
>> PS: of course the code above is pseudo-code and of course you'll take as
>> many input parameters as available and not hard-coded like this.
> That would work for simple cases where the two sub-patterns are
> separated by ".*", but fail for other non-trivial expressions such as
> "omg(bbq|cakes)*wtf", which the program must also be able to accept. ;_;

Yeah,

If you're doing full-text search it'd be fine, but I know what you mean.

When I started with Perl I found out that regular expressions could do
virtually everything... after a while I realised that they were severely
limited and sometimes two (or more) regular expressions were much faster
than a complex one.
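
To make the "two searches instead of one complex regex" idea concrete, here is a minimal sketch for the "omg.*wtf" case only. It uses plain substring searches from the standard library rather than any regex engine. The function name contains_in_order is made up for illustration, and whether it reports exactly what a regex engine would depends on how '.' treats newlines in that engine's configuration.

```cpp
#include <string>

// Hypothetical helper: true if `first` occurs in `haystack` and `second`
// occurs somewhere after the end of that occurrence. Roughly what
// "first.*second" asks for when both parts are plain literals.
bool contains_in_order(const std::string& haystack,
                       const std::string& first,
                       const std::string& second)
{
    // Taking the earliest occurrence of `first` is enough: it leaves the
    // longest possible tail in which to look for `second`.
    const std::string::size_type pos = haystack.find(first);
    if (pos == std::string::npos)
        return false;

    return haystack.find(second, pos + first.size()) != std::string::npos;
}
```

As Eugene points out, this only covers patterns of that exact shape; something like "omg(bbq|cakes)*wtf" still needs a real regex engine.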

There is no magic in regular expressions. People like Larry Wall
worked very hard to optimize them for the generic case, but if your case is
different you may find a better way of doing it. The problem is, as John
said well, you'll lose generality. So, at the end of the day, you'll
still have to pre-process your regex to see which case it fits best,
and even that has limited value (i.e. you can't always tell the best
case for any given string).
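
A rough sketch of what such pre-processing might look like, under the assumption that the only special case worth detecting is LITERAL ".*" LITERAL. The function name split_simple_pattern and the list of metacharacters are my own choices for illustration, not part of any library; anything the check does not recognise should simply go to the regex engine as usual.

```cpp
#include <string>

// Hypothetical classifier: succeeds only for patterns of the exact shape
//   LITERAL ".*" LITERAL
// where neither literal contains a regex metacharacter. On success the two
// literals are stored in `left` and `right`.
bool split_simple_pattern(const std::string& pattern,
                          std::string& left,
                          std::string& right)
{
    // Characters treated as "special" here; an assumption, adjust it to
    // the regex flavour actually in use.
    static const std::string meta = R"meta(\^$.|?*+()[]{})meta";

    const std::string::size_type sep = pattern.find(".*");
    if (sep == std::string::npos)
        return false;

    const std::string l = pattern.substr(0, sep);
    const std::string r = pattern.substr(sep + 2);

    // Reject empty halves and anything that still looks like regex syntax,
    // e.g. a second ".*", a lazy ".*?" or an escaped dot.
    if (l.empty() || r.empty())
        return false;
    if (l.find_first_of(meta) != std::string::npos ||
        r.find_first_of(meta) != std::string::npos)
        return false;

    left = l;
    right = r;
    return true;
}
```

With this, "omg.*wtf" is split into "omg" and "wtf", while "omg(bbq|cakes)*wtf" is rejected and falls through to the general path. It is deliberately conservative: it says nothing about which path is actually faster for a given input.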

Both regular expressions you mentioned, "omg.*wtf" and "omg(bbq|cakes)*wtf",
are fairly simple and, when run over a reasonable amount of data,
perform really well against other approaches (like specific parsers and
state machines).

I'd say you have an external problem: you shouldn't need (or shouldn't
allow the user) to supply such complex needles or such a big haystack,
and you should cap execution time so that exceptional cases like this
don't bring your service down.
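
One way this could be sketched, assuming the service can afford to search line by line. I'm using std::regex here only as a stand-in for whatever engine you really use. The names SearchLimits and guarded_search and the default limits are invented for the example. Note that the deadline is only checked between lines, so a single pathological line can still overrun the budget, and patterns that need to match across line breaks won't work with this scheme.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

enum class SearchResult { Found, NotFound, Rejected, TimedOut };

// Hypothetical limits; the numbers are placeholders, not recommendations.
struct SearchLimits {
    std::size_t max_needle_length = 64;
    std::size_t max_haystack_bytes = 1 << 20;
    std::chrono::milliseconds budget{50};
};

SearchResult guarded_search(const std::string& haystack,
                            const std::string& needle,
                            const SearchLimits& limits = SearchLimits())
{
    // Refuse oversized input before doing any work.
    if (needle.size() > limits.max_needle_length ||
        haystack.size() > limits.max_haystack_bytes)
        return SearchResult::Rejected;

    std::regex re;
    try {
        re.assign(needle);
    } catch (const std::regex_error&) {
        return SearchResult::Rejected;  // malformed pattern
    }

    const auto deadline = std::chrono::steady_clock::now() + limits.budget;

    std::istringstream lines(haystack);
    std::string line;
    while (std::getline(lines, line)) {
        // Coarse time check: only between lines, never inside a match.
        if (std::chrono::steady_clock::now() > deadline)
            return SearchResult::TimedOut;
        if (std::regex_search(line, re))
            return SearchResult::Found;
    }
    return SearchResult::NotFound;
}
```

The caller can then decide what to tell the user for Rejected and TimedOut instead of letting one odd request tie up the service.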

Hope that helps,
--renato

-- 
Reclaim your digital rights, eliminate DRM, learn more at
http://www.defectivebydesign.org/what_is_drm
