Subject: Re: [Boost-users] Regex Design trap
From: John Maddock (boost.regex_at_[hidden])
Date: 2010-09-08 13:20:01


> Dear Developers,
>
> I found a possible trap in the design of the syntax of the Regex library.
>
> Consider the following code:
> std::string text( "blabla123xyz" );
> boost::regex expression( "\\w+(\\d+)\\w+" );
> boost::smatch matches;
> boost::regex_search( text, matches, expression );
> text = "asdfghjkl";
> std::string value = matches[1];
>
> Although this code is not very useful, it can lead to unpredictable
> behaviour.
> As far as I know, the matches just reference positions in the
> original string, so when the string is changed the matches no longer
> fit. This may be good for performance, but it requires great care,
> especially if the string is only referenced in one place and the
> matches are passed on somewhere else.

As you say, it's performance related: had match_results copied the string,
the cost would be at least 10 times the normal cost of a call to
regex_search (all due to the memory allocations). You also lose positional
information if you store copies rather than iterators.
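
A minimal sketch of one way to avoid the trap: copy what you need out of
the match before the source string can change. It assumes std::regex
(C++11), whose interface mirrors boost::regex; the Capture struct and the
first_capture helper are hypothetical names for illustration, not part of
either library.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type: an owning copy of the capture plus its offset
// in the searched string, so the positional information is not lost.
struct Capture {
    std::string value;      // copied characters, independent of the source
    std::ptrdiff_t offset;  // start of the capture, or -1 if nothing matched
};

// Hypothetical helper: searches `text` and copies capture group 1 out of
// the match while the iterators held by the match are still valid.
Capture first_capture(const std::string& text, const std::regex& expression)
{
    std::smatch matches;
    if (!std::regex_search(text, matches, expression))
        return Capture{std::string(), -1};

    // matches[1] is a pair of iterators into `text`; str() makes a copy.
    return Capture{matches[1].str(), matches.position(1)};
}
```

With the string and expression from the original post, first_capture
returns the value "3" at offset 8 (the leading \w+ is greedy and leaves
only one digit for the group). The caller can then reassign or destroy
the source string without affecting the result.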

> Furthermore, when I looked at the Regex library I wondered about its
> interface. It seems more like a C library interface than C++ code. I
> also code in Ruby, where the Regexp class is much more convenient.
> Pattern matching is done there by a method of class Regexp, which
> returns the matches:
> expression = Regexp.new( '\w+(\d+)\w' )
> matches = expression.match( "blabla123xyz" )
> if ( matches ) ...
>
> Would it be possible to implement a more object-oriented interface like
> this for boost::regex?

Sigh... you mean like the deprecated RegEx class:
http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/ref/deprecated_interfaces/old_regex.html

The current interface is closely modeled on the C++ standard library, and of
course will *be part of the next C++ standard*. The idea is that objects
store data, and free functions operate upon them (as with the standard
library containers and algorithms for example). One advantage of this
approach is that the user can extend the range of operations available,
something that is basically impossible with a "closed" OO design where
everything is in the class. For example, one could easily define a new
variation on regex_replace that performs a customized replace operation.
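
As an illustration of that kind of extension, here is a rough sketch of a
user-defined replace variation written as a free function. It assumes
std::regex (C++11) rather than boost::regex so that it is self-contained;
replace_with is a hypothetical name, not a function provided by either
library.

```cpp
#include <functional>
#include <regex>
#include <string>

// Hypothetical free function: replaces every match of `expression` in
// `text` with whatever the caller-supplied `format` callback returns for
// that match. Only the public iterator interface is used, so no change to
// the regex class itself is needed.
std::string replace_with(
    const std::string& text,
    const std::regex& expression,
    const std::function<std::string(const std::smatch&)>& format)
{
    std::string result;
    std::string::const_iterator last = text.begin();

    for (std::sregex_iterator it(text.begin(), text.end(), expression), end;
         it != end; ++it) {
        const std::smatch& match = *it;
        result.append(last, match[0].first);  // unmatched text before the match
        result += format(match);              // customized replacement
        last = match[0].second;
    }

    result.append(last, text.end());          // unmatched tail
    return result;
}
```

For instance, calling it on "a1b22c" with the expression "\\d+" and a
callback that wraps each match in angle brackets gives "a<1>b<22>c".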

HTH, John.