Subject: Re: [Boost-users] Running Regular expression (RE) over a list, array or map
From: Anthony Foiani (tkil_at_[hidden])
Date: 2013-09-26 23:34:10


Olivier, greetings --

Olivier Austina <olivier.austina_at_[hidden]> writes:

> I am wondering if it is possible to run a regular expression
> directly over a list and get the indexes (beginning and end) of the
> match. For example, I have a list of strings and a regular
> expression and I want to know which part of the list matches the RE
> and get the corresponding indexes in the list.

It looks like you got some other suggestions, but if they're not what
you're looking for, you might want to clarify your request.

In particular, it's not clear to me whether you want to match the RE
against each individual item (which can be parallelized, but the
result is a membership bitmap or subset, not a range), or against the
concatenated value, or to find the largest span of contiguous values
that each match. The last two are the requests that make the most
sense if you want a starting and ending index.

Some sample code would probably clarify things.

E.g., given:

  typedef std::vector< std::string > string_vec;
  const string_vec sv{ "foo", "bar", "baz" };

And you want to match:

  const boost::regex re{ "ba.*" };

What answer do you want to see?

  * The membership bitmap would be something like: [ 0, 1, 1 ]

  * The subset would be: { "bar", "baz" }
    (This is basically a "grep" operator.)

  * The "range" answer would be the half-open range
    [ sv.begin()+1, sv.begin()+3 ), since the concatenation
    "foobarbaz" matches at "barbaz", which covers elements 1 and 2.

  * The other "range" answer would be the same range, but for a
    different reason: "bar" matches and "baz" matches, so the range
    represents the (longest?) run of adjacent elements that
    individually match the given regex. This ends up being something
    of a meta-regex, or if you prefer, the result of searching the
    concatenated string for instances of "(?:re)+".

Or is there some other interpretation that you're trying to get at?
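
To make the four candidates above concrete, here is a rough sketch of
each one. It is not a recommendation, just one way to pin down the
semantics. Assumptions on my part: it uses std::regex rather than
boost::regex so that it builds with no extra libraries (the calls used
here have boost:: equivalents with the same names), the per-element
tests use regex_match (the whole element must match), and the function
names and the index_range typedef are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

typedef std::vector< std::string > string_vec;
// Hypothetical helper type: half-open element range [first, second).
typedef std::pair< std::size_t, std::size_t > index_range;

// 1. Membership bitmap: one flag per element.
std::vector< bool > membership(const string_vec& sv, const std::regex& re) {
    std::vector< bool > bits;
    for (const std::string& s : sv)
        bits.push_back(std::regex_match(s, re));
    return bits;
}

// 2. Subset ("grep"): the elements that match on their own.
string_vec grep(const string_vec& sv, const std::regex& re) {
    string_vec out;
    for (const std::string& s : sv)
        if (std::regex_match(s, re)) out.push_back(s);
    return out;
}

// 3. Search the concatenation, then map the character offsets of the
//    first match back to the elements it overlaps.
index_range concatenated_range(const string_vec& sv, const std::regex& re) {
    std::string joined;
    std::vector< std::size_t > starts;  // starts[i] = offset of sv[i] in joined
    for (const std::string& s : sv) {
        starts.push_back(joined.size());
        joined += s;
    }
    assert(starts.size() == sv.size());

    std::smatch m;
    if (!std::regex_search(joined, m, re) || m.length(0) == 0)
        return index_range(sv.size(), sv.size());  // empty range: no match

    const std::size_t mb = static_cast< std::size_t >(m.position(0));
    const std::size_t me = mb + static_cast< std::size_t >(m.length(0));
    std::size_t first = sv.size(), last = sv.size();
    for (std::size_t i = 0; i < sv.size(); ++i) {
        const std::size_t eb = starts[i], ee = eb + sv[i].size();
        if (eb < me && ee > mb) {  // element i overlaps the match
            if (first == sv.size()) first = i;
            last = i + 1;
        }
    }
    return index_range(first, last);
}

// 4. Longest run of adjacent elements that each match on their own
//    (the first such run wins on ties).
index_range longest_run(const string_vec& sv, const std::regex& re) {
    index_range best(0, 0);
    std::size_t run_start = 0;
    for (std::size_t i = 0; i <= sv.size(); ++i) {
        const bool hit = i < sv.size() && std::regex_match(sv[i], re);
        if (!hit) {  // a miss (or the end of the list) closes the current run
            if (i - run_start > best.second - best.first)
                best = index_range(run_start, i);
            run_start = i + 1;
        }
    }
    return best;
}
```

With the sv above and a std::regex built from "ba.*", membership()
gives [ 0, 1, 1 ], grep() gives { "bar", "baz" }, and both range
functions give the element range [ 1, 3 ). The two range functions
agree here only because of this particular input. They can differ as
soon as a match in the concatenation straddles an element boundary.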

Happy hacking,
Tony

p.s. Heh. Guess who just finished a few interviews where "spot the
     under-specified problem" was important...