Subject: Re: [Boost-users] Running Regular expression (RE) over a list, array or map
From: Olivier Austina (olivier.austina_at_[hidden])
Date: 2013-09-27 13:00:33


Hi,

I am trying to use a pattern-matching approach on a dataset with multilevel tags.
Suppose we have two sets of features, F and H, with the tags F={f0,
f1, f2, f3} and H={h0, h1, h2, h3, h4}, which describe a set of elements E={e1,
e2, e3, ..., en}. A table holds a sequence of E elements, described as
follows:

E   e1  e2  e3  e4  e5  e6  e7  e8  ...  en
F   f3  f0  f0  f0  f2  f3  f0  f2  f0
H   h4  h0  h4  h1  h0  h2  h3  h2  h1

The order in which e1, e2, e3, ... appear matters, like the order of words in a
sentence.

Suppose we have the following RE over the F tags f3 and f0: f3f0+ . It
matches the sequences e1, e2, e3, e4 and e6, e7 in E. Now
we want to add a constraint: the last f0 must map to h1 in H.
In this case the final result is only the sequence e1, e2, e3, e4,
because in the sequence e6, e7 the last f0 maps to h3 in H. The final result is
always a set of E sequences, but the RE and the constraint can be based on E, F or H.
I hope this makes it a bit clearer. Thanks for your help.
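
To make the request concrete, here is a rough sketch of one way it could be
done. It is only an illustration under several assumptions: every tag is
exactly two characters long, the F and H tags of each element are glued into
one fixed-width token (so "f3" and "h4" become "f3h4"), and the RE is then
written over those tokens. The helper names encode_tags and find_spans are
made up for this example, and it uses std::regex so that it builds without
Boost; Boost.Regex provides equivalently named types (boost::regex,
boost::sregex_iterator).

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using tag_vec = std::vector<std::string>;
// Half-open [begin, end) range of element indexes.
using index_span = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: glue the F and H tag of each element into one
// fixed-width token ("f3" + "h4" -> "f3h4") and concatenate the tokens.
std::string encode_tags(const tag_vec& f, const tag_vec& h) {
    assert(f.size() == h.size());
    std::string out;
    for (std::size_t i = 0; i < f.size(); ++i) out += f[i] + h[i];
    return out;
}

// Hypothetical helper: run the RE over the encoded string and convert
// character offsets back into element indexes.
std::vector<index_span> find_spans(const std::string& encoded,
                                   const std::regex& re,
                                   std::size_t width) {
    std::vector<index_span> spans;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(encoded.begin(), encoded.end(), re);
         it != end; ++it) {
        const auto pos = static_cast<std::size_t>(it->position(0));
        const auto len = static_cast<std::size_t>(it->length(0));
        // Keep only matches that start and stop on token boundaries.
        if (pos % width == 0 && len % width == 0)
            spans.emplace_back(pos / width, (pos + len) / width);
    }
    return spans;
}
```

With the F and H rows from the table above, a token width of 4, and the RE
"f3h[0-4](?:f0h[0-4])*f0h1" (f3 with any H tag, then any number of f0, then a
final f0 that carries h1), find_spans should report the single span [0, 4),
that is e1 to e4. The sequence e6, e7 is rejected because its last f0 carries
h3.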

Regards
Olivier

2013/9/27 Anthony Foiani <tkil_at_[hidden]>

> Olivier, greetings --
>
> Olivier Austina <olivier.austina_at_[hidden]> writes:
>
> > I am wondering if it is possible to run directly regular expression
> > over a list and getting the indexes (beginning and end) of the
> > match. For example, I have a list of strings and a regular
> > expression and I want to know which part of the list matches the RE
> > and get the corresponding indexes in the list.
>
> It looks like you got some other suggestions, but if they're not what
> you're looking for, you might want to clarify your request.
>
> In particular, it's not clear to me whether you want to match the RE
> against each individual item (which can be parallelized, but the
> result is a membership bitmap or subset, not a range), or if you want
> to match the RE against the concatenated value, or the largest span of
> continuous values (which are the requests that make the most sense if
> you want a starting and ending index).
>
> Some sample code would probably clarify things.
>
> E.g., given:
>
> typedef std::vector< std::string > string_vec;
> const string_vec sv{ "foo", "bar", "baz" };
>
> And you wanted to match:
>
> const boost::regex re{ "ba.*" };
>
> What answer do you want to see?
>
> * The membership bitmap would be something like: [ 0, 1, 1 ]
>
> * The subset would be: { "bar", "baz" }
> (This is basically a "grep" operator.)
>
> * The "range" answer would be sv.begin()+1, sv.begin()+3 (since
> "barbaz" matches).
>
> * The other "range" answer would be the same, but because "bar"
> matches, and "baz" matches, so the range represents the (longest?)
> set of elements that individually match the given regex. This
> ends up being something of a meta-regex, or if you prefer, the
> result of searching the concatenated string for instances of
> "(?:re)+".
>
> Or is there some other interpretation that you're trying to get at?
>
> Happy hacking,
> Tony
>
> p.s. Heh. Guess who just finished a few interviews where "spot the
> under-specified problem" was important...
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>
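
For comparison, three of the interpretations listed in the quoted message
(the membership bitmap, the range over the concatenated value, and the longest
run of individually matching elements) might be sketched as follows. The
function names are invented for this example, std::regex stands in for
boost::regex so the sketch needs no Boost installation, and ranges are
returned as half-open index pairs rather than iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

using string_vec = std::vector<std::string>;
using index_range = std::pair<std::size_t, std::size_t>;  // half-open [begin, end)

// Membership bitmap: one flag per element, true if the whole element matches.
std::vector<bool> match_bitmap(const string_vec& sv, const std::regex& re) {
    std::vector<bool> bits;
    for (const auto& s : sv) bits.push_back(std::regex_match(s, re));
    return bits;
}

// Range over the concatenated value: search the joined string, then map the
// first and last matched characters back to the elements that own them.
// Returns {0, 0} if nothing (or only an empty string) matches.
index_range concatenated_range(const string_vec& sv, const std::regex& re) {
    std::string joined;
    std::vector<std::size_t> owner;  // owner[k] = element that supplied char k
    for (std::size_t i = 0; i < sv.size(); ++i) {
        joined += sv[i];
        owner.insert(owner.end(), sv[i].size(), i);
    }
    assert(owner.size() == joined.size());
    std::smatch m;
    if (!std::regex_search(joined, m, re) || m.length(0) == 0) return {0, 0};
    const auto first = static_cast<std::size_t>(m.position(0));
    const auto last = first + static_cast<std::size_t>(m.length(0)) - 1;
    return {owner[first], owner[last] + 1};
}

// Longest run of consecutive elements that each match on their own.
// Returns {0, 0} if no element matches.
index_range longest_run(const string_vec& sv, const std::regex& re) {
    std::size_t best_begin = 0, best_end = 0, run_begin = 0;
    for (std::size_t i = 0; i <= sv.size(); ++i) {
        const bool ok = i < sv.size() && std::regex_match(sv[i], re);
        if (!ok) {
            // The current run covers [run_begin, i); keep it if it is longer.
            if (i - run_begin > best_end - best_begin) {
                best_begin = run_begin;
                best_end = i;
            }
            run_begin = i + 1;
        }
    }
    return {best_begin, best_end};
}
```

For the sv and re from the quoted example, both range functions should return
the index pair (1, 3), which corresponds to sv.begin()+1, sv.begin()+3. The
"grep" subset can be derived from the bitmap by copying the flagged elements.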


