
Boost Users :

From: david v (danova_fr_at_[hidden])
Date: 2006-08-29 10:02:05


Yes, I think there was some misunderstanding here. I think it comes from
the definition you have of a mismatch. A mismatch for me is as follows:

Regex: "testing"
String_to_search: "tastung".
The output should be that the regex "testing" was found, but with 2
mismatches, "a" and "u". So a mismatch is a letter that was not found.

It may sound weird to you, but I'm using the regex to identify genomic
regions, in other words for biological applications.
In some cases my regex is a piece of DNA such as "atgcta" and I want to
search for it in another piece of DNA. Given that "atgcta" can occur in the
genome many times, I will probably get a lot of matches. But in some cases
I want to search for "atgcta" while allowing some mismatches. Obviously I
will get even more matches, but I think regex can be a much more efficient
way than building up alignment matrices.

Any idea how to handle the example above?

>-----Original Message-----
>From: boost-users-bounces_at_[hidden] [mailto:boost-users-
>bounces_at_[hidden]] On Behalf Of david v
>Sent: Tuesday, August 29, 2006 9:08 AM
>To: boost-users_at_[hidden]
>Subject: [Boost-users] Mismatch and regex newbie problem still problem
>
>So to sum up:
>If the regex I'm looking for is "testing" and the string to search is
>"tastung" (obviously this is a short example, but I'm dealing with
>more complex regular expressions),
>
>how can I get the number of mismatches? Basically, the output of the
>program would tell me:
> >2 mismatches found in string "tastung" at position 2 (a) and 5 (u).

[Nat] Maybe I'm completely misunderstanding you. If so, please forgive
me.

I think you're saying that you want to start with the regex "testing"
and have the library detect that the string "tastung" is somehow similar
but nonetheless distinct.

My belief is that, given the regex "testing", the library will not
recognize the string "tastung" in any way. It will simply report that no
match was found.

You could construct a more complex regex that would handle this
particular example. You could, for instance, say that you want to match
a "t", followed by an arbitrary character, followed by "st", followed by
another arbitrary character, followed by "ng". The library would report
that the string "tastung" matches that regex, and you could ask it to
tell you the specific substrings matching the variable parts.
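
A minimal sketch of that idea, for illustration only. It assumes std::regex
from the C++ standard library, whose interface closely follows Boost.Regex, so
boost::regex and boost::smatch should be usable in much the same way. The
function name variable_parts is made up for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the characters matched by the two wildcard
// positions of "t(.)st(.)ng", or an empty vector if the whole text does
// not match the pattern.
std::vector<std::string> variable_parts(const std::string& text)
{
    // Each "(.)" accepts any single character and captures it.
    static const std::regex pattern("t(.)st(.)ng");

    std::smatch m;
    std::vector<std::string> parts;
    if (std::regex_match(text, m, pattern)) {
        assert(m.size() == 3);          // whole match plus two captures
        parts.push_back(m[1].str());    // character between "t" and "st"
        parts.push_back(m[2].str());    // character between "st" and "ng"
    }
    return parts;
}
```

Calling variable_parts("tastung") yields the two captured characters, "a" and
"u". Note that this only covers variation at the two positions chosen in
advance; a difference anywhere else makes the match fail outright.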

But if you want to allow arbitrary variance in any character position --
as long as some other set of character positions matches -- then I'm a
little perplexed as to how to express that in a regular expression.
Maybe an exhaustive family of acceptable alternatives? But if you're
dealing with longer expression strings, that could explode really
quickly.
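
To make the "explode" concern concrete, here is a rough sketch that builds
such a family by putting a "." at every choice of k positions. The names
wildcard_variants and alternation are invented for this example, and the
pattern is assumed to be a plain literal with no regex metacharacters of its
own. The number of alternatives is the binomial coefficient "n choose k":
21 for "testing" with 2 wildcards, but already 1140 for a 20-letter pattern
with 3.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: every variant of `word` with exactly k positions
// replaced by '.'. Assumes `word` contains only literal characters.
std::vector<std::string> wildcard_variants(const std::string& word, std::size_t k)
{
    assert(k <= word.size());

    // Selection mask with k ones, in sorted order so that
    // std::next_permutation visits every distinct arrangement once.
    std::string mask(word.size() - k, '0');
    mask.append(k, '1');

    std::vector<std::string> variants;
    do {
        std::string v = word;
        for (std::size_t i = 0; i < word.size(); ++i)
            if (mask[i] == '1') v[i] = '.';
        variants.push_back(v);
    } while (std::next_permutation(mask.begin(), mask.end()));
    return variants;
}

// Joins the variants into one non-capturing alternation, e.g. "(?:a.|.b)".
std::string alternation(const std::vector<std::string>& variants)
{
    std::string out = "(?:";
    for (std::size_t i = 0; i < variants.size(); ++i) {
        if (i != 0) out += '|';
        out += variants[i];
    }
    out += ')';
    return out;
}
```

Because "." also matches the original letter, a family built with k wildcards
accepts anything with at most k substitutions. It does not tell you how many
positions actually differed, though, and the expression grows quickly with
both the pattern length and k.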

I think you need to get really specific about the rules you want to use
to detect a "mismatch." Then you need to figure out whether the regex
library is the right tool to help you apply those rules.
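
As one possible way of pinning the rule down, suppose a "mismatch" means a
substituted character at a fixed position, with no insertions or deletions,
and the pattern is a plain literal string rather than a real regex. Under
those assumptions the regex library is not needed at all, and a direct
sliding-window comparison does the job. The sketch below is only an
illustration of that assumed rule; Hit and find_with_mismatches are made-up
names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct Hit {
    std::size_t offset;                   // 0-based start of the window in the text
    std::vector<std::size_t> mismatches;  // 1-based positions in the window that differ
};

// Slides `pattern` along `text` and reports every window that differs from
// the pattern in at most `max_mismatches` positions (substitutions only).
std::vector<Hit> find_with_mismatches(const std::string& pattern,
                                      const std::string& text,
                                      std::size_t max_mismatches)
{
    assert(!pattern.empty());

    std::vector<Hit> hits;
    for (std::size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        Hit hit{start, {}};
        // Stop comparing this window as soon as it exceeds the allowance.
        for (std::size_t i = 0;
             i < pattern.size() && hit.mismatches.size() <= max_mismatches; ++i) {
            if (text[start + i] != pattern[i])
                hit.mismatches.push_back(i + 1);
        }
        if (hit.mismatches.size() <= max_mismatches)
            hits.push_back(hit);
    }
    return hits;
}
```

With pattern "testing", text "tastung" and an allowance of 2, this reports a
single hit at offset 0 with mismatches at positions 2 and 5, which is the
output asked for above. It does not handle real regular-expression syntax, so
it only applies if the patterns are literal sequences such as "atgcta".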

Again, sorry if I'm way off base here.

