Boost Users :

From: james.jones_at_[hidden]
Date: 2006-08-29 10:21:44


From: "david v" <danova_fr_at_[hidden]>
> Yes, I think there was some misunderstanding here. I think it comes from
> your definition of a mistake. For me, a mistake is as follows:
>
> Regex: "testing"
> String_to_search: "tastung".
> The output should be that the regex "testing" was found, but with 2 mismatches,
> namely "a" and "u". So a mismatch is a letter that was not found.
>
> It may sound weird to you, but I'm using the regex to identify
> genomic regions, in other words for biological applications.
> In some cases my regex is a piece of DNA such as "atgcta" and I want to
> search for it in another piece of DNA. Since "atgcta" can occur in the
> genome many times, I will probably get a lot of matches. But in some cases
> I want to search for "atgcta" while allowing some mismatches. Obviously I
> will get even more matches, but I think regex can be a much more efficient
> way than building alignment matrices.
>
> Any idea how to handle the example above?

I don't see how regex can help you here. Matching with a bounded number of mismatches needs more specialized string algorithms. Try this as a starting point:

http://en.wikipedia.org/wiki/Suffix_tree
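
For reference, the matching described in the quoted question (same-length comparison, counting substituted letters) is Hamming-distance matching. Below is a minimal brute-force sketch in C++. It is not a suffix-tree implementation and not part of Boost.Regex; the names ApproxMatch and find_with_mismatches are hypothetical, chosen only for illustration. It runs in O(n*m) time, so it is at best a baseline for small inputs, and an index-based approach such as the one linked above would likely be needed for genome-sized text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One approximate occurrence: where it starts in the text and which
// positions of the pattern differ from the text at that offset.
struct ApproxMatch {
    std::size_t offset;                   // start index in the text
    std::vector<std::size_t> mismatches;  // indices into the pattern that differ
};

// Hypothetical helper (not a Boost API): slide the pattern over the text and
// keep every window whose Hamming distance is at most max_mismatches.
// Only substitutions are counted; insertions and deletions are not handled.
std::vector<ApproxMatch> find_with_mismatches(const std::string& text,
                                              const std::string& pattern,
                                              std::size_t max_mismatches) {
    std::vector<ApproxMatch> results;
    if (pattern.empty() || pattern.size() > text.size()) {
        return results;
    }

    for (std::size_t start = 0; start + pattern.size() <= text.size(); ++start) {
        std::vector<std::size_t> diffs;
        for (std::size_t i = 0; i < pattern.size(); ++i) {
            if (text[start + i] != pattern[i]) {
                diffs.push_back(i);
                if (diffs.size() > max_mismatches) {
                    break;  // too many mismatches in this window, stop early
                }
            }
        }
        if (diffs.size() <= max_mismatches) {
            results.push_back(ApproxMatch{start, diffs});
        }
    }
    return results;
}
```

Under these assumptions, find_with_mismatches("tastung", "testing", 2) reports one occurrence at offset 0, with pattern positions 1 and 4 differing (the "a" and "u" from the question).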

- James
