Boost logo

Boost :

From: David Brownell (brownell_at_[hidden])
Date: 2001-10-29 17:40:29


It would probably be beneficial for me to note that the algorithm I am
working on deferrers from the algorithms mentioned in the link sent. I am
working under the assumption that the second sequence, or destination
sequence, will be somewhat similar to the first, or source, sequence. While
that assumption may kill the generic usefulness of the algorithm, it makes
the problem much more manageable.

So, this would not be a good algorithm for finding a misspelled word in a
large document, but lends itself well to finding the changes in an updated
document or detecting mutations in a genetic sequence.

Thanks,
David Brownell

-----Original Message-----
From: Greg Colvin [mailto:gcolvin_at_[hidden]]
Sent: Saturday, October 27, 2001 7:40 AM
To: boost_at_[hidden]
Subject: Re: [boost] Algo. Proposal: SequenceCompare

From: Beman Dawes <bdawes_at_[hidden]>
> At 10:47 AM 10/26/01 +0400, Vladimir Prus wrote:
> >
> >
> >David Brownell wrote:
> >
> >> I am in the process of writing an algorithm that, when given two
> >> sequences of objects, will compare the differences and, if
> requested, >> output a set of instructions on how to convert sequence
> #1 to >> sequence #2. >
> >[snip]
> >
> >> The most common applications of this algorithm could be file
> >> comparisons or some sort of file versioning system similar to CVS or
> >> SourceSafe, but could even work on more complicated sequences such as
> >> DNA or even a seating chart.
> >>
> >> Is there any interest in this algorithm?
> >
> >I'm interested. Yet, I'll be even more interested if you don't just
> present
> >some algorithm which does the task, but also present a comparison of all
> >possible approaches.
>
> IIRC, the algorithm is called a Levenstein (or possibly Levenshein)
> Match. It was written up some years ago in ACM Computing Surveys in an
> article on approximate string matching. I can dig out the reference if
> anyone cares.

V.I. Levenshtein. Binary codes cabable of correcting, deletions, insertions,
and reversals. Soviet Phys. Dokl. 10:707-710, 1966

This is now called "edit distance", and is but one of a number of
approaches. A good review can be found at the most excellent NEC
ResearchIndex site:

http://citeseer.nj.nec.com/navarro99guided.html

Info: http://www.boost.org Unsubscribe:
<mailto:boost-unsubscribe_at_[hidden]>

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk