Boost logo

Boost :

Subject: Re: [boost] RFC: edit_distance / edit_alignment library
From: Erik Erlandson (eje_at_[hidden])
Date: 2013-06-30 20:43:09


> I have had a quick look. It seems that you've implemented the
> Needleman-Wunsch
> algorithm, which takes quadratic time and space. There are less
> complex algorithms,
> at least for similar input sequences, e.g. Myers - "An O(ND) Difference
> Algorithm
> and Its Variations"; I have an implementation of that here:
> http://svn.chezphil.org/anyterm/trunk/src/diff.cc & .hh.
>
> What is your motivation for choosing Needleman-Wunsch?

It was the simplest algorithm to get going. I was interested in developing a nice boost-ish programmatic interface, and developing unit tests, so for the implementation I stuck with the easy algorithm that I was familiar with. Now I (or anybody else) can drop in better algorithms with some unit test assurance.

Plus, Needleman-Wunsch is pretty effective for applications involving relatively small strings or sequences. Not so much for applications like large file diffs, or DNA sequences, etc.

At any rate, I view the particular algorithm as somewhat fungible, provided that the interface design is appealing to people. If there is disagreement on that point, I'm interested to know that too.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk