Boost logo

Boost :

From: JD (jean.daniel.michaud_at_[hidden])
Date: 2007-05-15 08:22:14


Phil,

> JD wrote:
>> I would like to know if there is any interest in an implementation of
>> the edit distance algorithm
>
> I have an implementation of the "diff" (i.e. edit script) algorithm here:
>
> http://svn.anyterm.org/anyterm/trunk/common/diff.hh
> http://svn.anyterm.org/anyterm/trunk/common/diff.cc
>
> The "edit distance" is in there somewhere.
>
> I would welcome this sort of thing in Boost. But it needs to be the
> best-known algorithm. IIRC, there was one improvement that I found in
> the literature but didn't implement because I didn't fully understand it.
>
> Please correct me if I'm wrong, but I get the impression that your code
> is O(NM) where N and M are the sizes of the two input strings. The
> algorithm that I implemented is O((N+M)D) where D is the edit distance
> in the worst case, and typically O(N+M+D^2). The algorithm is
> described in the .cc file linked above. Also, your code seems to have
> O(NM) space requirement; I'm not sure what the lower bound is, but I
> think it's probably O(N+M).
>

I improved the algorithm and now the space requirements falls to O(m).
Computational is still the same, however. Having a very quick look at
your algo (in diff.cc) I can see it's more complex in a way that it is
not understandable at first read. As I can see in the Boost Guideline
(http://www.boost.org/more/lib_guide.htm#Guidelines):

"Aim first for clarity and correctness; optimization should be only a
secondary concern in most Boost libraries."

So I think unless there is an obvious way to reduce computational cost
to O(m) (the same way it was obvious to reduce space usage to the same),
I think we should stick to the original algorithm.

Thanks for your insights and further remarks,

Pavol,

   Did you have time to take a look? If not, I've just uploaded the new
version to the Vault here (http://minilien.com/?gSIPLYyR08).

Thanks for your remarks,

JD


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk