Boost logo

Boost :

From: Stjepan Rajko (stipe_at_[hidden])
Date: 2007-05-12 16:37:33


> Wow that's just the algorithm found on wikipedia, re-written in c++ and
> boostified for introduction in boost/algorithm/string. Nothing fancy
> here, it is definitely basic dynamic programming. My question was more
> to know if there is potential "users" interested and if it's the
> appropiate place to propose it.
>

I know... I was just pointing out that by implementing edit distance,
you're not far from a slew of other useful things. For example, can
easily add support for a ton of related problems (maximum common
subsequence distance, for example), just by allowing the user to
specify how the top row and leftmost column of the dynamic programming
matrix are initialized. In your current implementation, you
initialize d[0][i]=d[i][0]=i: if instead you set d[0][i]=d[i][0]=0 (if
I remember correctly), you get maximum common subsequence distance.
Specifying the cost of insertion / deletion is another useful thing to
allow (hard coded to 1 in your case right now), and providing a custom
comparison cost function between characters is another (right now it
is hard-coded to a cost of 1 if they are different and 0 if they are
equal).

The next layer of difficulty/usefulness is allowing affine
insertion/deletion costs (e.g., an insertion of length N is given a
cost of aN + b), but that requires two tables. And then, you're not
far from a decent probabilistic pattern recognition mechanism :-)

This is all especially useful if you also add traceback extraction,
i.e. extracting the calculation path which yielded the optimal result.
 This gives you the string alignment which yields optimal edit
distance, or yields the maximum common subsequence in the other
initialization case.

I'm not saying you should go ahead and implement all this, but there
is definitelly a part of it that requires a small amount of effort but
makes it a ton more useful - I'd say at least allowing the user to
specify custom insertion, deletion, and comparison function costs, and
extracting the optimal alignment in the end.

Stjepan


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk