Boost logo

Boost :

Subject: Re: [boost] RFC: edit_distance / edit_alignment library
From: Erik Erlandson (eje_at_[hidden])
Date: 2013-12-13 21:54:19


project repo: https://github.com/erikerlandson/edit_distance

Since last time:

I implemented specializations for the Myers algorithm, which matches whenever the necessary conditions are met:
*) sequences are random access
*) no substitution (i.e. _allow_sub = boost::false_type(), which is now the default)
*) unit cost for insertion and deletion (also the default)
*) no optional pruning behaviors (excepting the new _max_cost, which is supported)

Default behavior is now such that the Myers algorithm applies, unless options are specifically chosen such that it cannot be used.

I altered the semantic for cost function: a separate equality relation is now applied explicitly to determine equality, which takes precedence over substitution cost function.

optional behaviors are now:

_cost = <cost-function-object> // customize the cost functions
_equal = <equal-object> // customize the element equality relation
_allow_sub = <true/false/true_type/false_type> // enable or disable substitution, at run time or compile time
_max_cost <cost value> // if edit cost grows too large, halt best-path computation and fall back to fast/dumb linear completion
_max_cost_exception // if true, throw an exception when max cost is exceeded instead of completing
_cost_beam = <cost_type value> // prune "unpromising" paths based on high cost per position
_edit_beam = <unsigned int> // prune paths that are too far off the diagonal

If substitution is disabled at compile time (false_type()), then cost functions and the output functions are not required to define substitution methods.

Now that there are two very different algorithms (Myers and SSSP), there is improved unit-testing. Obtaining the same results on both algorithms is an excellent cross-check.

I've had interest in supporting an application related to:
https://github.com/erikerlandson/edit_distance/issues/5
I have a plan for supporting that, which will probably take a bit of time to work out.

I've been considering possible alternate names for the functions:
edit_distance / edit_alignment
or: edit_cost / edit_path
or: edit_cost / edit_script

I've also been considering whether or not to support for _cost_beam and/or _edit_beam. Unsure how much call there is to prune at the expense of correctness.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk