Boost logo

Boost :

Subject: [boost] RFC: edit_distance / edit_alignment library
From: Erik Erlandson (eje_at_[hidden])
Date: 2013-06-25 22:38:27


Boost Community,

I cooked up a library that provides a pair of functions: edit_distance() and edit_alignment(). The library currently lives here:
https://github.com/erikerlandson/edit_distance

Some short example programs are here:
https://github.com/erikerlandson/edit_distance/tree/master/examples

A few example expressions:

// the edit distance (aka levenshtein distance)
unsigned dist = edit_distance("abc", "bcd");

// using a custom cost function for insertion, deletion, substitution
unsigned dist = edit_distance("abc", "bcd", custom_cost_function());

// accepts arbitrary sequences, ranges, range adaptors
unsigned dist = edit_distance(as_vector("abc"), as_list("bcd") | boost::adaptors::reversed);

// obtain the edit distance and the list of the edit operation codes (in output iterator)
unsigned dist = edit_alignment("abc", "bcd", std::ostream_iterator<edit_opcode>(std::cout, "")).second;

// adaptors to add edit operation costs and sequence elements to the edit operation output
unsigned dist = acquire<elements>(acquire<costs>(edit_alignment))("abc", "bcd", std::ostream_iterator<boost::tuple<edit_opcode, unsigned, char, char> >(std::cout, "")).second;

I'm interested in community feedback on whether these would be a useful addition to boost::algorithms. I envision these as the first additions to an eventually-larger library of other kinds of string distances, dynamic programming, and other algorithms for sequence alignment.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk