Boost logo

Boost :

Subject: Re: [boost] [gsoc-2013]Approximate string matching
From: Erik Erlandson (eerlands_at_[hidden])
Date: 2013-04-23 11:16:32


----- Original Message -----

From: "Yu Jiang" <sunlightt07_at_[hidden]>
To: boost_at_[hidden]
Sent: Tuesday, April 23, 2013 1:26:35 AM
Subject: [boost] [gsoc-2013]Approximate string matching

I have seen in a previous discussion, another student proposed to implement
some basic matching functions such as "edit distance" for boost. He said by
using dynamic programming, the time complexity is O(m*n). However, if we
give a threshold T and only report the exact edit distance if it's no
larger than T, the complexity can be O(T * min(m,n)). I think sometimes we
just need to decide whether the edit distance is small enough.

....

So, is it compatible for gsoc 2013 and for boost? Looking forward to your
opinions.

I can't speak to gsoc 2013, but as a user of edit distance algorithms in previous lives, I know I'd be interested in a boost edit-distance library.

How one would design the interface for edit_distance<> is an interesting question. In the basic general case, edit_distance<> is a function of two sequences, and a cost matrix for insertion/deletion/substitution/equality. In boost/template world, the "equality" test would of course be a template parameter, with the expected default. These days, I'd expect to be able to provide sequences as either begin/end iterators, or ranges. The cost matrix could be either provided as numeric values or functionals, with defaults to: 1/1/1/0.

I would want to support a variation that limited the search to a diagonal band, either as a separate function or via some parameter.

I imagine that it would be nice to provide specialization for strings, and perhaps vectors, since those are common sequence inputs and it would be convenient to supply them directly.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk