Subject: Re: [boost] [gsoc-2013] proposal for approximate string matching
From: Erik Erlandson (eerlands_at_[hidden])
Date: 2013-04-27 15:16:11


> I agree to modify the return type for function (4). So now I think your
> suggestion for using a class to supply these 3 functions together is a
> better choice. In case (1)(2)(3), I think return size_type seems to be
> better since the algorithm itself can only output an integer as the result?

I think in most cases the result would be an integer. However, I once implemented a variation where I used 1.5 for the cost of substitution and 1.0 for insertion/deletion. I no longer remember the details, but I do recall that a substitution cost of 2.0 didn't work as well; 1.5 gave better results. At any rate, the resulting distance in such a case is floating point, not integer. If it's reasonably easy to support configurable compute/return types, including non-integers, I think it might prove useful.
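
To make the idea concrete, here is a rough sketch of what a configurable cost/return type could look like. The names (edit_distance, unit_cost, weighted_cost) are made up for illustration and are not taken from the proposal; the algorithm is the textbook two-row dynamic program, not necessarily what the project would use.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cost policy: classic unit costs, integer result.
struct unit_cost {
    typedef std::size_t cost_type;
    cost_type insertion() const { return 1; }
    cost_type deletion() const { return 1; }
    template <typename T>
    cost_type substitution(const T& a, const T& b) const { return a == b ? 0 : 1; }
};

// Hypothetical cost policy: substitution costs 1.5, so the result is floating point.
struct weighted_cost {
    typedef double cost_type;
    cost_type insertion() const { return 1.0; }
    cost_type deletion() const { return 1.0; }
    template <typename T>
    cost_type substitution(const T& a, const T& b) const { return a == b ? 0.0 : 1.5; }
};

// The return type follows the policy's cost_type instead of being fixed to size_type.
template <typename Seq1, typename Seq2, typename Cost>
typename Cost::cost_type edit_distance(const Seq1& a, const Seq2& b, const Cost& cost) {
    typedef typename Cost::cost_type C;
    std::vector<C> prev(b.size() + 1, C());
    std::vector<C> cur(b.size() + 1, C());
    // Row 0: build the first j elements of b from an empty prefix of a.
    for (std::size_t j = 1; j <= b.size(); ++j)
        prev[j] = prev[j - 1] + cost.insertion();
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = prev[0] + cost.deletion();
        for (std::size_t j = 1; j <= b.size(); ++j) {
            C sub = prev[j - 1] + cost.substitution(a[i - 1], b[j - 1]);
            C del = prev[j] + cost.deletion();
            C ins = cur[j - 1] + cost.insertion();
            cur[j] = std::min(sub, std::min(del, ins));
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}
```

With something like this, edit_distance(s1, s2, unit_cost()) still returns an integer for the common case, while edit_distance(s1, s2, weighted_cost()) returns a double.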

I know that you are going to focus on actual strings for your applications, but there are other kinds of edit distance applications. One example is a use case like the Unix diff command, where you do edit-distance-like computations on two sequences of lines of text, and the edit cost between any two lines may itself be computed as an edit distance. So I think designing the routines to be as general as possible (within reason) will pay dividends for the Boost community and improve their adoption.
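
As a rough illustration of the diff-like case, the sketch below runs the same routine at two levels: over characters within a line, and over lines within a file, with the line-level substitution cost computed as a character-level edit distance. All names here (sequence_distance, char_cost, line_cost) are hypothetical, and charging a line's length for inserting or deleting it is just one assumption among many possible choices.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generic routine: works on any random-access sequence whose
// element costs are supplied by a policy object.
template <typename Seq, typename Cost>
double sequence_distance(const Seq& a, const Seq& b, const Cost& cost) {
    std::vector<double> prev(b.size() + 1, 0.0);
    std::vector<double> cur(b.size() + 1, 0.0);
    for (std::size_t j = 1; j <= b.size(); ++j)
        prev[j] = prev[j - 1] + cost.indel(b[j - 1]);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = prev[0] + cost.indel(a[i - 1]);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            double sub = prev[j - 1] + cost.substitute(a[i - 1], b[j - 1]);
            double del = prev[j] + cost.indel(a[i - 1]);
            double ins = cur[j - 1] + cost.indel(b[j - 1]);
            cur[j] = std::min(sub, std::min(del, ins));
        }
        prev.swap(cur);
    }
    return prev[b.size()];
}

// Character level: unit costs.
struct char_cost {
    double indel(char) const { return 1.0; }
    double substitute(char x, char y) const { return x == y ? 0.0 : 1.0; }
};

// Line level: replacing one line with another costs their character edit
// distance; inserting or deleting a line costs its length (an assumption).
struct line_cost {
    double indel(const std::string& line) const {
        return static_cast<double>(line.size());
    }
    double substitute(const std::string& x, const std::string& y) const {
        return sequence_distance(x, y, char_cost());
    }
};
```

Calling sequence_distance(old_lines, new_lines, line_cost()) on two std::vector<std::string> values then gives a diff-style distance, which is the kind of reuse a sequence-generic interface would allow.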

