Boost logo

Boost :

Subject: Re: [boost] [gsoc 2013] Approximate string matching
From: Adam D. Walling (adam.walling_at_[hidden])
Date: 2013-04-15 09:11:19


Jan Strnad <hanny.strnad <at> gmail.com> writes:

>
> Dne 12.4.2013 19:50, Michael Marcin napsal(a):
> > On 4/12/13 4:52 AM, Jan Strnad wrote:
...
> >> Now more specifically. I would like to implement approximate string
> >> matching algorithm(s) based on the NFA. I would like to support various
> >> approximate distances, such as Hamming distance [1], Levenstein
distance
> >> [2], Damerau-Levenstein distance [3], Delta distance, Gamma distance
and
> >> finally (delta, gamma) distance.
...
> >
> > Sounds very useful!
> >
> > I guess this would fit in the string algorithm library?
>
> That seems to me reasonable as well.
>
> >
> > What would the interface look like?
> >
> > Is it as simple as:
> >
> > namespace boost {
> > namespace algorithm {
> >
> > template< typename Range1T, typename Range2T >
> > std::size_t levenstein_distance( Range1T first, Range2T second );
> >
> > }
> > }
> >
>
> Sure, that would be possible, but I was originally aiming at something
> little bit different. Basically, I want to design an algorithm that
> would find all (or just the first) occurrences of a given pattern with
> given distance in some string.
>
> So to your question, simple interface could look like this:
> namespace boost
> {
> namespace algorithm
> {
> template <typename T>
> class approximate_distance;
>
> template <typename TRange>
> find_iterator<TRange> approximate_find
> (TRange text,
> approximate_distance<TRange>& model);
> }
> }
...

I would recommend starting off with the basic implementations of these
distance algorithms if possible. There are several situations where I
currently use my own Levenshtein distance implementation, and could use
other alternative 'distance' measures, but the approximate_find is useless
here.

Of course it could be useful in other situations, but I would suggest
exposing the underlying distance algorithms first, so they can then be
adapted into further utilities like approximate_find or used directly as
necessary.

-Adam D. Walling


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk