Boost logo

Boost :

Subject: Re: [boost] [algorithm] Review Request: edit_distance
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2014-01-29 09:59:37


Erik Erlandson <eje_at_[hidden]> wrote:
>> - I think it should be possible to relax the requirement for
>> random access iterators. Either continue to store integers
>> and then use std::advance, or store iterators. I'm considering,
>> for example, comparing UTF-8 with an iterator adaptor than
>> changes a random-access iterator over the underlying bytes
>> into a bidirectional iterator over the characters.
>
> I briefly toyed with this idea earlier on, but it seemed (imo)
> like it would undermine the virtues of the algorithm. I guess
> what I mean by that is, Myers does stuff like storing only one
> index in the working vector 'V' for memory efficiency, and so
> we have "j2 = j1-k", which either requires index manipulation,
> or if you wanted to do it via iterators, it is only efficient
> for random-access iterators.

OK, I've had a proper look at this. If sizeof(iterator) ==
sizeof(size_t), then I think you have to store a pair of iterators
which doubles the memory requirements. I was hoping to find that
you could continue to store a single integer and use std::advance
and keep O(ND), but I think it probably becomes O(N D^2).

So probably not worth doing.

>> - Some might prefer a version that doesn't use Boost.Parameter.
>
> I'm not sure how that would look: would it just take all 8
> parameters all the time?

Yes, with defaults.

>> - It would be great to see a benchmark against GNU diff!
>
> Also a good idea, although my impression of that code (which I
> poked around while doing this) is that it isn't factored in a way
> that would make dropping in a replacement very practical. I
> suppose the other option is to write a version of diff using
> edit_distance

Yes, that's exactly the sort of thing I had in mind; something that
would work like diff, though without most of its myriad options.

You could use this as a tutorial, as suggested in another post.

Another tutorial idea is an approximate match / spell check
replacement word suggester, i.e. scan /usr/dict/words and output
words closest to the input by edit distance. You could use the
max_cost feature.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk