Subject: Re: [boost] [algorithm] Review Request: edit_distance
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2014-01-29 09:59:37
Erik Erlandson <eje_at_[hidden]> wrote:
>> - I think it should be possible to relax the requirement for
>> random access iterators. Either continue to store integers
>> and then use std::advance, or store iterators. I'm considering,
>> for example, comparing UTF-8 with an iterator adaptor than
>> changes a random-access iterator over the underlying bytes
>> into a bidirectional iterator over the characters.
> I briefly toyed with this idea earlier on, but it seemed (imo)
> like it would undermine the virtues of the algorithm. I guess
> what I mean by that is, Myers does stuff like storing only one
> index in the working vector 'V' for memory efficiency, and so
> we have "j2 = j1-k", which either requires index manipulation,
> or if you wanted to do it via iterators, it is only efficient
> for random-access iterators.
OK, I've had a proper look at this. If sizeof(iterator) ==
sizeof(size_t), then I think you have to store a pair of iterators
which doubles the memory requirements. I was hoping to find that
you could continue to store a single integer and use std::advance
and keep O(ND), but I think it probably becomes O(N D^2).
So probably not worth doing.
>> - Some might prefer a version that doesn't use Boost.Parameter.
> I'm not sure how that would look: would it just take all 8
> parameters all the time?
Yes, with defaults.
>> - It would be great to see a benchmark against GNU diff!
> Also a good idea, although my impression of that code (which I
> poked around while doing this) is that it isn't factored in a way
> that would make dropping in a replacement very practical. I
> suppose the other option is to write a version of diff using
Yes, that's exactly the sort of thing I had in mind; something that
would work like diff, though without most of its myriad options.
You could use this as a tutorial, as suggested in another post.
Another tutorial idea is an approximate match / spell check
replacement word suggester, i.e. scan /usr/dict/words and output
words closest to the input by edit distance. You could use the
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk