Boost logo

Boost :

Subject: Re: [boost] [algorithm] Review Request: edit_distance
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2014-02-03 13:48:58

Hi Erik,

Erik Erlandson <eje_at_[hidden]> wrote:
>> - It may be useful to group contiguous operations of the same
>> kind, i.e. a "run" of N equal characters would result in one
>> call to the output script object, rather than N calls. This
>> can be achieved by merging in the script object, but that
>> would be less efficient than passing the entire run as an
>> iterator pair:
>> struct output {
>> template <typename ITER_T>
>> void insertion(ITER_T begin, ITER_T end) {
>> do_something_with(begin,end);
>> }
>> };
> I've been mulling this over, and I have an idea that is starting
> to grow on me.
> What if we left insertion() and deletion() (and substitution) as-is:
> they continue to receive individual sequence elements, and operation
> scores. But we *do* modify the equality() method to take two ranges,
> in the manner you suggested above. It would introduce an asymmetry
> of equality() with respect to the other methods, but I think there
> are some real arguments for it:


> *) Runs of 'equal' dominate the script in most larger-scale applications
> of interest. Actual insertions, deletions, etc, are relatively small.

I can't agree with that; for me, a common case is where the two inputs
are entirely different.

Here's another possibility: don't pass ranges, but do pass iterators:

struct output {

    ITER pending;
    enum ... last_action;
    void insertion(ITER i, int) {
      if (last_action != insert) {
        pending = i;
        last_action = insert;
    // etc.


The point here is that I just need to store the iterator for the start
of the range until I get a different action; in contrast, if the value
is passed I need to accumulate a copy of the range.

Regards, Phil.

Boost list run by bdawes at, gregod at, cpdaniel at, john at