Boost logo

Boost :

Subject: Re: [boost] Interest in some Algorithms
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2016-01-25 13:15:30


Dean Michael Berris wrote:
> You can
> leverage the fact that the ranges are already sorted, using
> std::lower_bound and std::upper_bound to find contiguous ranges that can be
> rotated out of the way incrementally.

Let's see the code. And the benchmarks. And your complexity claims!

The challenge is surely to get linear complexity worst-case, which
you can do with the sort of scan Steven suggested and also has good
cache locality characteristics, while keeping the sublinear complexity
in best cases. Here's the obvious linear algorithm:

template <typename I1, typename I2>
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2)
{
   I1 o = f1;
   I1 i1 = f1;
   I2 i2 = f2;

   while (i1 < l1 && i2 < l2) {
     if (*i1 < *i2) {
       std::swap(*o,*i1);
       ++i1;
       ++o;
     } else if (*i2 < *i1) {
       ++i2;
     } else {
       ++i1;
       ++i2;
     }
   }

   while (i1 < l1) {
     std::swap(*o,*i1);
     ++i1;
     ++o;
   }

   return o;
}

Now I think that you can make that logarithmic in the best case, but without
making it O(N log N) in the worst case like my recursive algorithm did, like
this:

template <typename I1, typename I2>
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2)
{
   I1 o = f1;
   I1 i1 = f1;
   I2 i2 = f2;
   int log_size = log2(l2-f2);

   int n = 0;
   while (i1 < l1 && i2 < l2) {
     if (*i1 < *i2) {
       std::swap(*o,*i1);
       ++i1;
       ++o;
       n = 0;
     } else if (*i2 < *i1) {
       ++i2;
       ++n;
       if (n == log_size) {
         i2 = std::lower_bound(i2,l2,*i1);
         n = 0;
       }
     } else {
       ++i1;
       ++i2;
       n = 0;
     }
   }

   if (i1 == o) return l1;

   while (i1 < l1) {
     std::swap(*o,*i1);
     ++i1;
     ++o;
   }

   return o;
}

That optimises skipping through the second sequence. You could do
something similar for the first sequence in the first if-block but
only for the case where i1==o, and also in the third if-block for
sequences of equal values. (I hadn't previously considered duplicate
values; if these are sets, is that allowed?)

Something else to note is that my recursive algorithm keeps the
to-discard items sorted, so you can actually think of it as a kind of
partition algorithm i.e. both halves of the output are useful. The
linear algorithms don't do that.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk