 # Boost :

Subject: Re: [boost] Interest in some Algorithms
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2016-01-26 22:50:57

> On 26 Jan 2016, at 05:15, Phil Endecott <spam_from_boost_dev_at_[hidden]> wrote:
>
> Dean Michael Berris wrote:
>> You can
>> leverage the fact that the ranges are already sorted, using
>> std::lower_bound and std::upper_bound to find contiguous ranges that can be
>> rotated out of the way incrementally.
>
> Let's see the code. And the benchmarks. And your complexity claims!
>

Indeed. Iâ€™ll go through my employerâ€™s patching process for this. :)

> The challenge is surely to get linear complexity worst-case, which
> you can do with the sort of scan Steven suggested and also has good
> cache locality characteristics, while keeping the sublinear complexity
> in best cases.

I think if it was possible to do a specialisation on taking forward iterators, the linear algorithm would be fine. I've forgotten about whether this is possible or even a good idea to do without concept support in the language yet.

> Here's the obvious linear algorithm:
>
> template <typename I1, typename I2>
> I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2)
> {
> I1 o = f1;
> I1 i1 = f1;
> I2 i2 = f2;
>
> while (i1 < l1 && i2 < l2) {
> if (*i1 < *i2) {
> std::swap(*o,*i1);
> ++i1;
> ++o;
> } else if (*i2 < *i1) {
> ++i2;
> } else {
> ++i1;
> ++i2;
> }
> }
>
> while (i1 < l1) {
> std::swap(*o,*i1);
> ++i1;
> ++o;
> }
>
> return o;
> }
>
>
> Now I think that you can make that logarithmic in the best case, but without
> making it O(N log N) in the worst case like my recursive algorithm did, like
> this:
>
>
> template <typename I1, typename I2>
> I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2)
> {
> I1 o = f1;
> I1 i1 = f1;
> I2 i2 = f2;
> int log_size = log2(l2-f2);
>
> int n = 0;
> while (i1 < l1 && i2 < l2) {
> if (*i1 < *i2) {
> std::swap(*o,*i1);
> ++i1;
> ++o;
> n = 0;
> } else if (*i2 < *i1) {
> ++i2;
> ++n;
> if (n == log_size) {
> i2 = std::lower_bound(i2,l2,*i1);
> n = 0;
> }
> } else {
> ++i1;
> ++i2;
> n = 0;
> }
> }
>
> if (i1 == o) return l1;
>
> while (i1 < l1) {
> std::swap(*o,*i1);
> ++i1;
> ++o;
> }
>
> return o;
> }
>
>
> That optimises skipping through the second sequence. You could do
> something similar for the first sequence in the first if-block but
> only for the case where i1==o, and also in the third if-block for
> sequences of equal values. (I hadn't previously considered duplicate
> values; if these are sets, is that allowed?)
>

I had been working on the assumption that sets were allowed to have multiple contiguous runs of similar elements. If this wasn't the case then indeed it would be simpler (it would obviate the need to use std::upper_bound at least from my implementation).

> Something else to note is that my recursive algorithm keeps the
> to-discard items sorted, so you can actually think of it as a kind of
> partition algorithm i.e. both halves of the output are useful. The
> linear algorithms don't do that.
>

Right, it's a form of stable partition. Maybe it should be called a "stable_set_inplace_difference", and that would certainly be useful.

Thanks Phil!