A Proposal to Add Combination Against Permutation to the Library
Technical Report
I. Motivation
The motivation of this article is to provide an ability which can
generate combination like std::permutation. It can generate all
sequences of a combination by "dictionary" ordering.
At first, we could treat all combinations of the range as a set of
"dictionary" sorted sequences. Then generate the current sequence
into the next (or previous) one of this set and return true if there
are more sequences to generate. If the sequence is the largest (or
smallest) one in sets, the smallest (or largest) is generated and
false returned.
Of course, they should be couple of the permutation.
Well, Ben Bear has an implementation. It works well with some
typical tests. There is a php implementation available at
http://www.bxmy.org/php/combination.php.
Example
// select two number from {0, 1, 2, 3, 4}
int a[5] = {0, 1, 2, 3, 4};
init_combination (a, a+2, a+5);
do
{
// here, the selected number is {a[0], a[1]}
}
while (next_combination (a, a+2, a+5));
II. Impact On the Standard
This proposal is almost a pure library extension. It proposes
changes to an existing header, , however it does not
require changes to any standard classes or functions and standard
requirement tables, and also changes in the core language. It has
been implemented in standard C++. The proposal does not depend on any
other library extensions.
III. Design Decisions
A. Dictionary ordering of Combinations
Sort all the sequences of the combinations by "Dictionary"
ordering, like the std::permutation. Then we can compare two
sequences, one is smaller and one is larger. If sequence B is the
smallest one that is larger than sequence A, call B is the "next"
sequence of A and "previous" can be defined like "next".
The combination generate the current sequence into the next(or
previous) one of this set. Returns true if there are more sequences
to generate. If the sequence is the largest(or smallest) of the set,
the smallest(or largest) is generated and false returned.
The combination functions can generate all sequences of
combinations one by one, from smallest to largest or from largest to
smallest.
B. Data Structure: [first, middle) and [middle, last)
Stores the selected elements in [first, middle), and the remaining
in [middle, last). The range [first, last) contains whole elements.
When generating a sequence (identified by (first, middle, last)),
some elements in [first, middle) are swapped with the equivalent
number of elements in [middle, last). This makes the sequence become
larger.
Consider that the [first, middle) is selected and the [middle,
last) is unselected, so [first, middle) matches "select m from n",
while the [middle, last) matches "select (n-m) from n".
When [first, middle) goes next, [middle, last) goes previous.
Such as {0 1}{2 3} becomes {0 2}{1 3}. {0 2} is the next one of {0
1}, while {2 3} is the next one of {1, 3}. So {0 1} goes next and {2,
3} goes previous.
So, if we know how to make [first, middle) go next, we can make
[middle, last) go next too. [middle, last) goes next, means that
[first, middle) goes previous.
Two functions was provided, next_combination and prev_combination,
just like next_permutation and prev_permutation. Those two functions
need three parameters, first, middle and last. For example:
int a[7];
...
bool finish = next_combination (a, a+3, a+7);
C. The Algorithm
We just consider the next_combination, because prev_combination
can be implemented by next_combination.
Like the next_permutation, this algorithm finds the right part of
[first, middle), that can't become larger, then increases the previous
element of this part, finally adjusts this part to the smallest. The
difference between the combination and the permutation is that the
combination has two part, [first, middle) and [middle, last).
Well, define [first, middle) to sequence P, and [middle, last) to
sequence Q. Also, define the largest element of Q to Qmax.
1. Find the right part
If one element of P isn't less than the Qmax, it can't become
larger. Else, if we replace one element in P that is less than the
Qmax, the new P is larger than old P.
So, the right part of P that can't become much larger is the
elements that are not less than the Qmax.
If all elements in P are not less than the Qmax, the P must be the
largest sequence. Adjust it to the smallest sequence, and the
algorithm is finished.
2. Increase the previous element of the largest right part
Define the previous element of this part to Pout, which will be
selected out from P.
Pout is less than Qmax, because this part contains all elements
that are not less than Qmax. So we can select an element from Q,
which is the smallest element that is greater than Pout. Define this
element to Qin.
Replace Pout with Qin, and then go to the next step.
3. Adjust the largest right part to smallest
Fill the part with the smallest elements that are not less than
Qin. The algorithm is finished.
This algorithm works good even if there are repeated elements.
D. Performance
1. Initialization
Two initialization functions will call merge sort to initializing
the combination ranges. They cost time O(n*log(n)) and space O(n).
2. Combination
Because the combination has three step. The first step can be
implemented using lower_bound and upper_bound, that cost time
O(log(n)) and space O(1). The second step cost O(1) in time and
space. The last step is merge, which cost O(n) in time and space.
So the combination cost O(n) in time and space. But the averages
time cost is less than O(n), because the third step may isn't needed.
Test next_combination(a, a+k, a+n) on IBM ThinkPad(P3M 1G). a is
defined as int a[35], and it's initial value is 1, 2, 3, ..., n. The
results are as follows:
k n time cost(ms) : times : avg time(ns)
0 : 7 : 1121 : 30000000 : 37
1 : 7 : 18637 : 30000000 : 621
2 : 7 : 23534 : 30000000 : 784
3 : 7 : 25837 : 30000000 : 861
4 : 7 : 25717 : 30000000 : 857
5 : 7 : 24645 : 30000000 : 822
6 : 7 : 18898 : 30000000 : 630
7 : 7 : 1161 : 30000000 : 39
3 : 14 : 24816 : 30000000 : 827
3 : 21 : 24034 : 30000000 : 801
3 : 28 : 26048 : 30000000 : 868
3 : 35 : 23974 : 30000000 : 799
6 : 14 : 27500 : 30000000 : 917
9 : 21 : 29832 : 30000000 : 994
12 : 28 : 30715 : 30000000 : 1024
15 : 35 : 31475 : 30000000 : 1049
4 : 14 : 26038 : 30000971 : 868
6 : 21 : 27249 : 30007992 : 908
8 : 28 : 30383 : 31081050 : 978
10 : 35 : 186939 : 183579396 : 1018
10 : 14 : 27941 : 30000971 : 931
15 : 21 : 28891 : 30007992 : 963
20 : 28 : 30424 : 31081050 : 979
Well, it seems very stable.
30003120 times of calling next_permutation(a, a+7) take 4997ms,
that averages 167ns per time. Combination is several times slow than
permutation.
E. Alternatives
1. Combination by Counting
We can implement this algorithm by counting. Create a count
sequence that contains the count of each element, and a number used
sequence that contains the number used of each element. Changing the
used sequence equates changing the sequence of combination. It need a
transformation from the number used sequence into the elements
sequence.
If each element is different from the others, the count sequence
can be left out and the number used sequence can become bool sequence.
In this method, we don't care the elements self, but the count and
number used of the elements.
2. The interface of the combination
1) How to initialize?
Because the combination needs the ranges, [first, middle) and
[middle, last), to be sorted, we must provide an interface for this.
Ben Bear's implementation provides two functions,
init_combination() and adjust_combination(). init_combination() sets
a combination to the smallest or largest sequence.
adjust_combination() sets a combination to obeying the rule. These
functions use the merge sort.
Also, these functions can be left, but the users must initialize
the combination themselves. They must sort the ranges, then reverse
the ranges if need. It's not a good way.
2) Publish the separate_merge()?
The separate_merge() is a little different from the
std::inplace_merge. The first needs two separate ranges while the
second needs a series of two ranges. Well, the inplace_merge can be
easy implemented by calling the separate_merge.
Is it needed to publish the separate_merge(maybe change the name)
or publish it as an overload of inplace_merge?
3) Publish the combination()?
There are three way, publishing:
a) combination(), next_combination(), prev_combination();
b) next_combination(), prev_combination();
c) combination()
IV. Proposed Text
A. Additions to header synopsis
template
void
separate_merge (BiIter1 first1, BiIter1 last1,
BiIter2 first2, BiIter2 last2);
template
void
init_combination (BiIter first, BiIter middle, BiIter last,
bool beginning = true);
template
void
adjust_combination (BiIter first, BiIter middle, BiIter last);
template
bool
combination (BiIter1 first1, BiIter1 last1,
BiIter2 first2, BiIter2 last2);
template
bool
next_combination (BiIter first, BiIter middle, BiIter last);
template
bool
prev_combination (BiIter first, BiIter middle, BiIter last);
B. Separate merge function
template
void
separate_merge (BiIter1 first1, BiIter1 last1,
BiIter2 first2, BiIter2 last2);
Requirements: BiIter should be a Bidirectional Iterator. [first,
middle) and [middle, last) should be available ranges
and sorted.
Effects: Merge two sorted and consecutive ranges, [first1, last1) and
[first2, last2), and puts the first (last1-first1) elements
in [first1, last1) while the remaining elements in [first2,
last2). The output will be sorted. The sort is stable, that
is, for equivalent elements in the two ranges, elements from
the first range will always come before elements from the
second.
Post conditions: The largest elements should be in [middle, last);
else in [first, middle). [first, middle) and
[middle, last) are both sorted.
Throws: nothing
C. Initialize
template
void
init_combination (BiIter first, BiIter middle, BiIter last,
bool beginning = true);
Requirements: BiIter should be a Bidirectional Iterator. [first,
middle) and [middle, last) should be available ranges.
Effects: If beginning is true, make the COMBI(first, middle, last) has
the min value; else, make the COMBI(first, middle, last) has
max value.
Post conditions: If beginning, the greatest elements should be in
[middle, last); else in [first, middle). [first,
middle) and [middle, last) are both sorted.
Throws: nothing
template
void
adjust_combination (BiIter first, BiIter middle, BiIter last);
Requirements: BiIter should be a Bidirectional Iterator. [first,
middle) and [middle, last) should be available ranges.
Effects: Sort [first, middle) and [middle, last).
Post conditions: [first, middle) and [middle, last) are both sorted.
Throws: nothing
D. Combine
template
bool
combination (BiIter1 first1, BiIter1 last1,
BiIter2 first2, BiIter2 last2);
Requirements: BiIter should be a Bidirectional Iterator. [first,
middle) and [middle, last) should be available range and sorted.
Effects: Combine the input sequence into next one of the "dictionary"
sorted combinations. If the input sequence is the largest
one, the smallest sequence is generated. Where [first1,
last1) are the elements selected in and [first2, last2) are
the elements selected out.
Return: True if the input sequence is not the smallest one. Otherwise
returns false.
Post conditions: [first, middle) and [middle, last) are both sorted.
Throws: nothing
template
bool
next_combination (BiIter first, BiIter middle, BiIter last);
Requirements: BiIter should be Bidirectional Iterator. [first, middle)
and [middle, last) should be available ranges and
sorted.
Effects: Combine the input sequence into next one of the "dictionary"
sorted combinations. If the input sequence is the largest
one, the smallest sequence is generated. Where [first1,
last1) are the elements selected in and [first2, last2) are
the elements selected out.
Return: True if the input sequence is not the smallest one. Otherwise
returns false.
Post conditions: [first, middle) and [middle, last) are both sorted.
Throws: nothing
template
bool
prev_combination (BiIter first, BiIter middle, BiIter last);
Requirements: BiIter should be a Bidirectional Iterator. [first,
middle) and [middle, last) should be available ranges
and sorted.
Effects: Combine the input sequence into next one of the "dictionary"
sorted combinations. If the input sequence is the largest
one, the smallest sequence is generated. Where [first1,
last1) are the elements selected in and [first2, last2) are
the elements selected out.
Return: True if the input sequence is not the smallest one. Otherwise
returns false.
Post conditions: [first, middle) and [middle, last) are both sorted.
Throws: nothing
V. An implementation
An implementation can be found at
http://www.bxmy.org/code/combination.hpp.
There is the main source code.
------------------------------------------------
template
bool
combination (Iter first1, Iter last1, Iter first2, Iter last2)
{
if ((first1 == last1) || (first2 == last2))
return false;
Iter qmax = last2;
--qmax;
Iter pout1 = std::lower_bound(first1, last1, *qmax);
bool fin = pout1 == first1;
Iter left1, left2;
if (!fin)
{
Iter pout = pout1;
--pout;
Iter qin = std::upper_bound(first2, last2, *pout);
std::iter_swap (pout, qin);
left1 = pout;
++left1;
left2 = qin;
++left2;
}
else
{
left1 = first1;
left2 = first2;
}
separate_merge (left1, last1, left2, last2);
return !fin;
}
template
inline bool
next_combination (Iter first, Iter middle, Iter last)
{
return combination (first, middle, middle, last);
}
template
inline bool
prev_combination (Iter first, Iter middle, Iter last)
{
return combination (middle, last, first, middle);
}
------------------------------------------------
VI. Revision History
VII. References
Ben Bear, Natural Combination Algorithm, April 2006. Available online
at http://www.bxmy.org/article/combination.pdf.