A Proposal to Add Combination Against Permutation to the Library Technical Report I. Motivation The motivation of this article is to provide an ability which can generate combination like std::permutation. It can generate all sequences of a combination by "dictionary" ordering. At first, we could treat all combinations of the range as a set of "dictionary" sorted sequences. Then generate the current sequence into the next (or previous) one of this set and return true if there are more sequences to generate. If the sequence is the largest (or smallest) one in sets, the smallest (or largest) is generated and false returned. Of course, they should be couple of the permutation. Well, Ben Bear has an implementation. It works well with some typical tests. There is a php implementation available at http://www.bxmy.org/php/combination.php. Example // select two number from {0, 1, 2, 3, 4} int a[5] = {0, 1, 2, 3, 4}; init_combination (a, a+2, a+5); do { // here, the selected number is {a[0], a[1]} } while (next_combination (a, a+2, a+5)); II. Impact On the Standard This proposal is almost a pure library extension. It proposes changes to an existing header, , however it does not require changes to any standard classes or functions and standard requirement tables, and also changes in the core language. It has been implemented in standard C++. The proposal does not depend on any other library extensions. III. Design Decisions A. Dictionary ordering of Combinations Sort all the sequences of the combinations by "Dictionary" ordering, like the std::permutation. Then we can compare two sequences, one is smaller and one is larger. If sequence B is the smallest one that is larger than sequence A, call B is the "next" sequence of A and "previous" can be defined like "next". The combination generate the current sequence into the next(or previous) one of this set. Returns true if there are more sequences to generate. If the sequence is the largest(or smallest) of the set, the smallest(or largest) is generated and false returned. The combination functions can generate all sequences of combinations one by one, from smallest to largest or from largest to smallest. B. Data Structure: [first, middle) and [middle, last) Stores the selected elements in [first, middle), and the remaining in [middle, last). The range [first, last) contains whole elements. When generating a sequence (identified by (first, middle, last)), some elements in [first, middle) are swapped with the equivalent number of elements in [middle, last). This makes the sequence become larger. Consider that the [first, middle) is selected and the [middle, last) is unselected, so [first, middle) matches "select m from n", while the [middle, last) matches "select (n-m) from n". When [first, middle) goes next, [middle, last) goes previous. Such as {0 1}{2 3} becomes {0 2}{1 3}. {0 2} is the next one of {0 1}, while {2 3} is the next one of {1, 3}. So {0 1} goes next and {2, 3} goes previous. So, if we know how to make [first, middle) go next, we can make [middle, last) go next too. [middle, last) goes next, means that [first, middle) goes previous. Two functions was provided, next_combination and prev_combination, just like next_permutation and prev_permutation. Those two functions need three parameters, first, middle and last. For example: int a[7]; ... bool finish = next_combination (a, a+3, a+7); C. The Algorithm We just consider the next_combination, because prev_combination can be implemented by next_combination. Like the next_permutation, this algorithm finds the right part of [first, middle), that can't become larger, then increases the previous element of this part, finally adjusts this part to the smallest. The difference between the combination and the permutation is that the combination has two part, [first, middle) and [middle, last). Well, define [first, middle) to sequence P, and [middle, last) to sequence Q. Also, define the largest element of Q to Qmax. 1. Find the right part If one element of P isn't less than the Qmax, it can't become larger. Else, if we replace one element in P that is less than the Qmax, the new P is larger than old P. So, the right part of P that can't become much larger is the elements that are not less than the Qmax. If all elements in P are not less than the Qmax, the P must be the largest sequence. Adjust it to the smallest sequence, and the algorithm is finished. 2. Increase the previous element of the largest right part Define the previous element of this part to Pout, which will be selected out from P. Pout is less than Qmax, because this part contains all elements that are not less than Qmax. So we can select an element from Q, which is the smallest element that is greater than Pout. Define this element to Qin. Replace Pout with Qin, and then go to the next step. 3. Adjust the largest right part to smallest Fill the part with the smallest elements that are not less than Qin. The algorithm is finished. This algorithm works good even if there are repeated elements. D. Performance 1. Initialization Two initialization functions will call merge sort to initializing the combination ranges. They cost time O(n*log(n)) and space O(n). 2. Combination Because the combination has three step. The first step can be implemented using lower_bound and upper_bound, that cost time O(log(n)) and space O(1). The second step cost O(1) in time and space. The last step is merge, which cost O(n) in time and space. So the combination cost O(n) in time and space. But the averages time cost is less than O(n), because the third step may isn't needed. Test next_combination(a, a+k, a+n) on IBM ThinkPad(P3M 1G). a is defined as int a[35], and it's initial value is 1, 2, 3, ..., n. The results are as follows: k n time cost(ms) : times : avg time(ns) 0 : 7 : 1121 : 30000000 : 37 1 : 7 : 18637 : 30000000 : 621 2 : 7 : 23534 : 30000000 : 784 3 : 7 : 25837 : 30000000 : 861 4 : 7 : 25717 : 30000000 : 857 5 : 7 : 24645 : 30000000 : 822 6 : 7 : 18898 : 30000000 : 630 7 : 7 : 1161 : 30000000 : 39 3 : 14 : 24816 : 30000000 : 827 3 : 21 : 24034 : 30000000 : 801 3 : 28 : 26048 : 30000000 : 868 3 : 35 : 23974 : 30000000 : 799 6 : 14 : 27500 : 30000000 : 917 9 : 21 : 29832 : 30000000 : 994 12 : 28 : 30715 : 30000000 : 1024 15 : 35 : 31475 : 30000000 : 1049 4 : 14 : 26038 : 30000971 : 868 6 : 21 : 27249 : 30007992 : 908 8 : 28 : 30383 : 31081050 : 978 10 : 35 : 186939 : 183579396 : 1018 10 : 14 : 27941 : 30000971 : 931 15 : 21 : 28891 : 30007992 : 963 20 : 28 : 30424 : 31081050 : 979 Well, it seems very stable. 30003120 times of calling next_permutation(a, a+7) take 4997ms, that averages 167ns per time. Combination is several times slow than permutation. E. Alternatives 1. Combination by Counting We can implement this algorithm by counting. Create a count sequence that contains the count of each element, and a number used sequence that contains the number used of each element. Changing the used sequence equates changing the sequence of combination. It need a transformation from the number used sequence into the elements sequence. If each element is different from the others, the count sequence can be left out and the number used sequence can become bool sequence. In this method, we don't care the elements self, but the count and number used of the elements. 2. The interface of the combination 1) How to initialize? Because the combination needs the ranges, [first, middle) and [middle, last), to be sorted, we must provide an interface for this. Ben Bear's implementation provides two functions, init_combination() and adjust_combination(). init_combination() sets a combination to the smallest or largest sequence. adjust_combination() sets a combination to obeying the rule. These functions use the merge sort. Also, these functions can be left, but the users must initialize the combination themselves. They must sort the ranges, then reverse the ranges if need. It's not a good way. 2) Publish the separate_merge()? The separate_merge() is a little different from the std::inplace_merge. The first needs two separate ranges while the second needs a series of two ranges. Well, the inplace_merge can be easy implemented by calling the separate_merge. Is it needed to publish the separate_merge(maybe change the name) or publish it as an overload of inplace_merge? 3) Publish the combination()? There are three way, publishing: a) combination(), next_combination(), prev_combination(); b) next_combination(), prev_combination(); c) combination() IV. Proposed Text A. Additions to header synopsis template void separate_merge (BiIter1 first1, BiIter1 last1, BiIter2 first2, BiIter2 last2); template void init_combination (BiIter first, BiIter middle, BiIter last, bool beginning = true); template void adjust_combination (BiIter first, BiIter middle, BiIter last); template bool combination (BiIter1 first1, BiIter1 last1, BiIter2 first2, BiIter2 last2); template bool next_combination (BiIter first, BiIter middle, BiIter last); template bool prev_combination (BiIter first, BiIter middle, BiIter last); B. Separate merge function template void separate_merge (BiIter1 first1, BiIter1 last1, BiIter2 first2, BiIter2 last2); Requirements: BiIter should be a Bidirectional Iterator. [first, middle) and [middle, last) should be available ranges and sorted. Effects: Merge two sorted and consecutive ranges, [first1, last1) and [first2, last2), and puts the first (last1-first1) elements in [first1, last1) while the remaining elements in [first2, last2). The output will be sorted. The sort is stable, that is, for equivalent elements in the two ranges, elements from the first range will always come before elements from the second. Post conditions: The largest elements should be in [middle, last); else in [first, middle). [first, middle) and [middle, last) are both sorted. Throws: nothing C. Initialize template void init_combination (BiIter first, BiIter middle, BiIter last, bool beginning = true); Requirements: BiIter should be a Bidirectional Iterator. [first, middle) and [middle, last) should be available ranges. Effects: If beginning is true, make the COMBI(first, middle, last) has the min value; else, make the COMBI(first, middle, last) has max value. Post conditions: If beginning, the greatest elements should be in [middle, last); else in [first, middle). [first, middle) and [middle, last) are both sorted. Throws: nothing template void adjust_combination (BiIter first, BiIter middle, BiIter last); Requirements: BiIter should be a Bidirectional Iterator. [first, middle) and [middle, last) should be available ranges. Effects: Sort [first, middle) and [middle, last). Post conditions: [first, middle) and [middle, last) are both sorted. Throws: nothing D. Combine template bool combination (BiIter1 first1, BiIter1 last1, BiIter2 first2, BiIter2 last2); Requirements: BiIter should be a Bidirectional Iterator. [first, middle) and [middle, last) should be available range and sorted. Effects: Combine the input sequence into next one of the "dictionary" sorted combinations. If the input sequence is the largest one, the smallest sequence is generated. Where [first1, last1) are the elements selected in and [first2, last2) are the elements selected out. Return: True if the input sequence is not the smallest one. Otherwise returns false. Post conditions: [first, middle) and [middle, last) are both sorted. Throws: nothing template bool next_combination (BiIter first, BiIter middle, BiIter last); Requirements: BiIter should be Bidirectional Iterator. [first, middle) and [middle, last) should be available ranges and sorted. Effects: Combine the input sequence into next one of the "dictionary" sorted combinations. If the input sequence is the largest one, the smallest sequence is generated. Where [first1, last1) are the elements selected in and [first2, last2) are the elements selected out. Return: True if the input sequence is not the smallest one. Otherwise returns false. Post conditions: [first, middle) and [middle, last) are both sorted. Throws: nothing template bool prev_combination (BiIter first, BiIter middle, BiIter last); Requirements: BiIter should be a Bidirectional Iterator. [first, middle) and [middle, last) should be available ranges and sorted. Effects: Combine the input sequence into next one of the "dictionary" sorted combinations. If the input sequence is the largest one, the smallest sequence is generated. Where [first1, last1) are the elements selected in and [first2, last2) are the elements selected out. Return: True if the input sequence is not the smallest one. Otherwise returns false. Post conditions: [first, middle) and [middle, last) are both sorted. Throws: nothing V. An implementation An implementation can be found at http://www.bxmy.org/code/combination.hpp. There is the main source code. ------------------------------------------------ template bool combination (Iter first1, Iter last1, Iter first2, Iter last2) { if ((first1 == last1) || (first2 == last2)) return false; Iter qmax = last2; --qmax; Iter pout1 = std::lower_bound(first1, last1, *qmax); bool fin = pout1 == first1; Iter left1, left2; if (!fin) { Iter pout = pout1; --pout; Iter qin = std::upper_bound(first2, last2, *pout); std::iter_swap (pout, qin); left1 = pout; ++left1; left2 = qin; ++left2; } else { left1 = first1; left2 = first2; } separate_merge (left1, last1, left2, last2); return !fin; } template inline bool next_combination (Iter first, Iter middle, Iter last) { return combination (first, middle, middle, last); } template inline bool prev_combination (Iter first, Iter middle, Iter last) { return combination (middle, last, first, middle); } ------------------------------------------------ VI. Revision History VII. References Ben Bear, Natural Combination Algorithm, April 2006. Available online at http://www.bxmy.org/article/combination.pdf.