|
Boost : |
From: Mat Marcus (mat-lists_at_[hidden])
Date: 2008-04-11 18:07:50
###
Overview
###
In a recent thread regarding issues in the boost::System library, we
began a discussion on the performance cost, *in release builds*, of
accepting the default Microsoft compiler setting that enables the
so-called secure STL feature. In this post I am adding some source
code excerpts, from a colleague's benchmark suite, that I used to
generate the results (reproduced below). I am also changing the
subject line to better reflect the topic under discussion.
Summary: In VC8 (Dev Studio 2005) and later, Microsoft's 'secure' STL
feature (enabled by default) can cause a slowdown of more than an
order of magnitude in the execution of performance critical algorithms
in release builds. I suspect that many projects would benefit from
ensuring that _SECURE_SCL=0 for release configurations. I recommend
that boost ships with this feature disabled by default for (at least)
release builds.
Microsoft product managers claim that the performance cost for release
builds is negligible. For example, Peter Dimov cited a thread that
lead me to the following claim:
Martyn Lovell> I strongly recommend against turning off
Martyn Lovell> _SECURE_SCL. The perf win is normally minor, and
Martyn Lovell> the security risk is much higher. If you find a
Martyn Lovell> major perf issue with _SECURE_SCL, please let us
Martyn Lovell> know.
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=863481&SiteID=1#882374
The benchmarks below offer a different view of the performance costs.
(Please don't confuse this discussion with the separate feature
controlling compiler warnings. In some configurations, the MS compiler
warning that routines like std::copy are inherently unsafe. The
original text of the warnings claimed that such routines had been
"deprecated". We have not deprecated such routines in the C++
committee. The _SCL_SECURE_NO_DEPRECATE can be used to control such
warnings, but this issue is separate from the performance impact of
the _SECURE_SCL feature.)
A colleague has been developing some benchmarks that aim to help
compiler vendors ensure that they properly support C++ programming,
and to help programmers understand the costs of the algorithms and
data structures that they use. Instead of using his suite to measure
the relative performance of different compilers, I ran some tests
completely under VC 9 to measure only the Release build performance of
the 'secure' STL. (Interestingly, the penalty seems to have increased
from VC8 to VC9. But the numbers for VC8 are also disheartening.)
###
Details
###
I ran the stepanov_vector benchmark (excerpted below) twice, once with
default boost build settings ("secure" STL enabled) and once with it
turned off. That is, I built with settings roughly equivalent to:
bjam release
bjam release define=_SCL_SECURE=0
This benchmark attempts, amongst other things, to measure the
abstraction penalty by comparing the performance of different
algorithms when using a vendor's vector with associated iterator type
vs. an array with pointers. When reading the results below, 'double
pointer' denotes a test using an array of doubles (pointer iterators),
where 'double vector' means a vector of doubles
(vector::iterator). Note that sometimes the penalty exceeds an
order of magnitude . In light of these results, one could argue
that_SECURE_SCL for release builds is a misfeature. I won't debate
whether it is worth the slowdown for normal (non-extreme) debugging
situations. Results follow for 'secure' STL and then normal STL. These
are followed by a source code excerpt from the pre-release benchmark
suite.
As Peter pointed out, SECURE_SCL setting must be consistent throughout
linkage units, or else the resulting ODR violations can lead to memory
corruption etc. For this reason, I would like to see boost build add a
link-incompatible feature, something like:
on/off :-)
whose default setting is off for release. Of course something
like might be a more realistic name. Of course one
can manually achieve a similar effect using the bjam command line
above. But with such an approach it is too easy to mix and match, the
defaults are backwards, and there seems to be relatively little
support for projects built with boost build that bring in boost
libraries via properties. Alternatively, it might be worth
considering whether we could control the feature via configuration
header files.
- Mat
###
release build with _SECURE_SCL enabled
###
c:\boxer\performance\projects\msvc8\Release\stepanov_vector.exe
test description absolute operations ratio with
number time per second test0
0 "double pointer verify2" 3.03 sec 989.77 M 1.00
1 "DoubleClass pointer verify2" 3.05 sec 984.57 M 1.01
2 "double vector" 16.14 sec 185.86 M 5.33
3 "DoubleClass vector" 18.19 sec 164.95 M 6.00
4 "double vector reverse" 26.23 sec 114.35 M 8.66
5 "DoubleClass vector reverse" 26.20 sec 114.49 M 8.65
6 "double vector reverse reverse" 31.25 sec 96.00 M 10.31
7 "DoubleClass vector reverse reverse" 31.25 sec 96.00 M 10.31
Total absolute time for Vector accumulate: 155.34 sec
Vector accumulate Penalty: 5.92
test description absolute operations
ratio with
number time per second
test0
0 "insertion_sort double pointer verify2" 1.16 sec 2.60 M
1.00
1 "insertion_sort DoubleClass pointer verify2" 1.58 sec 1.90 M
1.37
2 "insertion_sort double vector" 19.58 sec 0.15 M
16.94
3 "insertion_sort DoubleClass vector" 18.00 sec 0.17 M
15.57
4 "insertion_sort double vector reverse" 30.56 sec 0.10 M
26.44
5 "insertion_sort DoubleClass vector reverse" 29.06 sec 0.10 M
25.14
6 "insertion_sort double vector reverse reverse" 43.03 sec 0.07 M
37.22
7 "insertion_sort DoubleClass vector reverse reverse" 43.05 sec 0.07 M
37.24
Total absolute time for Vector Insertion Sort: 186.02 sec
Vector Insertion Sort Penalty: 16.49
test description absolute operations rati
o with
number time per second test
0
0 "quicksort double pointer verify2" 1.64 sec 14.63 M 1.00
1 "quicksort DoubleClass pointer verify2" 1.95 sec 12.29 M 1.19
2 "quicksort double vector" 5.03 sec 4.77 M 3.07
3 "quicksort DoubleClass vector" 5.22 sec 4.60 M 3.18
4 "quicksort double vector reverse" 7.16 sec 3.35 M 4.36
5 "quicksort DoubleClass vector reverse" 7.34 sec 3.27 M 4.48
6 "quicksort double vector reverse reverse" 8.84 sec 2.71 M 5.39
7 "quicksort DoubleClass vector reverse reverse" 9.06 sec 2.65 M 5.52
Total absolute time for Vector Quicksort: 46.25 sec
Vector Quicksort Penalty: 3.52
test description absolute operations rati
o with
number time per second test
0
0 "heap_sort double pointer verify2" 1.81 sec 13.24 M 1.00
1 "heap_sort DoubleClass pointer verify2" 1.88 sec 12.80 M 1.03
2 "heap_sort double vector" 7.84 sec 3.06 M 4.33
3 "heap_sort DoubleClass vector" 7.89 sec 3.04 M 4.35
4 "heap_sort double vector reverse" 10.80 sec 2.22 M 5.96
5 "heap_sort DoubleClass vector reverse" 10.31 sec 2.33 M 5.69
6 "heap_sort double vector reverse reverse" 13.23 sec 1.81 M 7.30
7 "heap_sort DoubleClass vector reverse reverse" 13.22 sec 1.82 M 7.29
Total absolute time for Vector Heap Sort: 66.98 sec
Vector Heap Sort Penalty: 4.46
###
release build with _SECURE_SCL disabled
###
c:\boxer\performance\projects\msvc8\Release\stepanov_vector.exe
test description absolute operations ratio with
number time per second test0
0 "double pointer verify2" 3.03 sec 989.45 M 1.00
1 "DoubleClass pointer verify2" 3.05 sec 984.57 M 1.00
2 "double vector" 3.06 sec 979.75 M 1.01
3 "DoubleClass vector" 3.05 sec 984.57 M 1.00
4 "double vector reverse" 3.06 sec 979.75 M 1.01
5 "DoubleClass vector reverse" 3.05 sec 984.57 M 1.00
6 "double vector reverse reverse" 3.09 sec 969.62 M 1.02
7 "DoubleClass vector reverse reverse" 3.06 sec 979.43 M 1.01
Total absolute time for Vector accumulate: 24.45 sec
Vector accumulate Penalty: 1.01
test description absolute operations
ratio with
number time per second
test0
0 "insertion_sort double pointer verify2" 1.16 sec 2.60 M
1.00
1 "insertion_sort DoubleClass pointer verify2" 1.58 sec 1.90 M
1.37
2 "insertion_sort double vector" 1.58 sec 1.90 M
1.37
3 "insertion_sort DoubleClass vector" 2.08 sec 1.44 M
1.80
4 "insertion_sort double vector reverse" 1.55 sec 1.94 M
1.34
5 "insertion_sort DoubleClass vector reverse" 2.03 sec 1.48 M
1.76
6 "insertion_sort double vector reverse reverse" 1.59 sec 1.88 M
1.38
7 "insertion_sort DoubleClass vector reverse reverse" 2.09 sec 1.43 M
1.81
Total absolute time for Vector Insertion Sort: 13.66 sec
Vector Insertion Sort Penalty: 1.53
test description absolute operations rati
o with
number time per second test
0
0 "quicksort double pointer verify2" 1.75 sec 13.71 M 1.00
1 "quicksort DoubleClass pointer verify2" 2.06 sec 11.64 M 1.18
2 "quicksort double vector" 1.80 sec 13.36 M 1.03
3 "quicksort DoubleClass vector" 2.14 sec 11.21 M 1.22
4 "quicksort double vector reverse" 1.78 sec 13.48 M 1.02
5 "quicksort DoubleClass vector reverse" 2.13 sec 11.29 M 1.21
6 "quicksort double vector reverse reverse" 1.78 sec 13.48 M 1.02
7 "quicksort DoubleClass vector reverse reverse" 2.16 sec 11.13 M 1.23
Total absolute time for Vector Quicksort: 15.59 sec
Vector Quicksort Penalty: 1.13
test description absolute operations rati
o with
number time per second test
0
0 "heap_sort double pointer verify2" 1.80 sec 13.36 M 1.00
1 "heap_sort DoubleClass pointer verify2" 1.83 sec 13.12 M 1.02
2 "heap_sort double vector" 1.88 sec 12.80 M 1.04
3 "heap_sort DoubleClass vector" 2.05 sec 11.73 M 1.14
4 "heap_sort double vector reverse" 2.14 sec 11.21 M 1.19
5 "heap_sort DoubleClass vector reverse" 2.28 sec 10.52 M 1.27
6 "heap_sort double vector reverse reverse" 1.88 sec 12.80 M 1.04
7 "heap_sort DoubleClass vector reverse reverse" 2.03 sec 11.81 M 1.13
Total absolute time for Vector Heap Sort: 15.88 sec
Vector Heap Sort Penalty: 1.12
###
pre-release benchmark source code excerpt
###
/*
Copyright 2007-2008 Adobe Systems Incorporated
Distributed under the MIT License (see accompanying file LICENSE_1_0_0.txt
or a copy at http://stlab.adobe.com/licenses.html)
Goal: examine any change in performance when moving from pointers to vector iterators
Assumptions:
1) Vector iterators should not perform worse than raw pointers.
2) Using a vector should not perform worse than using a plain array of values.
HIstory:
This is similar to Alex Stepanov's original abstraction penalty benchmark.
But the use of the STL container adds complexity and dependencies that don't
fit with the design of the original benchmark, so this is broken out
into a separate test. Now we are testing the STL vector implementation
in addition to the compiler.
#include
#include
#include
#include
#include
#include
#include "benchmark_results.h"
#include "benchmark_timer.h"
#include "benchmark_algorithms.h"
#include
/******************************************************************************/
// a single double value wrapped in a struct
struct DoubleClass {
double value;
DoubleClass() {}
DoubleClass(const double& x) : value(x) {}
operator double() { return value; }
};
inline DoubleClass operator+(const DoubleClass& x, const DoubleClass& y) {
return DoubleClass(x.value + y.value);
}
inline bool operator<(const DoubleClass& x, const DoubleClass& y) {
return (x.value < y.value);
}
/******************************************************************************/
// reverse iterator template
template
struct reverse_iterator {
RandomAccessIterator current;
reverse_iterator(RandomAccessIterator x) : current(x) {}
T& operator*() const {
RandomAccessIterator tmp = current;
return *(--tmp);
}
};
// really a distance between pointers, which must return ptrdiff_t
// because (ptr - ptr) --> ptrdiff_t
template
inline ptrdiff_t operator-(reverse_iterator& xx, reverse_iterator& yy) {
return (ptrdiff_t)( yy.current - xx.current );
}
template
inline reverse_iterator& operator++(reverse_iterator &xx) {
--xx.current;
return xx;
}
template
inline reverse_iterator& operator--(reverse_iterator &xx) {
++xx.current;
return xx;
}
template
inline reverse_iterator operator++(reverse_iterator &xx, int) {
reverse_iterator tmp = xx;
++xx;
return tmp;
}
template
inline reverse_iterator operator--(reverse_iterator &xx, int) {
reverse_iterator tmp = xx;
--xx;
return tmp;
}
template
inline reverse_iterator operator-(reverse_iterator &xx, ptrdiff_t inc) {
reverse_iterator tmp = xx;
tmp.current += inc;
return tmp;
}
template
inline reverse_iterator operator+(reverse_iterator &xx, ptrdiff_t inc) {
reverse_iterator tmp = xx;
tmp.current -= inc;
return tmp;
}
template
inline reverse_iterator& operator+=(reverse_iterator &xx, ptrdiff_t inc) {
xx.current -= inc;
return xx;
}
template
inline reverse_iterator& operator-=(reverse_iterator &xx, ptrdiff_t inc) {
xx.current += inc;
return xx;
}
template
inline bool operator<(const reverse_iterator& x, const reverse_iterator& y) {
return (y.current < x.current);
}
template
inline bool operator==(const reverse_iterator& x, const reverse_iterator& y) {
return (x.current == y.current);
}
template
inline bool operator!=(const reverse_iterator& x, const reverse_iterator& y) {
return (x.current != y.current);
}
/******************************************************************************/
/******************************************************************************/
// this constant may need to be adjusted to give reasonable minimum times
// For best results, times should be about 1.0 seconds for the minimum test run
int iterations = 1500000;
// 2000 items, or about 16k of data
// this is intended to remain within the L2 cache of most common CPUs
#define SIZE 2000
// initial value for filling our arrays, may be changed from the command line
double init_value = 3.0;
/******************************************************************************/
/******************************************************************************/
inline void check_sum(double result) {
if (result != SIZE * init_value) printf("test %i failed\n", current_test);
}
/******************************************************************************/
template
void verify_sorted(Iterator first, Iterator last) {
if (!is_sorted(first,last))
printf("sort test %i failed\n", current_test);
}
/******************************************************************************/
// a template using the accumulate template and iterators
template
void test(Iterator first, Iterator last, T zero, const char *label) {
int i;
start_timer();
for(i = 0; i < iterations; ++i)
check_sum( double( accumulate(first, last, zero) ) );
record_result( timer(), label );
}
/******************************************************************************/
template
void test_insertion_sort(Iterator firstSource, Iterator lastSource, Iterator firstDest, Iterator lastDest, T zero, const char *label) {
int i;
start_timer();
for(i = 0; i < iterations; ++i) {
::copy(firstSource, lastSource, firstDest);
insertionSort< Iterator, T>( firstDest, lastDest );
verify_sorted( firstDest, lastDest );
}
record_result( timer(), label );
}
/******************************************************************************/
template
void test_quicksort(Iterator firstSource, Iterator lastSource, Iterator firstDest, Iterator lastDest, T zero, const char *label) {
int i;
start_timer();
for(i = 0; i < iterations; ++i) {
::copy(firstSource, lastSource, firstDest);
quicksort< Iterator, T>( firstDest, lastDest );
verify_sorted( firstDest, lastDest );
}
record_result( timer(), label );
}
/******************************************************************************/
template
void test_heap_sort(Iterator firstSource, Iterator lastSource, Iterator firstDest, Iterator lastDest, T zero, const char *label) {
int i;
start_timer();
for(i = 0; i < iterations; ++i) {
::copy(firstSource, lastSource, firstDest);
heapsort< Iterator, T>( firstDest, lastDest );
verify_sorted( firstDest, lastDest );
}
record_result( timer(), label );
}
/******************************************************************************/
/******************************************************************************/
// our global arrays of numbers to be summed
double data[SIZE];
DoubleClass Data[SIZE];
double dataMaster[SIZE];
DoubleClass DataMaster[SIZE];
/******************************************************************************/
// declaration of our iterator types and begin/end pairs
typedef double* dp;
dp dpb = data;
dp dpe = data + SIZE;
dp dMpb = dataMaster;
dp dMpe = dataMaster + SIZE;
typedef DoubleClass* Dp;
Dp Dpb = Data;
Dp Dpe = Data + SIZE;
Dp DMpb = DataMaster;
Dp DMpe = DataMaster + SIZE;
typedef std::vector::iterator vdp;
typedef std::vector::iterator vDp;
typedef std::vector::reverse_iterator rvdp;
typedef std::vector::reverse_iterator rvDp;
typedef reverse_iterator rrvdp;
typedef reverse_iterator rrvDp;
/******************************************************************************/
/******************************************************************************/
int main(int argc, char** argv) {
double dZero = 0.0;
DoubleClass DZero = 0.0;
// output command for documentation:
int i;
for (i = 0; i < argc; ++i)
printf("%s ", argv[i] );
printf("\n");
if (argc > 1) iterations = atoi(argv[1]);
if (argc > 2) init_value = (double) atof(argv[2]);
// seed the random number generator so we get repeatable results
srand( (int)init_value + 123 );
::fill(dpb, dpe, double(init_value));
::fill(Dpb, Dpe, DoubleClass(init_value));
std::vector vec_data;
std::vector vec_Data;
vec_data.resize(SIZE);
vec_Data.resize(SIZE);
rrvdp rvdpb(vec_data.rend());
rrvdp rvdpe(vec_data.rbegin());
rrvDp rvDpb(vec_Data.rend());
rrvDp rvDpe(vec_Data.rbegin());
::fill(vec_data.begin(), vec_data.end(), double(init_value));
::fill(vec_Data.begin(), vec_Data.end(), DoubleClass(init_value));
test(dpb, dpe, dZero, "double pointer verify2");
test(Dpb, Dpe, DZero, "DoubleClass pointer verify2");
test(vec_data.begin(), vec_data.end(), dZero, "double vector");
test(vec_Data.begin(), vec_Data.end(), DZero, "DoubleClass vector");
test(vec_data.rbegin(), vec_data.rend(), dZero, "double vector reverse");
test(vec_Data.rbegin(), vec_Data.rend(), DZero, "DoubleClass vector reverse");
test(rvdpb, rvdpe, dZero, "double vector reverse reverse");
test(rvDpb, rvDpe, DZero, "DoubleClass vector reverse reverse");
summarize("Vector accumulate", SIZE, iterations, kShowGMeans, kShowPenalty );
// the sorting tests are much slower than the accumulation tests - O(N^2)
iterations = iterations / 1000;
std::vector vec_dataMaster;
std::vector vec_DataMaster;
vec_dataMaster.resize(SIZE);
vec_DataMaster.resize(SIZE);
// fill one set of random numbers
fill_random( dMpb, dMpe );
// copy to the other sets, so we have the same numbers
::copy( dMpb, dMpe, DMpb );
::copy( dMpb, dMpe, vec_dataMaster.begin() );
::copy( dMpb, dMpe, vec_DataMaster.begin() );
rrvdp rvdMpb(vec_dataMaster.rend());
rrvdp rvdMpe(vec_dataMaster.rbegin());
rrvDp rvDMpb(vec_DataMaster.rend());
rrvDp rvDMpe(vec_DataMaster.rbegin());
test_insertion_sort(dMpb, dMpe, dpb, dpe, dZero, "insertion_sort double pointer verify2");
test_insertion_sort(DMpb, DMpe, Dpb, Dpe, DZero, "insertion_sort DoubleClass pointer verify2");
test_insertion_sort(vec_dataMaster.begin(), vec_dataMaster.end(), vec_data.begin(), vec_data.end(), dZero, "insertion_sort double vector");
test_insertion_sort(vec_DataMaster.begin(), vec_DataMaster.end(), vec_Data.begin(), vec_Data.end(), DZero, "insertion_sort DoubleClass vector");
test_insertion_sort(vec_dataMaster.rbegin(), vec_dataMaster.rend(), vec_data.rbegin(), vec_data.rend(), dZero, "insertion_sort double vector reverse");
test_insertion_sort(vec_DataMaster.rbegin(), vec_DataMaster.rend(), vec_Data.rbegin(), vec_Data.rend(), DZero, "insertion_sort DoubleClass vector reverse");
test_insertion_sort(rvdMpb, rvdMpe, rvdpb, rvdpe, dZero, "insertion_sort double vector reverse reverse");
test_insertion_sort(rvDMpb, rvDMpe, rvDpb, rvDpe, DZero, "insertion_sort DoubleClass vector reverse reverse");
summarize("Vector Insertion Sort", SIZE, iterations, kShowGMeans, kShowPenalty );
// these are slightly faster - O(NLog2(N))
iterations = iterations * 8; // iterations / 25
test_quicksort(dMpb, dMpe, dpb, dpe, dZero, "quicksort double pointer verify2");
test_quicksort(DMpb, DMpe, Dpb, Dpe, DZero, "quicksort DoubleClass pointer verify2");
test_quicksort(vec_dataMaster.begin(), vec_dataMaster.end(), vec_data.begin(), vec_data.end(), dZero, "quicksort double vector");
test_quicksort(vec_DataMaster.begin(), vec_DataMaster.end(), vec_Data.begin(), vec_Data.end(), DZero, "quicksort DoubleClass vector");
test_quicksort(vec_dataMaster.rbegin(), vec_dataMaster.rend(), vec_data.rbegin(), vec_data.rend(), dZero, "quicksort double vector reverse");
test_quicksort(vec_DataMaster.rbegin(), vec_DataMaster.rend(), vec_Data.rbegin(), vec_Data.rend(), DZero, "quicksort DoubleClass vector reverse");
test_quicksort(rvdMpb, rvdMpe, rvdpb, rvdpe, dZero, "quicksort double vector reverse reverse");
test_quicksort(rvDMpb, rvDMpe, rvDpb, rvDpe, DZero, "quicksort DoubleClass vector reverse reverse");
summarize("Vector Quicksort", SIZE, iterations, kShowGMeans, kShowPenalty );
test_heap_sort(dMpb, dMpe, dpb, dpe, dZero, "heap_sort double pointer verify2");
test_heap_sort(DMpb, DMpe, Dpb, Dpe, DZero, "heap_sort DoubleClass pointer verify2");
test_heap_sort(vec_dataMaster.begin(), vec_dataMaster.end(), vec_data.begin(), vec_data.end(), dZero, "heap_sort double vector");
test_heap_sort(vec_DataMaster.begin(), vec_DataMaster.end(), vec_Data.begin(), vec_Data.end(), DZero, "heap_sort DoubleClass vector");
test_heap_sort(vec_dataMaster.rbegin(), vec_dataMaster.rend(), vec_data.rbegin(), vec_data.rend(), dZero, "heap_sort double vector reverse");
test_heap_sort(vec_DataMaster.rbegin(), vec_DataMaster.rend(), vec_Data.rbegin(), vec_Data.rend(), DZero, "heap_sort DoubleClass vector reverse");
test_heap_sort(rvdMpb, rvdMpe, rvdpb, rvdpe, dZero, "heap_sort double vector reverse reverse");
test_heap_sort(rvDMpb, rvDMpe, rvDpb, rvDpe, DZero, "heap_sort DoubleClass vector reverse reverse");
summarize("Vector Heap Sort", SIZE, iterations, kShowGMeans, kShowPenalty );
return 0;
}
// the end
/******************************************************************************/
/******************************************************************************/
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk