Subject: [Boost-bugs] [Boost C++ Libraries] #8509: SSE/AVX optimization and C++11 support
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-04-28 00:32:29
#8509: SSE/AVX optimization and C++11 support
------------------------------+---------------------------------------------
 Reporter:  andysem           |      Owner:  atompkins
     Type:  Patches           |     Status:  new
Milestone:  To Be Determined  |  Component:  uuid
  Version:  Boost 1.53.0      |   Severity:  Optimization
 Keywords:                    |
------------------------------+---------------------------------------------
The suboptimal performance of the boost::uuids::uuid operators has been
brought up on the developers mailing list before, and I have run
some tests on various compilers to confirm it. I also have
applications that depend on the performance of uuid operations, so I'm
interested in optimizing it.
I've attached the test I used for benchmarking, along with my
results, obtained on an Intel Core i7 2600K (I also tried an older Core 2
Duo machine, with similar results). The benchmarking code includes the
"stock" functions which correspond to the current implementations of the
equality and ordering operators, the "mem" functions based on memcmp, and
"simd" functions that are implemented with SSE intrinsics. The tests
measure the time needed to perform a certain number of operations in a
loop. The arguments to the operations are either placed on the stack or on
the heap (to emulate distinct objects in an application). To summarize the
results:
1. The simd_equal version is the fastest across almost all configurations.
The performance gain varies and can be 3.5x to 8x faster than the stock
version. On the MSVC x64 target, though, all variants perform similarly (mem
and simd slightly faster) if the compared values are placed adjacently on the
stack. The simd version is still the fastest one if the operands are
allocated on the heap.
2. On the MSVC x86 target, mem_less turned out to be the fastest, with
simd_less coming second. On other platforms, including MSVC x64, simd_less
performed best (though with a more moderate gain: 1.6x to 2.3x faster than
the stock version).
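For illustration only, the three equality variants compared above could be sketched roughly as follows. This is not the attached benchmark or the patch: the uuid16 struct is a hypothetical stand-in for boost::uuids::uuid, the function bodies are assumptions about what "stock", "mem" and "simd" look like, and SKETCH_HAS_SSE2 is a made-up macro used here to detect SSE2 at compile time.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Assumed SSE2 detection: GCC/Clang define __SSE2__, MSVC x64 always has SSE2,
// MSVC x86 sets _M_IX86_FP >= 2 when built with /arch:SSE2 or higher.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

// Hypothetical stand-in for boost::uuids::uuid: 16 raw bytes.
struct uuid16 {
    std::uint8_t data[16];
};

// "stock": element-wise comparison through a standard algorithm.
inline bool stock_equal(const uuid16& a, const uuid16& b) {
    return std::equal(a.data, a.data + 16, b.data);
}

// "mem": one memcmp over the whole 16-byte block.
inline bool mem_equal(const uuid16& a, const uuid16& b) {
    return std::memcmp(a.data, b.data, 16) == 0;
}

// "simd": load both values into SSE registers and compare all bytes at once.
inline bool simd_equal(const uuid16& a, const uuid16& b) {
#if defined(SKETCH_HAS_SSE2)
    // Unaligned loads, since the struct is only byte-aligned.
    const __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data));
    const __m128i y = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data));
    // cmpeq yields 0xFF for each equal byte; movemask packs the 16 top bits.
    return _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) == 0xFFFF;
#else
    return mem_equal(a, b);  // fallback when SSE2 is not enabled
#endif
}
```

All three functions should return the same answer for any pair of inputs; the variants differ only in the code the compiler generates for them.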
Based on these results I've prepared a patch for uuid that makes use of
SSE/AVX operations when possible (basically, it uses the "simd" versions
when SSE/AVX is enabled at compile time). Also, the patch changes the
generic implementations of the operators to use memcmp, since compilers
generally optimize it better than std::equal and
std::lexicographical_compare (to be fair, GCC and Clang generated the same
code for "stock" and "mem" versions). For MSVC x86, the generic (now
memcmp-based) operator< is used since it proved faster in the tests.
Lastly, the patch adds constexpr and noexcept where appropriate to improve
compatibility with C++11 and allow for further optimizations by the
supporting compilers.
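As a rough, standalone sketch of the selection logic described above (again, not the patch itself): a memcmp-based ordering, one possible SSE2 ordering, and a compile-time switch that keeps the memcmp version on 32-bit MSVC. The names uuid16, mem_less, simd_less, uuid_less and SKETCH_HAS_SSE2 are hypothetical, and the SSE2 algorithm shown is just one way to do a bytewise lexicographic comparison; the patch may well use a different one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed SSE2 detection (see compiler documentation for the exact macros).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

// Hypothetical stand-in for boost::uuids::uuid.
struct uuid16 {
    std::uint8_t data[16];
    // constexpr and noexcept where the function allows it (C++11).
    static constexpr std::size_t size() noexcept { return 16; }
};

// Generic ordering: memcmp compares bytes as unsigned char, lexicographically.
inline bool mem_less(const uuid16& a, const uuid16& b) noexcept {
    return std::memcmp(a.data, b.data, uuid16::size()) < 0;
}

#if defined(SKETCH_HAS_SSE2)
// One possible SSE2 ordering: find the first differing byte and test
// whether a's byte is the smaller one there.
inline bool simd_less(const uuid16& a, const uuid16& b) noexcept {
    const __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data));
    const __m128i y = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data));
    // Per-byte masks: eq where x == y, le where x <= y (unsigned, via min).
    const unsigned eq = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(x, y)));
    const unsigned le = static_cast<unsigned>(
        _mm_movemask_epi8(_mm_cmpeq_epi8(_mm_min_epu8(x, y), x)));
    const unsigned ne = ~eq & 0xFFFFu;       // bytes that differ
    const unsigned lt = le & ne;             // bytes where x < y
    const unsigned first = ne & (0u - ne);   // lowest set bit = first difference
    return (lt & first) != 0;                // false when the values are equal
}
#endif

// Compile-time selection: SIMD where enabled, except on 32-bit MSVC (_M_IX86),
// where the memcmp version was reported to be faster.
inline bool uuid_less(const uuid16& a, const uuid16& b) noexcept {
#if defined(SKETCH_HAS_SSE2) && !defined(_M_IX86)
    return simd_less(a, b);
#else
    return mem_less(a, b);
#endif
}
```

Because the choice is made by the preprocessor, callers only ever see uuid_less, and both branches are expected to produce the same ordering as memcmp.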
I would be glad to see this patch applied. If you have any questions or
comments, I'll be glad to answer here or on the mailing list.
--
Ticket URL: <https://svn.boost.org/trac/boost/ticket/8509>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.