[Boost-bugs] [Boost C++ Libraries] #8509: SSE/AVX optimization and C++11 support

Subject: [Boost-bugs] [Boost C++ Libraries] #8509: SSE/AVX optimization and C++11 support
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2013-04-28 00:32:29


#8509: SSE/AVX optimization and C++11 support
------------------------------+---------------------------------------------
 Reporter: andysem | Owner: atompkins
     Type: Patches | Status: new
Milestone: To Be Determined | Component: uuid
  Version: Boost 1.53.0 | Severity: Optimization
 Keywords: |
------------------------------+---------------------------------------------
 The suboptimal performance of the boost::uuids::uuid operators has been
 brought up on the developers mailing list before, and I have run some
 tests on various compilers to confirm it. I also have applications that
 depend on the performance of uuid operations, so I'm interested in
 optimizing it.

 I've attached the test I used for benchmarking, along with the results
 I obtained on an Intel Core i7 2600K (I also tried an older Core 2 Duo
 machine, with similar results). The benchmarking code includes the
 "stock" functions which correspond to the current implementations of the
 equality and ordering operators, the "mem" functions based on memcmp, and
 "simd" functions that are implemented with SSE intrinsics. The tests
 measure the time needed to perform a certain number of operations in a
 loop. The arguments to the operations are either placed on the stack or on
 the heap (to emulate distinct objects in an application). To summarize the
 results:

 1. The simd_equal version is the fastest across almost all configurations.
 The gain varies from 3.5x to 8x over the stock version. On the MSVC x64
 target, though, all variants perform similarly (mem and simd are slightly
 faster) if the compared values are placed adjacently on the stack. The
 simd version is still the fastest one if the operands are allocated on
 the heap.

 2. On the MSVC x86 target mem_less turned out to be the fastest, with
 simd_less coming second. On other platforms, including MSVC x64, simd_less
 performed best, though with a more moderate gain: 1.6x to 2.3x faster than
 the stock version.
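
 For illustration, the three equality variants compared above might look
 roughly like the following. This is a minimal sketch and not the attached
 benchmark code: the uuid16 type, the UUID_SKETCH_HAS_SSE2 macro and the
 function bodies are assumptions made for the example, and the SSE2 path
 falls back to memcmp when SSE2 is not enabled at compile time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: SSE2 is detected through the usual GCC/Clang/MSVC macros.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define UUID_SKETCH_HAS_SSE2 1
#endif

// Hypothetical stand-in for boost::uuids::uuid: 16 raw bytes.
struct uuid16 { std::uint8_t data[16]; };

// "stock": element-wise comparison through std::equal.
inline bool stock_equal(const uuid16& a, const uuid16& b) {
    return std::equal(a.data, a.data + 16, b.data);
}

// "mem": a single memcmp over the 16 bytes.
inline bool mem_equal(const uuid16& a, const uuid16& b) {
    return std::memcmp(a.data, b.data, 16) == 0;
}

// "simd": one 128-bit compare when SSE2 is available.
inline bool simd_equal(const uuid16& a, const uuid16& b) {
#if defined(UUID_SKETCH_HAS_SSE2)
    // Unaligned loads, since the operands carry no alignment guarantee.
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data));
    // Equal byte lanes become 0xFF; movemask gathers the top bit of each lane,
    // so 0xFFFF means all 16 bytes matched.
    return _mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) == 0xFFFF;
#else
    return mem_equal(a, b);
#endif
}

// Usage example: all three variants must agree on equal and unequal inputs.
inline void check_equal_variants_agree() {
    uuid16 a = {{1, 2, 3}};
    uuid16 b = a;
    assert(stock_equal(a, b) && mem_equal(a, b) && simd_equal(a, b));
    b.data[15] = 0xFF;  // differs from a in the last byte only
    assert(!stock_equal(a, b) && !mem_equal(a, b) && !simd_equal(a, b));
}
```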

 Based on these results I've prepared a patch for uuid that makes use of
 SSE/AVX operations when possible (basically, it uses the "simd" versions
 when SSE/AVX is enabled at compile time). The patch also changes the
 generic implementations of the operators to use memcmp, since compilers
 generally optimize it better than std::equal and
 std::lexicographical_compare (to be fair, GCC and Clang generated the same
 code for the "stock" and "mem" versions). For MSVC x86, the generic (now
 memcmp-based) operator< is used, since it was faster in the tests.
 Lastly, the patch adds constexpr and noexcept where appropriate to improve
 compatibility with C++11 and allow for further optimizations by
 supporting compilers.
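
 The selection logic described above could be sketched as follows. This is
 an illustration under stated assumptions, not the patch itself: the
 sketch namespace, the SKETCH_UUID_USE_SSE2 macro and the particular SSE2
 ordering routine are hypothetical, and only the overall shape (SIMD when
 enabled at compile time, memcmp otherwise, memcmp-based operator< on MSVC
 x86, noexcept and constexpr annotations) follows the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: SSE2 is detected through the usual GCC/Clang/MSVC macros.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_UUID_USE_SSE2 1
#endif

namespace sketch {

// Hypothetical stand-in for boost::uuids::uuid.
struct uuid {
    std::uint8_t data[16];
    static constexpr std::size_t static_size() noexcept { return 16; }
};

#if defined(SKETCH_UUID_USE_SSE2)
// Bit i is set when byte i differs between a and b (bit 0 = data[0]).
inline unsigned diff_mask(const uuid& a, const uuid& b) noexcept {
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data));
    const unsigned eq = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)));
    return ~eq & 0xFFFFu;
}
#endif

inline bool operator==(const uuid& a, const uuid& b) noexcept {
#if defined(SKETCH_UUID_USE_SSE2)
    return diff_mask(a, b) == 0;
#else
    return std::memcmp(a.data, b.data, uuid::static_size()) == 0;
#endif
}

inline bool operator<(const uuid& a, const uuid& b) noexcept {
// On MSVC x86 the memcmp-based version is kept, as in the description above.
#if defined(SKETCH_UUID_USE_SSE2) && !(defined(_MSC_VER) && defined(_M_IX86))
    unsigned diff = diff_mask(a, b);
    if (diff == 0) return false;  // equal, so not less
    // Locate the first differing byte; it alone decides lexicographic order.
    unsigned i = 0;
    while ((diff & 1u) == 0) { diff >>= 1; ++i; }
    return a.data[i] < b.data[i];  // unsigned byte comparison, like memcmp
#else
    return std::memcmp(a.data, b.data, uuid::static_size()) < 0;
#endif
}

// Usage example: ordering is decided by the first differing byte.
inline void usage_example() {
    uuid lo = {{0}};
    uuid hi = {{0}};
    hi.data[0] = 0x80;  // high bit set: must still compare as unsigned
    assert(lo < hi && !(hi < lo) && !(lo == hi));
}

}  // namespace sketch
```

 Both branches of operator< compare bytes as unsigned values in memory
 order, so they should yield the same ordering regardless of which one the
 preprocessor selects.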

 I would be glad to see this patch applied. If you have any questions or
 comments, I'm happy to answer them here or on the mailing list.

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/8509>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
