In my program I am seeing my code use the (I assume slower) `subtract_unsigned_constexpr` rather than `subtract_unsigned`, which handles the borrow chain with intrinsics.
The fallback seems to be triggered by these lines in `intel_intrinsics.hpp`:
```cpp
#if defined(__clang__) && (__clang__ < 9)
// We appear to crash the compiler if we try to use these intrinsics?
```
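`__clang__` is not a version number. Clang simply defines it as 1, so `__clang__ < 9` is always true and the intrinsic path is disabled on every clang release. I expect the check should compare the major version (`__clang__major__` is not it; the macro is `__clang_major__`) instead. For reference, my compiler predefines: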
```cpp
#define __clang__ 1
#define __clang_major__ 12
#define __clang_minor__ 0
```
As an aside, 128-bit support does work on Windows with clang, but I have to force it on like this: