|
Boost : |
From: Johan Råde (rade_at_[hidden])
Date: 2006-10-10 04:34:54
John Maddock wrote:
> Johan Råde wrote:
>> Many of these functions have also been
>> implemented by John Maddock.
>> The difference is that my implementation
>> is optimized for speed.
>>
>> Here are some timing numbers on VC 7.1:
>
>> The times are in clock cycles for a single call,
>> based on an average of 10,000,000 calls
>> with random input.
>
> I think the timings for my latest code should be a bit better than that as
> I've started forwarding to native versions where applicable.
>
> Even so, I think this is a good idea and worth persuing. Unfortunately I
> don't think your code is as portable as you think it is: I'm attaching below
> some failed test runs of your code with Intel (Win32 and Linux), HP aCC and
> g++ (HP-UX). The difficulty is that once you crank up the optimisation
> levels, code like this tends to break, and since it's reason-for-existance
> is to improve performance, that's a problem :-(
>
> I have some specific comments on the code:
>
> 1) I'm a bit uneasy about the #undef's of the std macros: let's say a user
> includes this header indirectly (because it's used by another Boost header,
> say my math stuff), they might be a bit miffed if the the macros they were
> expecting to be defined have been undef'ed like this.
>
You are obviously right.
> 2) I'm a little surprised at seeing memcpy used to shift 4 bytes, given the
> slightly-hairy nature of what you're doing here, I don't think a
> reinterpret_cast to uint32 is unreasonable in comparison. But that's just
> my opinion :-)
Sometimes I shift by 6 bytes.
For simplicity I handle all the shifts the same way.
When we get the stuff to work, we could optimize the 4 byte case.
> 3) With respect to the #error Unknown endianness, perhaps we can forward to
> a slower-but safer alternative in this case?
The code should be like this:
#include <boost/detail/endian.hpp>
...
#if defined BOOST_BIG_ENDIAN
...
#elif defined BOOST_LITTLE_ENDIAN
...
#elif defined BOOST_PDP_ENDIAN
# error PDP-endian platforms are not supported
#else
# error This can not happen
#endif
> 4) I'm fairly sure the declarations:
>
> template<> const boost::uint32_t mask<float>::exponent = 0x7f800000;
> template<> const boost::uint32_t mask<double>::exponent = 0x7ff00000;
> template<> const boost::uint32_t mask<long double>::exponent
>
> will cause ODR violations / linker errors when the header is used from
> multiple TU's.
You may be right.
> 5) isinf needs to handle the situation where T doesn't have infinities, or
> perhaps even doesn't have / can't have numeric_limits support (NTL::RR for
> example).
This is news to me. I don't know enough about this problem to have an
informed opinion.
> 6) Does the implementaton of isnan:
>
> template<class T> bool isnan(T x)
> {
> return !(x <= std::numeric_limits<T>::infinity());
> }
>
> work for negative NaN's ? I'm thinking of compilers that perform
> non-standard-IEEE comparisons using faster comparisons of bit-patterns
> rather than fully-unordered NaN's.
There are all sorts of problems with the current isnan implementation.
I will replace it by one that relies on bit operations instead of
comparisons, at the cost of a few extra clock cycles.
The new implementation will be immune to optimizing compilers.
>
> OK where does this leave us.
>
> Looks like the way forward might be some kind of merger of our
> implementations: using the slow-but-safe version as a fallback for where the
> fast-and-hairy-version doesn't work?
>
> The difficulty is doing it without the code turning into a spagetty of #if
> logic.
>
> I wonder if something like:
>
> template <class T>
> T isnan(T x)
> {
> return isnan_imp(x, typename classify_traits<T>::type());
> }
>
> where classify_traits<T>::type is some kind of type that encapsulates:
>
> Endianness.
> Is it a real float.
> Does it have numeric limits support.
> Any compiler specific information - like "use fast if possible", or "always
> use safe version".
>
> The we could use compile time dispatch to the right implementation, and all
> the hairy logic and #if code would be isolated to classify_traits (or
> whatever it's called).
>
> Does this sound plausible, and/or something you might be interested in
> persuing?
You are right in principle, but it may be overkill.
Let's first see how far we can get with my approach.
It is not as hairy as you think.
It should work for all processor that are (more or less) IEEE754 compliant.
>
> Thanks,
>
> John.
>
> ps heres the failure logs:
---------------------------------------------------------
>
> Intel 9.1, Win32, VC-8 mode:
>
> $ icl /O2 /EHs -I../../../develop/boost test.cpp
> Intel(R) C++ Compiler for 32-bit applications, Version 9.1 Build
> 20060519Z
> Copyright (C) 1985-2006 Intel Corporation. All rights reserved.
> 30 DAY EVALUATION LICENSE
>
> icl: NOTE: The evaluation period for this product ends on 11-feb-2008 UTC.
> test.cpp
> Microsoft (R) Incremental Linker Version 8.00.50727.42
> Copyright (C) Microsoft Corporation. All rights reserved.
>
> -out:test.exe
> test.obj
>
> John Maddock_at_fuji /cygdrive/c/data/boost/fpclassify/fpclassify/test
> $ ./test
> little endian
> sizeof(long double) = 8
> mask<long double>::exponent = 7ff00000
>
> Failed test isnan(x)
> Failed test fpclassify(x) == FP_NAN
> Failed test isnan(x)
> Failed test fpclassify(x) == FP_NAN
> Failed test isnan(x)
> Failed test fpclassify(x) == FP_NAN
> Failed test isnan(x)
> Failed test fpclassify(x) == FP_NAN
[snip]
New isnan implementation will fix this.
---------------------------------------------------------------
>
> HP aCC, HP-UX, Itanium:
>
> aCC -AA -O3 -I ~ *.cpp
> maddock_at_td176> ./a.out
> little endian
> sizeof(long double) = 16
> mask<long double>::exponent = 0
>
[snip: lots of failures]
Will need more refined logic to detect exponent mask.
--------------------------------------------------------------
>
> G++, HP-UX, Itanium:
>
> maddock_at_td176> g++
> g++: no input files
> maddock_at_td176> g++ -I ~ -O3 *.cpp
> maddock_at_td176> ./a.out
> little endian
> sizeof(long double) = 16
> mask<long double>::exponent = 0
>
[snip: lots of failures]
Will need more refined logic to determine exponent mask.
-------------------------------------------------------
> Intel 9.1, Linux, Itanium:
>
> maddock_at_[hidden]> icc -I ~ -O3 -fast -fp-model fast=2
> *.cpp -stat
> ic
> IPO: performing single-file optimizations
> IPO: generating object file /tmp/ipo_iccbMO6qR.o
> maddock_at_[hidden]> ./a.out
> little endian
> sizeof(long double) = 16
> mask<long double>::exponent = 7fff0000
>
> Failed test fpclassify(x) == FP_NAN
> Failed test fpclassify(x) == FP_NAN
> Failed test fpclassify(x) == FP_NAN
[snip]
New isnan will fix the this.
> Failed test isnormal(x)
> Failed test fpclassify(x) == FP_NORMAL
> Failed test isnormal(x)
> Failed test fpclassify(x) == FP_NORMAL
[snip]
What is going on here?
Error in <limits>?
That would not be the first time.
____________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
>
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk