From: degski (degski_at_[hidden])
Date: 2020-03-18 00:49:18


On Tue, 17 Mar 2020 at 15:27, Kemin Zhou via Boost <boost_at_[hidden]>
wrote:

> Sorry for not doing the speed check. My argument is purely based on
> looking at the code and counting the number of executions.
> The cost of constructing an object, and the code of updating only.
> In this situation, I have a tight loop, with the input values already
> validated thus can save the extra
> check inside the constructor.
>

You cannot, and should not, make assumptions based on reading the code (as
Paul says). In the case of random numbers/distributions, micro-benchmarking
is very, very difficult and bound to give the wrong result/conclusion. The
only thing to do is write all of the code [in a flexible way] and
macro-benchmark (test the code in the context of your actual application;
subtle changes make huge differences).
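
To make the "flexible way" a bit more concrete, here is a minimal sketch of
two interchangeable variants plus a tiny timing helper. Everything in it is
an assumption for illustration: the names draw_construct_each,
draw_reuse_params, Timing and time_variant are hypothetical, and
std::uniform_real_distribution merely stands in for whatever distribution
the real application uses. The point is only that both variants share one
signature, so either can be dropped into the real application and timed
there.

```cpp
#include <cassert>
#include <chrono>
#include <random>
#include <vector>

// Variant A: construct a fresh distribution for every draw.
double draw_construct_each(std::mt19937_64& rng, const std::vector<double>& bounds) {
    double sum = 0.0;
    for (double hi : bounds) {
        std::uniform_real_distribution<double> dist(0.0, hi);
        sum += dist(rng);
    }
    return sum;
}

// Variant B: keep one distribution object and pass the parameters per call.
double draw_reuse_params(std::mt19937_64& rng, const std::vector<double>& bounds) {
    using dist_t = std::uniform_real_distribution<double>;
    dist_t dist;
    double sum = 0.0;
    for (double hi : bounds) {
        sum += dist(rng, dist_t::param_type(0.0, hi));
    }
    return sum;
}

// Wall-clock time plus a checksum, so the optimizer cannot discard the work.
struct Timing {
    double seconds;
    double checksum;
};

// Runs one variant on the given (already validated, positive) upper bounds.
template <class Variant>
Timing time_variant(Variant variant, const std::vector<double>& bounds, unsigned seed) {
    assert(!bounds.empty());
    std::mt19937_64 rng(seed);
    const auto t0 = std::chrono::steady_clock::now();
    const double checksum = variant(rng, bounds);
    const auto t1 = std::chrono::steady_clock::now();
    return Timing{std::chrono::duration<double>(t1 - t0).count(), checksum};
}
```

Call it as time_variant(draw_construct_each, bounds, 42u) and
time_variant(draw_reuse_params, bounds, 42u). Numbers from a stand-alone
run like this are exactly the kind of micro-benchmark that misleads; the
comparison only means something with the real inputs, inside the real
application, built with the real flags.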

Iff all else fails, and you still need more speed, and you are running on
Intel CPUs [that's a lot of ifs], you can try the distributions in the
Intel Performance Libraries, Math Kernel Library (MKL). This will
undoubtedly require you to restructure your code, as you'll be wrapping a
C API. It WILL be faster, it is FREE to use, and it is AVAILABLE on
Windows/Linux/macOS. If you're running on AMD, you can try their math
library, but you're off to a bad start [with AMD]. Lastly, you're talking
about tight loops: is the code naturally parallel? If so, you could look at
GPUs, with OpenCL on Intel CPUs/GPUs or on hardware from one of the
graphics card vendors.

Fiddling with random numbers is highly entertaining, but also a
rabbit hole. Do the macro-benchmarking and you are set to have the correct
answer. That correct answer could well be very counter-intuitive. Extra
instructions do not necessarily slow things down, and I have proof that one
can construct cases where more instructions improve overall throughput
[possibly due to quirks in the scheduler].

One last thing: in this kind of situation, always test things also, but not
only, with (Thin-)LTO or LTCG turned on. And a last, last one: /O2 is the
maximum optimization level on MSVC, not /Ox, as many think. On clang/gcc,
-O3 is not necessarily faster than -O2. If you can live with it,
-ffast-math might help as well, but that has important repercussions.
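
As one illustration of those repercussions (a sketch, not a claim about any
particular compiler version): -ffast-math allows the compiler to reassociate
floating-point operations, so a compensated (Kahan) summation, whose
correction term is algebraically zero, may be simplified away. The function
names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Plain left-to-right accumulation in single precision.
float naive_sum(float value, std::size_t count) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) sum += value;
    return sum;
}

// Kahan summation: c carries the low-order bits lost by each addition.
float kahan_sum(float value, std::size_t count) {
    float sum = 0.0f;
    float c = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        const float y = value - c;
        const float t = sum + y;
        c = (t - sum) - y;  // zero in exact arithmetic; reassociation may delete it
        sum = t;
    }
    return sum;
}

// True if the compensated sum is at least as accurate as the naive one.
// Expected to hold under strict IEEE semantics; worth re-checking whenever
// the build enables -ffast-math.
bool compensation_helps(float value, std::size_t count, float expected) {
    assert(count > 0);
    const float naive_err = std::fabs(naive_sum(value, count) - expected);
    const float kahan_err = std::fabs(kahan_sum(value, count) - expected);
    return kahan_err <= naive_err;
}
```

A check such as compensation_helps(0.1f, 1000000, 100000.0f) is a cheap way
to find out whether a given set of flags has changed the numerical behaviour
of code that relies on evaluation order.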

degski

-- 
@systemdeg
"We value your privacy, click here!" Sod off! - degski
"Anyone who believes that exponential growth can go on forever in a finite
world is either a madman or an economist" - Kenneth E. Boulding
"Growth for the sake of growth is the ideology of the cancer cell" - Edward
P. Abbey
