Results for low dimensions. More data would exceed the
mailing list limits:
#    original:           original:                       gemm:                           mixed:
# m  t1          MFLOPS  t1          MFLOPS  Diff nrm3   t2          MFLOPS  Diff nrm4   t2          MFLOPS  Diff nrm5
  1  2.1263e-07  9.40601 1.32802e-07 15.06   0           1.36006e-07 14.7052 0           6.31318e-07 3.16798 0
  2  2.28189e-07 70.1173 1.37767e-07 116.138 0           1.59801e-07 100.125 0           6.653e-07   24.0493 0
  3  2.649e-07   203.851 1.5541e-07  347.468 0           1.54267e-07 350.042 0           6.98766e-07 77.2791 0
  4  3.35269e-07 381.783 2.4183e-07  529.297 0           2.12891e-07 601.247 0           6.65688e-07 192.282 0
  5  3.53868e-07 706.478 2.30977e-07 1082.36 0           2.53215e-07 987.303 0           7.07933e-07 353.141 0
  6  4.2987e-07  1004.95 2.59713e-07 1663.37 0           2.54867e-07 1695    0           8.17448e-07 528.474 0
  7  5.39621e-07 1271.26 4.51043e-07 1520.92 7.98975e-09 5.12948e-07 1337.37 7.98975e-09 8.76363e-07 782.781 0
  8  6.18993e-07 1654.3  6.38988e-07 1602.53 4.17673e-09 6.37931e-07 1605.19 4.17673e-09 8.92556e-07 1147.27 0
  9  7.73683e-07 1884.49 7.26336e-07 2007.34 3.30697e-09 8.00656e-07 1821.01 3.30697e-09 1.09762e-06 1328.33 0
  10 9.27569e-07 2156.17 8.31827e-07 2404.35 1.94317e-09 8.72131e-07 2293.23 1.94317e-09 1.16572e-06 1715.68 0
  11 1.13882e-06 2337.5  1.03275e-06 2577.58 1.27501e-09 1.08775e-06 2447.25 1.27501e-09 1.16439e-06 2286.17 0
  12 1.26427e-06 2733.59 1.40013e-06 2468.34 8.50076e-10 1.39562e-06 2476.32 8.50076e-10 1.01202e-06 3414.97 0
  13 1.5751e-06  2789.66 1.64811e-06 2666.09 5.39864e-10 1.66862e-06 2633.32 5.39864e-10 1.61517e-06 2720.45 0
  14 1.79595e-06 3055.77 1.89937e-06 2889.37 4.08632e-10 1.6485e-06  3329.09 0           1.65016e-06 3325.73 0
  15 2.14056e-06 3153.37 2.24248e-06 3010.06 2.73316e-10 1.6875e-06  3999.99 0           1.80164e-06 3746.59 0
  16 2.38996e-06 3427.67 2.63386e-06 3110.27 2.30152e-10 1.74627e-06 4691.14 0           1.91648e-06 4274.49 0
  17 2.93315e-06 3349.98 3.08031e-06 3189.94 1.85538e-10 2.17697e-06 4513.62 0           2.13505e-06 4602.23 0
  18 3.3771e-06  3453.85 3.23863e-06 3601.52 1.20251e-10 2.23225e-06 5225.23 0           2.36877e-06 4924.07 0
  19 4.19699e-06 3268.53 4.02621e-06 3407.17 1.07796e-10 2.29651e-06 5973.4  0           2.44714e-06 5605.72 0
  20 4.27777e-06 3740.27 4.86115e-06 3291.4  8.37665e-11 2.26798e-06 7054.74 0           2.44016e-06 6556.96 0
  21 5.58038e-06 3319.13 5.51606e-06 3357.83 5.93714e-11 2.61705e-06 7077.43 0           2.90197e-06 6382.56 0
  22 5.46208e-06 3898.88 5.50258e-06 3870.19 5.76987e-11 2.85448e-06 7460.56 0           3.09923e-06 6871.39 0
  23 7.26813e-06 3348.04 6.48407e-06 3752.89 4.47169e-11 3.03986e-06 8004.98 0           3.16566e-06 7686.86 0
  24 6.56421e-06 4211.93 7.20581e-06 3836.9  3.61275e-11 2.84288e-06 9725.35 0           2.81577e-06 9818.99 0
  25 7.97135e-06 3920.29 7.80654e-06 4003.06 3.02957e-11 4.04575e-06 7724.16 0           4.15001e-06 7530.11 0
  26 8.59272e-06 4090.9  8.46934e-06 4150.5  2.53217e-11 4.1795e-06  8410.58 0           4.36958e-06 8044.71 0
  27 1.05527e-05 3730.41 9.66865e-06 4071.51 1.97479e-11 4.24268e-06 9278.57 0           4.64476e-06 8475.37 0
  28 9.77679e-06 4490.63 1.0918e-05  4021.26 1.71505e-11 4.41728e-06 9939.14 0           4.55165e-06 9645.73 0
  29 1.23574e-05 3947.28 1.15308e-05 4230.22 1.54399e-11 4.96383e-06 9826.69 0           5.27042e-06 9255.05 0
  30 1.25312e-05 4309.24 1.23192e-05 4383.4  1.38837e-11 5.36616e-06 10063.1 0           5.57707e-06 9682.51 0
  31 1.41019e-05 4225.11 1.41554e-05 4209.15 1.12822e-11 5.56749e-06 10701.8 0           5.87983e-06 10133.3 0
  32 1.44935e-05 4521.76 1.74419e-05 3757.38 9.5502e-12  5.91291e-06 11083.5 0           6.07622e-06 10785.7 0
  33 1.68922e-05 4254.86 1.62224e-05 4430.55 8.00562e-12 6.51645e-06 11029.6 0           6.62821e-06 10843.7 0
  34 1.73001e-05 4543.8  1.68924e-05 4653.46 7.54927e-12 6.83433e-06 11501.9 0           6.95343e-06 11304.9 0
  35 2.07166e-05 4139.2  2.15962e-05 3970.61 6.52939e-12 7.06462e-06 12137.9 0           7.53811e-06 11375.5 0
  36 1.98326e-05 4704.97 2.13473e-05 4371.14 5.68874e-12 6.703e-06   13920.9 0           6.99365e-06 13342.4 0
  37 2.3838e-05  4249.78 2.23655e-05 4529.56 5.11318e-12 8.87253e-06 11417.9 0           9.13862e-06 11085.5 0
  38 2.35903e-05 4652.09 2.48122e-05 4422.99 4.71306e-12 9.24238e-06 11874   0           9.27922e-06 11826.9 0
  39 2.79913e-05 4238.39 2.64576e-05 4484.09 4.20714e-12 9.68511e-06 12249.5 0           9.95689e-06 11915.2 0
  40 2.60131e-05 4920.6  2.9098e-05  4398.93 3.42002e-12 9.80308e-06 13057.1 0           1.04198e-05 12284.3 0
  41 3.13419e-05 4398.01 3.03942e-05 4535.14 3.13757e-12 1.07587e-05 12812.2 0           1.10016e-05 12529.3 0
  42 3.10015e-05 4779.64 3.20343e-05 4625.54 2.91245e-12 1.09989e-05 13471.9 0           1.16031e-05 12770.4 0
  43 3.6527e-05  4353.33 3.49908e-05 4544.46 2.71446e-12 1.13164e-05 14051.7 0           1.20516e-05 13194.4 0
  44 3.36654e-05 5060.62 3.86435e-05 4408.71 2.49076e-12 1.16151e-05 14667.8 0           1.21377e-05 14036.2 0
  45 3.95282e-05 4610.63 3.98562e-05 4572.69 2.12037e-12 1.26784e-05 14374.8 0           1.32723e-05 13731.6 0
  46 3.96351e-05 4911.6  4.17105e-05 4667.22 1.96734e-12 1.27302e-05 15292.2 0           1.34346e-05 14490.3 0
  47 4.63424e-05 4480.69 4.50811e-05 4606.05 1.77515e-12 1.33133e-05 15596.9 0           1.39354e-05 14900.7 0
  48 4.31748e-05 5122.99 5.0325e-05  4395.11 1.75073e-12 1.32491e-05 16694.3 0           1.3501e-05  16382.8 0
  49 4.93001e-05 4772.77 5.11402e-05 4601.03 1.48788e-12 1.6222e-05  14504.9 0           1.72531e-05 13638   0
The first group is the legacy axpy_prod(), the second group is the
legacy prod(), the third group is the legacy prod() for low dimensions
and gemm() for high dimensions, and the fourth group is gemm().
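For reading the MFLOPS columns: the figures look consistent with counting 2*m^3 floating-point operations per m x m product, e.g. m = 1 with t = 2.1263e-07 s gives roughly 9.406. That is inferred from the posted numbers, not taken from the benchmark source, and the helper name `mflops` below is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of uBLAS): MFLOPS for one m x m by m x m
// product that took `seconds`, ASSUMING the usual 2*m^3 operation count
// (m^3 multiplications plus m^3 additions).
inline double mflops(std::size_t m, double seconds) {
    assert(seconds > 0.0);
    const double flops = 2.0 * static_cast<double>(m) * m * m;
    return flops / seconds / 1.0e6;
}
```

With the m = 2 row of the first group (t = 2.28189e-07 s) this gives about 70.1, in line with the posted value.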
As the legacy version is expression-template based, it may provide
some further advantages when operations are chained.
I put some defines in place that make it possible to force the legacy
version as the default, as opposed to the runtime-switched version.
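A minimal sketch of what such a switch could look like, under assumptions: the macro name `EXAMPLE_FORCE_LEGACY_PROD`, the threshold constant and all function names are hypothetical, and the two kernels are plain-loop stand-ins rather than the actual uBLAS prod() or gemm() code. The threshold of 14 is only a placeholder (in the figures above the third group's Diff column drops to 0 from m = 14 on, which may hint at where that run switched).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // dense m x m matrix, row-major

enum class Kernel { Legacy, Gemm };

// Placeholder crossover size; a real value would come from benchmarks.
constexpr std::size_t kGemmThreshold = 14;

// Stand-in for the legacy product: textbook i-j-k triple loop.
inline void legacy_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < m; ++k)
                sum += a[i * m + k] * b[k * m + j];
            c[i * m + j] = sum;
        }
}

// Stand-in for a gemm-style kernel: i-k-j order, walking b and c row-wise.
inline void gemm_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t k = 0; k < m; ++k) {
            const double aik = a[i * m + k];
            for (std::size_t j = 0; j < m; ++j)
                c[i * m + j] += aik * b[k * m + j];
        }
}

// Compile-time override (hypothetical macro) versus runtime switch on size.
inline Kernel choose_kernel(std::size_t m) {
#ifdef EXAMPLE_FORCE_LEGACY_PROD
    (void)m;
    return Kernel::Legacy;
#else
    return m < kGemmThreshold ? Kernel::Legacy : Kernel::Gemm;
#endif
}

// Computes c = a * b and reports which kernel was used.
inline Kernel switched_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t m) {
    assert(a.size() == m * m && b.size() == m * m);
    c.assign(m * m, 0.0);
    const Kernel kernel = choose_kernel(m);
    if (kernel == Kernel::Legacy)
        legacy_prod(a, b, c, m);
    else
        gemm_prod(a, b, c, m);
    return kernel;
}
```

Building with `-DEXAMPLE_FORCE_LEGACY_PROD` would make every call take the legacy path; without it, the single size comparison in `choose_kernel` is the only per-call cost of the runtime switch.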
Regardless, these are great figures. Can you please run them
comparing the simple uBlas implementation for matrices from 2 to 100
with the gemm-based one with a single thread? I wonder when the
control statement starts to play a role.
What do you think should be the plan to switch from multi-core to
single-threaded so as to not get all the communication hit for
smaller matrices?
- Nasos