Subject: Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks
From: Nasos Iliopoulos (nasos_i_at_[hidden])
Date: 2016-03-14 11:12:34


We make pull requests or changes directly against master only in
exceptional cases. Master merges only from develop, which in turn
merges from feature/bug branches. So
https://github.com/uBLAS/ublas/tree/feature/ublas00004_simd_gemm is the
correct branch to target with your pull request.

Pull requests go to https://github.com/uBLAS/ublas and NOT to
https://github.com/boostorg/ublas. I see your pull request in the
boostorg repo, so please open it against the uBLAS repo instead. I need
to clarify this in the wiki, because it is probably not very obvious.

-Nasos

On 03/13/2016 02:26 PM, palik imre wrote:
> A bit of confusion here.
>
> I created a fork of the feature branch you sent, as I didn't have the
> rights to push there. Then I sent a pull request for that.
>
> Should I fork the master instead?
>
>
> Thanks,
>
> Imre
>
>
> On Sunday, 13 March 2016, 19:03, palik imre <imre_palik_at_[hidden]>
> wrote:
>
>
> Results for low dimensions. More data would exceed the mailing list limits:
>
> # m | original: t1 MFLOPS | original: t1 MFLOPS Diff nrm3 | gemm: t2 MFLOPS Diff nrm4 | mixed: t2 MFLOPS Diff nrm5
> 1 | 2.1263e-07 9.40601 | 1.32802e-07 15.06 0 | 1.36006e-07 14.7052 0 | 6.31318e-07 3.16798 0
> 2 | 2.28189e-07 70.1173 | 1.37767e-07 116.138 0 | 1.59801e-07 100.125 0 | 6.653e-07 24.0493 0
> 3 | 2.649e-07 203.851 | 1.5541e-07 347.468 0 | 1.54267e-07 350.042 0 | 6.98766e-07 77.2791 0
> 4 | 3.35269e-07 381.783 | 2.4183e-07 529.297 0 | 2.12891e-07 601.247 0 | 6.65688e-07 192.282 0
> 5 | 3.53868e-07 706.478 | 2.30977e-07 1082.36 0 | 2.53215e-07 987.303 0 | 7.07933e-07 353.141 0
> 6 | 4.2987e-07 1004.95 | 2.59713e-07 1663.37 0 | 2.54867e-07 1695 0 | 8.17448e-07 528.474 0
> 7 | 5.39621e-07 1271.26 | 4.51043e-07 1520.92 7.98975e-09 | 5.12948e-07 1337.37 7.98975e-09 | 8.76363e-07 782.781 0
> 8 | 6.18993e-07 1654.3 | 6.38988e-07 1602.53 4.17673e-09 | 6.37931e-07 1605.19 4.17673e-09 | 8.92556e-07 1147.27 0
> 9 | 7.73683e-07 1884.49 | 7.26336e-07 2007.34 3.30697e-09 | 8.00656e-07 1821.01 3.30697e-09 | 1.09762e-06 1328.33 0
> 10 | 9.27569e-07 2156.17 | 8.31827e-07 2404.35 1.94317e-09 | 8.72131e-07 2293.23 1.94317e-09 | 1.16572e-06 1715.68 0
> 11 | 1.13882e-06 2337.5 | 1.03275e-06 2577.58 1.27501e-09 | 1.08775e-06 2447.25 1.27501e-09 | 1.16439e-06 2286.17 0
> 12 | 1.26427e-06 2733.59 | 1.40013e-06 2468.34 8.50076e-10 | 1.39562e-06 2476.32 8.50076e-10 | 1.01202e-06 3414.97 0
> 13 | 1.5751e-06 2789.66 | 1.64811e-06 2666.09 5.39864e-10 | 1.66862e-06 2633.32 5.39864e-10 | 1.61517e-06 2720.45 0
> 14 | 1.79595e-06 3055.77 | 1.89937e-06 2889.37 4.08632e-10 | 1.6485e-06 3329.09 0 | 1.65016e-06 3325.73 0
> 15 | 2.14056e-06 3153.37 | 2.24248e-06 3010.06 2.73316e-10 | 1.6875e-06 3999.99 0 | 1.80164e-06 3746.59 0
> 16 | 2.38996e-06 3427.67 | 2.63386e-06 3110.27 2.30152e-10 | 1.74627e-06 4691.14 0 | 1.91648e-06 4274.49 0
> 17 | 2.93315e-06 3349.98 | 3.08031e-06 3189.94 1.85538e-10 | 2.17697e-06 4513.62 0 | 2.13505e-06 4602.23 0
> 18 | 3.3771e-06 3453.85 | 3.23863e-06 3601.52 1.20251e-10 | 2.23225e-06 5225.23 0 | 2.36877e-06 4924.07 0
> 19 | 4.19699e-06 3268.53 | 4.02621e-06 3407.17 1.07796e-10 | 2.29651e-06 5973.4 0 | 2.44714e-06 5605.72 0
> 20 | 4.27777e-06 3740.27 | 4.86115e-06 3291.4 8.37665e-11 | 2.26798e-06 7054.74 0 | 2.44016e-06 6556.96 0
> 21 | 5.58038e-06 3319.13 | 5.51606e-06 3357.83 5.93714e-11 | 2.61705e-06 7077.43 0 | 2.90197e-06 6382.56 0
> 22 | 5.46208e-06 3898.88 | 5.50258e-06 3870.19 5.76987e-11 | 2.85448e-06 7460.56 0 | 3.09923e-06 6871.39 0
> 23 | 7.26813e-06 3348.04 | 6.48407e-06 3752.89 4.47169e-11 | 3.03986e-06 8004.98 0 | 3.16566e-06 7686.86 0
> 24 | 6.56421e-06 4211.93 | 7.20581e-06 3836.9 3.61275e-11 | 2.84288e-06 9725.35 0 | 2.81577e-06 9818.99 0
> 25 | 7.97135e-06 3920.29 | 7.80654e-06 4003.06 3.02957e-11 | 4.04575e-06 7724.16 0 | 4.15001e-06 7530.11 0
> 26 | 8.59272e-06 4090.9 | 8.46934e-06 4150.5 2.53217e-11 | 4.1795e-06 8410.58 0 | 4.36958e-06 8044.71 0
> 27 | 1.05527e-05 3730.41 | 9.66865e-06 4071.51 1.97479e-11 | 4.24268e-06 9278.57 0 | 4.64476e-06 8475.37 0
> 28 | 9.77679e-06 4490.63 | 1.0918e-05 4021.26 1.71505e-11 | 4.41728e-06 9939.14 0 | 4.55165e-06 9645.73 0
> 29 | 1.23574e-05 3947.28 | 1.15308e-05 4230.22 1.54399e-11 | 4.96383e-06 9826.69 0 | 5.27042e-06 9255.05 0
> 30 | 1.25312e-05 4309.24 | 1.23192e-05 4383.4 1.38837e-11 | 5.36616e-06 10063.1 0 | 5.57707e-06 9682.51 0
> 31 | 1.41019e-05 4225.11 | 1.41554e-05 4209.15 1.12822e-11 | 5.56749e-06 10701.8 0 | 5.87983e-06 10133.3 0
> 32 | 1.44935e-05 4521.76 | 1.74419e-05 3757.38 9.5502e-12 | 5.91291e-06 11083.5 0 | 6.07622e-06 10785.7 0
> 33 | 1.68922e-05 4254.86 | 1.62224e-05 4430.55 8.00562e-12 | 6.51645e-06 11029.6 0 | 6.62821e-06 10843.7 0
> 34 | 1.73001e-05 4543.8 | 1.68924e-05 4653.46 7.54927e-12 | 6.83433e-06 11501.9 0 | 6.95343e-06 11304.9 0
> 35 | 2.07166e-05 4139.2 | 2.15962e-05 3970.61 6.52939e-12 | 7.06462e-06 12137.9 0 | 7.53811e-06 11375.5 0
> 36 | 1.98326e-05 4704.97 | 2.13473e-05 4371.14 5.68874e-12 | 6.703e-06 13920.9 0 | 6.99365e-06 13342.4 0
> 37 | 2.3838e-05 4249.78 | 2.23655e-05 4529.56 5.11318e-12 | 8.87253e-06 11417.9 0 | 9.13862e-06 11085.5 0
> 38 | 2.35903e-05 4652.09 | 2.48122e-05 4422.99 4.71306e-12 | 9.24238e-06 11874 0 | 9.27922e-06 11826.9 0
> 39 | 2.79913e-05 4238.39 | 2.64576e-05 4484.09 4.20714e-12 | 9.68511e-06 12249.5 0 | 9.95689e-06 11915.2 0
> 40 | 2.60131e-05 4920.6 | 2.9098e-05 4398.93 3.42002e-12 | 9.80308e-06 13057.1 0 | 1.04198e-05 12284.3 0
> 41 | 3.13419e-05 4398.01 | 3.03942e-05 4535.14 3.13757e-12 | 1.07587e-05 12812.2 0 | 1.10016e-05 12529.3 0
> 42 | 3.10015e-05 4779.64 | 3.20343e-05 4625.54 2.91245e-12 | 1.09989e-05 13471.9 0 | 1.16031e-05 12770.4 0
> 43 | 3.6527e-05 4353.33 | 3.49908e-05 4544.46 2.71446e-12 | 1.13164e-05 14051.7 0 | 1.20516e-05 13194.4 0
> 44 | 3.36654e-05 5060.62 | 3.86435e-05 4408.71 2.49076e-12 | 1.16151e-05 14667.8 0 | 1.21377e-05 14036.2 0
> 45 | 3.95282e-05 4610.63 | 3.98562e-05 4572.69 2.12037e-12 | 1.26784e-05 14374.8 0 | 1.32723e-05 13731.6 0
> 46 | 3.96351e-05 4911.6 | 4.17105e-05 4667.22 1.96734e-12 | 1.27302e-05 15292.2 0 | 1.34346e-05 14490.3 0
> 47 | 4.63424e-05 4480.69 | 4.50811e-05 4606.05 1.77515e-12 | 1.33133e-05 15596.9 0 | 1.39354e-05 14900.7 0
> 48 | 4.31748e-05 5122.99 | 5.0325e-05 4395.11 1.75073e-12 | 1.32491e-05 16694.3 0 | 1.3501e-05 16382.8 0
> 49 | 4.93001e-05 4772.77 | 5.11402e-05 4601.03 1.48788e-12 | 1.6222e-05 14504.9 0 | 1.72531e-05 13638 0
>
>
> The first group is the legacy axpy_prod(), the second group is the legacy
> prod(), the third group is the legacy prod() for low dimensions and gemm()
> for high dimensions, and the fourth group is gemm().
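
For anyone reading the table: the MFLOPS columns appear to follow the usual convention of counting 2*m^3 floating-point operations for an m x m product and dividing by the measured time. A minimal sketch of that conversion, assuming this convention (the helper name gemm_mflops is made up for illustration and is not part of uBLAS or the patch):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of uBLAS: converts a measured time into
// MFLOPS for the product of two m x m matrices, counting 2*m^3
// floating-point operations (one multiply and one add per inner-loop step).
inline double gemm_mflops(std::size_t m, double seconds)
{
    assert(seconds > 0.0);
    const double flops = 2.0 * static_cast<double>(m) * m * m;
    return flops / seconds / 1.0e6;
}
```

With the m = 10 timing from the first group, gemm_mflops(10, 9.27569e-07) comes out at roughly 2156, which agrees with the MFLOPS value reported next to it.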
>
> As the legacy version is based on expression templates, it may
> provide some further advantages when operations are chained.
>
> I put some defines in place that make it possible to force the
> legacy version as the default, as opposed to the runtime-switched version.
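
To make the idea concrete, here is a rough sketch of a size-based runtime switch with a compile-time override. It is only an illustration under stated assumptions: the macro EXAMPLE_FORCE_LEGACY, the cutoff kGemmThreshold, and the functions legacy_prod, gemm_like, choose_kernel and multiply are all hypothetical names, and plain std::vector stands in for uBLAS matrices. The actual defines and kernels in the patch may look quite different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// All names here are hypothetical stand-ins, not the uBLAS API or the
// macros used in the patch.
using Matrix = std::vector<double>;  // row-major, n x n

// Arbitrary placeholder; a real cutoff would have to come from measurements.
constexpr std::size_t kGemmThreshold = 16;

enum class Kernel { Legacy, Gemm };

// Plain triple loop, standing in for the legacy product.
inline void legacy_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Loop-reordered (i, k, j) variant, standing in for a tuned gemm kernel.
inline void gemm_like(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}

// Picks a kernel: always the legacy one if EXAMPLE_FORCE_LEGACY is defined
// at compile time, otherwise by matrix size at run time.
inline Kernel choose_kernel(std::size_t n)
{
#ifdef EXAMPLE_FORCE_LEGACY
    (void)n;
    return Kernel::Legacy;
#else
    return n < kGemmThreshold ? Kernel::Legacy : Kernel::Gemm;
#endif
}

// Computes c = a * b and reports which kernel was used.
inline Kernel multiply(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    const Kernel kernel = choose_kernel(n);
    if (kernel == Kernel::Legacy)
        legacy_prod(a, b, c, n);
    else
        gemm_like(a, b, c, n);
    return kernel;
}
```

Building with -DEXAMPLE_FORCE_LEGACY would route every call through the legacy path, while the default build pays one size comparison per call.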
>
> Imre
>
>
> On Friday, 11 March 2016, 14:21, Nasos Iliopoulos
> <nasos_i_at_[hidden]> wrote:
>
>
> Regardless, these are great figures.
>
> Can you please run them comparing the simple uBLAS implementation for
> matrices from 2 to 100 with the gemm-based one with a single thread? I
> wonder when the control statement starts to play a role.
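
A sweep like the one requested could be sketched roughly as follows. This is not the benchmark used for the figures in this thread; time_per_call, Sample and sweep are hypothetical names, the two kernels are passed in as callables taking (a, b, c, m), and std::vector is used in place of uBLAS matrices.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical timing helper: average seconds per call of f over reps runs.
template <class F>
double time_per_call(F&& f, int reps)
{
    assert(reps > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int r = 0; r < reps; ++r)
        f();
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / reps;
}

// One row of results: dimension plus the average time of each kernel.
struct Sample {
    std::size_t m;
    double t_simple;
    double t_gemm;
};

// Times both kernels for every m in [first, last] on m x m inputs.
template <class Simple, class Gemm>
std::vector<Sample> sweep(std::size_t first, std::size_t last, int reps,
                          Simple simple, Gemm gemm)
{
    std::vector<Sample> out;
    for (std::size_t m = first; m <= last; ++m) {
        std::vector<double> a(m * m, 1.0), b(m * m, 1.0), c(m * m, 0.0);
        Sample s;
        s.m = m;
        s.t_simple = time_per_call([&] { simple(a, b, c, m); }, reps);
        s.t_gemm = time_per_call([&] { gemm(a, b, c, m); }, reps);
        out.push_back(s);
    }
    return out;
}
```

Calling sweep(2, 100, reps, simple_kernel, gemm_kernel) and looking for the smallest m where t_gemm drops below t_simple would give a first estimate of where the dispatch overhead stops mattering. For dimensions this small, a large reps value and a check that the compiler has not optimized the kernels away are advisable.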
>
> What do you think should be the plan for switching from multi-core to
> single-threaded, so as not to take the full communication hit for smaller
> matrices?
>
>
> - Nasos
>
>
>
> _______________________________________________
> ublas mailing list
> ublas_at_[hidden] <mailto:ublas_at_[hidden]>
> http://lists.boost.org/mailman/listinfo.cgi/ublas
> Sent to: imre_palik_at_[hidden] <mailto:imre_palik_at_[hidden]>