Only in exceptional cases do we make pull requests or changes directly on master. Master only merges from develop (which in turn merges from feature/bug branches). So https://github.com/uBLAS/ublas/tree/feature/ublas00004_simd_gemm is the correct branch to target with the pull request.

Pull requests go to https://github.com/uBLAS/ublas and NOT to https://github.com/boostorg/ublas. I see the pull request in the boostorg repo, so please open it in the uBLAS repo instead. I need to clarify this in the wiki, because it is probably not very obvious.

-Nasos



On 03/13/2016 02:26 PM, palik imre wrote:
A bit of confusion here.

I created a fork of the feature branch you sent, as I didn't have the rights to push there.  Then I sent a pull request for that.

Should I fork the master instead?


Thanks,

Imre


On Sunday, 13 March 2016, 19:03, palik imre <imre_palik@yahoo.co.uk> wrote:


Results for low dimensions.  More data would exceed the mailing list limits:

#  m original: t1   MFLOPS original: t1    MFLOPS   Diff nrm3  gemm:   t2    MFLOPS   Diff nrm4 mixed:   t2    MFLOPS   Diff nrm5
  1   2.1263e-07   9.40601  1.32802e-07     15.06           0 1.36006e-07   14.7052           0 6.31318e-07   3.16798           0
  2  2.28189e-07   70.1173  1.37767e-07   116.138           0 1.59801e-07   100.125           0   6.653e-07   24.0493           0
  3    2.649e-07   203.851   1.5541e-07   347.468           0 1.54267e-07   350.042           0 6.98766e-07   77.2791           0
  4  3.35269e-07   381.783   2.4183e-07   529.297           0 2.12891e-07   601.247           0 6.65688e-07   192.282           0
  5  3.53868e-07   706.478  2.30977e-07   1082.36           0 2.53215e-07   987.303           0 7.07933e-07   353.141           0
  6   4.2987e-07   1004.95  2.59713e-07   1663.37           0 2.54867e-07      1695           0 8.17448e-07   528.474           0
  7  5.39621e-07   1271.26  4.51043e-07   1520.92 7.98975e-09 5.12948e-07   1337.37 7.98975e-09 8.76363e-07   782.781           0
  8  6.18993e-07    1654.3  6.38988e-07   1602.53 4.17673e-09 6.37931e-07   1605.19 4.17673e-09 8.92556e-07   1147.27           0
  9  7.73683e-07   1884.49  7.26336e-07   2007.34 3.30697e-09 8.00656e-07   1821.01 3.30697e-09 1.09762e-06   1328.33           0
 10  9.27569e-07   2156.17  8.31827e-07   2404.35 1.94317e-09 8.72131e-07   2293.23 1.94317e-09 1.16572e-06   1715.68           0
 11  1.13882e-06    2337.5  1.03275e-06   2577.58 1.27501e-09 1.08775e-06   2447.25 1.27501e-09 1.16439e-06   2286.17           0
 12  1.26427e-06   2733.59  1.40013e-06   2468.34 8.50076e-10 1.39562e-06   2476.32 8.50076e-10 1.01202e-06   3414.97           0
 13   1.5751e-06   2789.66  1.64811e-06   2666.09 5.39864e-10 1.66862e-06   2633.32 5.39864e-10 1.61517e-06   2720.45           0
 14  1.79595e-06   3055.77  1.89937e-06   2889.37 4.08632e-10  1.6485e-06   3329.09           0 1.65016e-06   3325.73           0
 15  2.14056e-06   3153.37  2.24248e-06   3010.06 2.73316e-10  1.6875e-06   3999.99           0 1.80164e-06   3746.59           0
 16  2.38996e-06   3427.67  2.63386e-06   3110.27 2.30152e-10 1.74627e-06   4691.14           0 1.91648e-06   4274.49           0
 17  2.93315e-06   3349.98  3.08031e-06   3189.94 1.85538e-10 2.17697e-06   4513.62           0 2.13505e-06   4602.23           0
 18   3.3771e-06   3453.85  3.23863e-06   3601.52 1.20251e-10 2.23225e-06   5225.23           0 2.36877e-06   4924.07           0
 19  4.19699e-06   3268.53  4.02621e-06   3407.17 1.07796e-10 2.29651e-06    5973.4           0 2.44714e-06   5605.72           0
 20  4.27777e-06   3740.27  4.86115e-06    3291.4 8.37665e-11 2.26798e-06   7054.74           0 2.44016e-06   6556.96           0
 21  5.58038e-06   3319.13  5.51606e-06   3357.83 5.93714e-11 2.61705e-06   7077.43           0 2.90197e-06   6382.56           0
 22  5.46208e-06   3898.88  5.50258e-06   3870.19 5.76987e-11 2.85448e-06   7460.56           0 3.09923e-06   6871.39           0
 23  7.26813e-06   3348.04  6.48407e-06   3752.89 4.47169e-11 3.03986e-06   8004.98           0 3.16566e-06   7686.86           0
 24  6.56421e-06   4211.93  7.20581e-06    3836.9 3.61275e-11 2.84288e-06   9725.35           0 2.81577e-06   9818.99           0
 25  7.97135e-06   3920.29  7.80654e-06   4003.06 3.02957e-11 4.04575e-06   7724.16           0 4.15001e-06   7530.11           0
 26  8.59272e-06    4090.9  8.46934e-06    4150.5 2.53217e-11  4.1795e-06   8410.58           0 4.36958e-06   8044.71           0
 27  1.05527e-05   3730.41  9.66865e-06   4071.51 1.97479e-11 4.24268e-06   9278.57           0 4.64476e-06   8475.37           0
 28  9.77679e-06   4490.63   1.0918e-05   4021.26 1.71505e-11 4.41728e-06   9939.14           0 4.55165e-06   9645.73           0
 29  1.23574e-05   3947.28  1.15308e-05   4230.22 1.54399e-11 4.96383e-06   9826.69           0 5.27042e-06   9255.05           0
 30  1.25312e-05   4309.24  1.23192e-05    4383.4 1.38837e-11 5.36616e-06   10063.1           0 5.57707e-06   9682.51           0
 31  1.41019e-05   4225.11  1.41554e-05   4209.15 1.12822e-11 5.56749e-06   10701.8           0 5.87983e-06   10133.3           0
 32  1.44935e-05   4521.76  1.74419e-05   3757.38  9.5502e-12 5.91291e-06   11083.5           0 6.07622e-06   10785.7           0
 33  1.68922e-05   4254.86  1.62224e-05   4430.55 8.00562e-12 6.51645e-06   11029.6           0 6.62821e-06   10843.7           0
 34  1.73001e-05    4543.8  1.68924e-05   4653.46 7.54927e-12 6.83433e-06   11501.9           0 6.95343e-06   11304.9           0
 35  2.07166e-05    4139.2  2.15962e-05   3970.61 6.52939e-12 7.06462e-06   12137.9           0 7.53811e-06   11375.5           0
 36  1.98326e-05   4704.97  2.13473e-05   4371.14 5.68874e-12   6.703e-06   13920.9           0 6.99365e-06   13342.4           0
 37   2.3838e-05   4249.78  2.23655e-05   4529.56 5.11318e-12 8.87253e-06   11417.9           0 9.13862e-06   11085.5           0
 38  2.35903e-05   4652.09  2.48122e-05   4422.99 4.71306e-12 9.24238e-06     11874           0 9.27922e-06   11826.9           0
 39  2.79913e-05   4238.39  2.64576e-05   4484.09 4.20714e-12 9.68511e-06   12249.5           0 9.95689e-06   11915.2           0
 40  2.60131e-05    4920.6   2.9098e-05   4398.93 3.42002e-12 9.80308e-06   13057.1           0 1.04198e-05   12284.3           0
 41  3.13419e-05   4398.01  3.03942e-05   4535.14 3.13757e-12 1.07587e-05   12812.2           0 1.10016e-05   12529.3           0
 42  3.10015e-05   4779.64  3.20343e-05   4625.54 2.91245e-12 1.09989e-05   13471.9           0 1.16031e-05   12770.4           0
 43   3.6527e-05   4353.33  3.49908e-05   4544.46 2.71446e-12 1.13164e-05   14051.7           0 1.20516e-05   13194.4           0
 44  3.36654e-05   5060.62  3.86435e-05   4408.71 2.49076e-12 1.16151e-05   14667.8           0 1.21377e-05   14036.2           0
 45  3.95282e-05   4610.63  3.98562e-05   4572.69 2.12037e-12 1.26784e-05   14374.8           0 1.32723e-05   13731.6           0
 46  3.96351e-05    4911.6  4.17105e-05   4667.22 1.96734e-12 1.27302e-05   15292.2           0 1.34346e-05   14490.3           0
 47  4.63424e-05   4480.69  4.50811e-05   4606.05 1.77515e-12 1.33133e-05   15596.9           0 1.39354e-05   14900.7           0
 48  4.31748e-05   5122.99   5.0325e-05   4395.11 1.75073e-12 1.32491e-05   16694.3           0  1.3501e-05   16382.8           0
 49  4.93001e-05   4772.77  5.11402e-05   4601.03 1.48788e-12  1.6222e-05   14504.9           0 1.72531e-05     13638           0


The first group is legacy axpy_prod(), the second group is legacy prod(), the third group is legacy prod() for low dimensions and gemm() for high dimensions, and the fourth group is gemm().
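
For anyone reading the table: the MFLOPS columns appear to be consistent with counting 2*m^3 floating-point operations per m x m product and dividing by the measured time. This is inferred from the numbers, not taken from the benchmark source, and the function name below is made up for illustration. A minimal sketch:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: one m x m by m x m product is counted as 2*m^3 floating-point
// operations (m^3 multiplies plus m^3 adds). This reproduces the table rows
// checked by hand, but it is inferred from the data, not from the benchmark.
inline double mflops(std::size_t m, double seconds)
{
    assert(seconds > 0.0);  // a zero or negative time is not meaningful
    const double flops = 2.0 * static_cast<double>(m) * m * m;
    return flops / seconds / 1.0e6;
}
```

For example, mflops(2, 2.28189e-07) comes out at about 70.12, which agrees with the m = 2 row of the first group.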

As the legacy version is expression template based, it can possibly provide some further advantages when the operations are chained.

I put some defines in place that make it possible to force the legacy version as the default, as opposed to the runtime-switched version.
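
To illustrate the idea of a define that overrides a runtime switch, here is a rough sketch. The macro name, the enum, and the threshold constant are all hypothetical and are not the names used in the branch; the value 14 is only a guess read off the table (the third group's difference column drops to 0 from m = 14), so it would need tuning per machine.

```cpp
#include <cstddef>

// Hypothetical macro name, for illustration only; the defines in the actual
// branch may be named and structured differently.
// #define EXAMPLE_UBLAS_FORCE_LEGACY_PROD

// Hypothetical cut-over size; an assumption to be tuned from measurements.
constexpr std::size_t example_gemm_threshold = 14;

enum class product_kernel { legacy, gemm };

// Selects which kernel would handle an m x m product.
inline product_kernel choose_kernel(std::size_t m)
{
#ifdef EXAMPLE_UBLAS_FORCE_LEGACY_PROD
    (void)m;  // the size is ignored when the legacy version is forced
    return product_kernel::legacy;
#else
    // Runtime switch: small products stay on the legacy path.
    return m < example_gemm_threshold ? product_kernel::legacy
                                      : product_kernel::gemm;
#endif
}
```

With the macro left undefined, choose_kernel(4) selects the legacy path and choose_kernel(48) selects gemm; defining the macro before this code makes every size take the legacy path.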

Imre


On Friday, 11 March 2016, 14:21, Nasos Iliopoulos <nasos_i@hotmail.com> wrote:


Regardless, these are great figures.

Can you please run them comparing the simple uBLAS implementation with the gemm-based one, single-threaded, for matrices from 2 to 100? I wonder when the control statement starts to play a role.
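
Something along these lines is the kind of single-threaded timing loop I have in mind. It is only a sketch: naive_prod is a plain triple loop standing in for whichever implementation is being measured, and both function names are made up here, not uBLAS API.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Plain triple-loop product of two row-major m x m matrices: c = a * b.
// Stand-in for the implementation under test; c must hold m * m elements.
inline void naive_prod(const std::vector<double>& a,
                       const std::vector<double>& b,
                       std::vector<double>& c,
                       std::size_t m)
{
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t k = 0; k < m; ++k)
            for (std::size_t j = 0; j < m; ++j)
                c[i * m + j] += a[i * m + k] * b[k * m + j];
}

// Average wall-clock seconds per product over `repeats` runs (repeats >= 1).
// Allocation happens outside the timed region so it does not skew small sizes.
inline double time_prod(std::size_t m, std::size_t repeats)
{
    const std::vector<double> a(m * m, 1.0), b(m * m, 2.0);
    std::vector<double> c(m * m);
    volatile double sink = 0.0;  // observable write so the work is not elided
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < repeats; ++r) {
        naive_prod(a, b, c, m);
        sink = c[0];
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(repeats);
}
```

Calling time_prod(m, repeats) for m from 2 to 100, once per implementation, would give the per-size timings to compare; the repeat count needs to be large for the smallest sizes to get above timer resolution.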

What do you think the plan should be for switching from multi-core to single-threaded, so that smaller matrices do not take the full communication hit?


- Nasos



_______________________________________________
ublas mailing list
ublas@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/ublas
Sent to: imre_palik@yahoo.co.uk



