Subject: Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks
From: palik imre (imre_palik_at_[hidden])
Date: 2016-03-20 04:09:10


Sent request to https://github.com/uBLAS/ublas/tree/feature/ublas00004_simd_gemm

Why doesn't it get sent to the mailing list, like the one about the weak equality check?

--------------------------------------------
On Mon, 14/3/16, Nasos Iliopoulos <nasos_i_at_[hidden]> wrote:

 Subject: Re: [ublas] [PATCH 3/3] boost::ublas increasing the range of BLAS level 3 benchmarks
 To: ublas_at_[hidden]
 Date: Monday, 14 March, 2016, 16:12
 
 
     Only in exceptional cases do we make pull requests or changes in
     master. Master only merges off develop (which in turn merges off
     feature/bug branches). So
     https://github.com/uBLAS/ublas/tree/feature/ublas00004_simd_gemm
     is the correct branch to request a pull against.

     Pull requests go to https://github.com/uBLAS/ublas and NOT
     https://github.com/boostorg/ublas. I see the pull request in the
     boostorg repo, so please perform it in the ublas repo. I need to
     clarify this in the wiki because it is probably not very obvious.
 
     
 
     -Nasos

     On 03/13/2016 02:26 PM, palik imre wrote:
 
     
     
       
         A bit of confusion here.

         I created a fork of the feature branch you sent, as I didn't
         have the rights to push there.  Then I sent a pull request for
         that.

         Should I fork the master instead?

         Thanks,

         Imre
 
         
         
         
 
           
 
         
         
           
             
                On Sunday, 13 March 2016, 19:03, palik imre <imre_palik_at_[hidden]> wrote:
 
                 
               
 
               
 
               
                 
                   
                     
                       Results for low dimensions.  More data would exceed
                       mailing list limits:
                       
 
                       
     # m original: t1   MFLOPS original: t1   MFLOPS    Diff nrm3     gemm: t2   MFLOPS    Diff nrm4    mixed: t2   MFLOPS    Diff nrm5
       1   2.1263e-07  9.40601  1.32802e-07    15.06            0  1.36006e-07  14.7052            0  6.31318e-07  3.16798            0
       2  2.28189e-07  70.1173  1.37767e-07  116.138            0  1.59801e-07  100.125            0    6.653e-07  24.0493            0
       3    2.649e-07  203.851   1.5541e-07  347.468            0  1.54267e-07  350.042            0  6.98766e-07  77.2791            0
       4  3.35269e-07  381.783   2.4183e-07  529.297            0  2.12891e-07  601.247            0  6.65688e-07  192.282            0
       5  3.53868e-07  706.478  2.30977e-07  1082.36            0  2.53215e-07  987.303            0  7.07933e-07  353.141            0
       6   4.2987e-07  1004.95  2.59713e-07  1663.37            0  2.54867e-07     1695            0  8.17448e-07  528.474            0
       7  5.39621e-07  1271.26  4.51043e-07  1520.92  7.98975e-09  5.12948e-07  1337.37  7.98975e-09  8.76363e-07  782.781            0
       8  6.18993e-07   1654.3  6.38988e-07  1602.53  4.17673e-09  6.37931e-07  1605.19  4.17673e-09  8.92556e-07  1147.27            0
       9  7.73683e-07  1884.49  7.26336e-07  2007.34  3.30697e-09  8.00656e-07  1821.01  3.30697e-09  1.09762e-06  1328.33            0
      10  9.27569e-07  2156.17  8.31827e-07  2404.35  1.94317e-09  8.72131e-07  2293.23  1.94317e-09  1.16572e-06  1715.68            0
      11  1.13882e-06   2337.5  1.03275e-06  2577.58  1.27501e-09  1.08775e-06  2447.25  1.27501e-09  1.16439e-06  2286.17            0
      12  1.26427e-06  2733.59  1.40013e-06  2468.34  8.50076e-10  1.39562e-06  2476.32  8.50076e-10  1.01202e-06  3414.97            0
      13   1.5751e-06  2789.66  1.64811e-06  2666.09  5.39864e-10  1.66862e-06  2633.32  5.39864e-10  1.61517e-06  2720.45            0
      14  1.79595e-06  3055.77  1.89937e-06  2889.37  4.08632e-10   1.6485e-06  3329.09            0  1.65016e-06  3325.73            0
      15  2.14056e-06  3153.37  2.24248e-06  3010.06  2.73316e-10   1.6875e-06  3999.99            0  1.80164e-06  3746.59            0
      16  2.38996e-06  3427.67  2.63386e-06  3110.27  2.30152e-10  1.74627e-06  4691.14            0  1.91648e-06  4274.49            0
      17  2.93315e-06  3349.98  3.08031e-06  3189.94  1.85538e-10  2.17697e-06  4513.62            0  2.13505e-06  4602.23            0
      18   3.3771e-06  3453.85  3.23863e-06  3601.52  1.20251e-10  2.23225e-06  5225.23            0  2.36877e-06  4924.07            0
      19  4.19699e-06  3268.53  4.02621e-06  3407.17  1.07796e-10  2.29651e-06   5973.4            0  2.44714e-06  5605.72            0
      20  4.27777e-06  3740.27  4.86115e-06   3291.4  8.37665e-11  2.26798e-06  7054.74            0  2.44016e-06  6556.96            0
      21  5.58038e-06  3319.13  5.51606e-06  3357.83  5.93714e-11  2.61705e-06  7077.43            0  2.90197e-06  6382.56            0
      22  5.46208e-06  3898.88  5.50258e-06  3870.19  5.76987e-11  2.85448e-06  7460.56            0  3.09923e-06  6871.39            0
      23  7.26813e-06  3348.04  6.48407e-06  3752.89  4.47169e-11  3.03986e-06  8004.98            0  3.16566e-06  7686.86            0
      24  6.56421e-06  4211.93  7.20581e-06   3836.9  3.61275e-11  2.84288e-06  9725.35            0  2.81577e-06  9818.99            0
      25  7.97135e-06  3920.29  7.80654e-06  4003.06  3.02957e-11  4.04575e-06  7724.16            0  4.15001e-06  7530.11            0
      26  8.59272e-06   4090.9  8.46934e-06   4150.5  2.53217e-11   4.1795e-06  8410.58            0  4.36958e-06  8044.71            0
      27  1.05527e-05  3730.41  9.66865e-06  4071.51  1.97479e-11  4.24268e-06  9278.57            0  4.64476e-06  8475.37            0
      28  9.77679e-06  4490.63   1.0918e-05  4021.26  1.71505e-11  4.41728e-06  9939.14            0  4.55165e-06  9645.73            0
      29  1.23574e-05  3947.28  1.15308e-05  4230.22  1.54399e-11  4.96383e-06  9826.69            0  5.27042e-06  9255.05            0
      30  1.25312e-05  4309.24  1.23192e-05   4383.4  1.38837e-11  5.36616e-06  10063.1            0  5.57707e-06  9682.51            0
      31  1.41019e-05  4225.11  1.41554e-05  4209.15  1.12822e-11  5.56749e-06  10701.8            0  5.87983e-06  10133.3            0
      32  1.44935e-05  4521.76  1.74419e-05  3757.38   9.5502e-12  5.91291e-06  11083.5            0  6.07622e-06  10785.7            0
      33  1.68922e-05  4254.86  1.62224e-05  4430.55  8.00562e-12  6.51645e-06  11029.6            0  6.62821e-06  10843.7            0
      34  1.73001e-05   4543.8  1.68924e-05  4653.46  7.54927e-12  6.83433e-06  11501.9            0  6.95343e-06  11304.9            0
      35  2.07166e-05   4139.2  2.15962e-05  3970.61  6.52939e-12  7.06462e-06  12137.9            0  7.53811e-06  11375.5            0
      36  1.98326e-05  4704.97  2.13473e-05  4371.14  5.68874e-12    6.703e-06  13920.9            0  6.99365e-06  13342.4            0
      37   2.3838e-05  4249.78  2.23655e-05  4529.56  5.11318e-12  8.87253e-06  11417.9            0  9.13862e-06  11085.5            0
      38  2.35903e-05  4652.09  2.48122e-05  4422.99  4.71306e-12  9.24238e-06    11874            0  9.27922e-06  11826.9            0
      39  2.79913e-05  4238.39  2.64576e-05  4484.09  4.20714e-12  9.68511e-06  12249.5            0  9.95689e-06  11915.2            0
      40  2.60131e-05   4920.6   2.9098e-05  4398.93  3.42002e-12  9.80308e-06  13057.1            0  1.04198e-05  12284.3            0
      41  3.13419e-05  4398.01  3.03942e-05  4535.14  3.13757e-12  1.07587e-05  12812.2            0  1.10016e-05  12529.3            0
      42  3.10015e-05  4779.64  3.20343e-05  4625.54  2.91245e-12  1.09989e-05  13471.9            0  1.16031e-05  12770.4            0
      43   3.6527e-05  4353.33  3.49908e-05  4544.46  2.71446e-12  1.13164e-05  14051.7            0  1.20516e-05  13194.4            0
      44  3.36654e-05  5060.62  3.86435e-05  4408.71  2.49076e-12  1.16151e-05  14667.8            0  1.21377e-05  14036.2            0
      45  3.95282e-05  4610.63  3.98562e-05  4572.69  2.12037e-12  1.26784e-05  14374.8            0  1.32723e-05  13731.6            0
      46  3.96351e-05   4911.6  4.17105e-05  4667.22  1.96734e-12  1.27302e-05  15292.2            0  1.34346e-05  14490.3            0
      47  4.63424e-05  4480.69  4.50811e-05  4606.05  1.77515e-12  1.33133e-05  15596.9            0  1.39354e-05  14900.7            0
      48  4.31748e-05  5122.99   5.0325e-05  4395.11  1.75073e-12  1.32491e-05  16694.3            0   1.3501e-05  16382.8            0
      49  4.93001e-05  4772.77  5.11402e-05  4601.03  1.48788e-12   1.6222e-05  14504.9            0  1.72531e-05    13638            0
 
                         
 
                       
                       
 
                       
                       The first group is legacy axpy_prod(), the second group
                       is legacy prod(), the third group is legacy prod() for
                       low dimensions and gemm() for high dimensions.  The
                       fourth group is gemm().
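
For reading the MFLOPS columns: the values in the table are consistent with counting 2*m^3 floating-point operations for an m x m product and dividing by the measured time in seconds. For example, m = 2 with t = 2.28189e-07 gives 16 / 2.28189e-07, or about 70.1 MFLOPS. A minimal C++ sketch of that arithmetic follows; the function name mflops_square_gemm is made up for this illustration and is not taken from the benchmark source.

```cpp
// Hypothetical helper, not part of the benchmark code: MFLOPS for the
// product of two m x m matrices, counting m^3 multiplications and
// m^3 additions.
double mflops_square_gemm(unsigned m, double seconds)
{
    const double flops = 2.0 * m * m * m;  // promoted to double, no overflow
    return flops / seconds / 1.0e6;        // operations per second, in millions
}
```

Calling mflops_square_gemm(1, 2.1263e-07) gives roughly 9.406, which matches the first row up to the rounding of the printed time.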
                       
 
                       
                       As the legacy version is expression template based, it
                       can possibly provide some further advantages when the
                       operations are chained.
                       
 
                       
                       I put some defines in place that make it possible to
                       force the legacy version as the default, as opposed to
                       the runtime-switched version.
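
A compile-time override combined with a runtime switch on the matrix dimension could look roughly like the C++ sketch below. Everything in it is an assumption made for illustration: the macro EXAMPLE_FORCE_LEGACY_PROD, the cutoff of 14, the block size, and all function names are hypothetical and not taken from the patch, and the blocked loop nest merely stands in for a real gemm kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// All names here are hypothetical; none are taken from the uBLAS patch.
// Uncomment (or pass -DEXAMPLE_FORCE_LEGACY_PROD) to always use the simple path.
// #define EXAMPLE_FORCE_LEGACY_PROD

using Matrix = std::vector<double>;  // dense n x n matrix, row-major

constexpr std::size_t example_gemm_threshold = 14;  // assumed cutoff, tune by benchmark

enum class Kernel { Legacy, Blocked };

// Plain triple loop, standing in for the legacy product.
void legacy_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Cache-blocked loop nest, standing in for a gemm-style kernel.
void blocked_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    constexpr std::size_t bs = 8;  // block size, also an assumption
    std::fill(c.begin(), c.end(), 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs)
                for (std::size_t i = ii; i < std::min(ii + bs, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + bs, n); ++k) {
                        const double aik = a[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + bs, n); ++j)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}

// Compile-time override first, otherwise a runtime decision on the dimension.
Kernel choose_kernel(std::size_t n)
{
#ifdef EXAMPLE_FORCE_LEGACY_PROD
    (void)n;
    return Kernel::Legacy;
#else
    return n < example_gemm_threshold ? Kernel::Legacy : Kernel::Blocked;
#endif
}

// Computes c = a * b and reports which kernel was used.
Kernel example_prod(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n)
{
    const Kernel kernel = choose_kernel(n);
    if (kernel == Kernel::Legacy)
        legacy_prod(a, b, c, n);
    else
        blocked_prod(a, b, c, n);
    return kernel;
}
```

With the macro undefined, the only per-call cost of the switch is one comparison in choose_kernel(); with it defined, the comparison disappears at compile time and the legacy path is always taken.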
 
                       
                       
 
                       
                       
                         
                           Imre
                         
                       
                       
 
                         
 
                       
                       
                         
                           
                             
                                On Friday, 11 March 2016, 14:21, Nasos Iliopoulos <nasos_i_at_[hidden]> wrote:
 
                                 
                               
 
                               
 
                               
                                 
                                   
                                     Regardless, these are great figures.

                                     Can you please run them comparing the simple
                                     uBLAS implementation for matrices from 2 to 100
                                     with the gemm-based one with a single thread? I
                                     wonder when the control statement starts to play
                                     a role.

                                     What do you think the plan should be for
                                     switching from multi-core to single-threaded, so
                                     as not to take the full communication hit for
                                     smaller matrices?
 
                                     
 
                                     
 
                                     - Nasos
 
                                     
 
                                   
                                 
                                 
 
                               
                             
                           
                            
                         
                          
                       
                     
                   
                 
                 
 
                 _______________________________________________
 
                   ublas mailing list
 
                   ublas_at_[hidden]
 
                   http://lists.boost.org/mailman/listinfo.cgi/ublas
 
                   Sent to: imre_palik_at_[hidden]
                 
 
                 
 
               
             
           
         
       
       
 
       
       
 
     
     
 
   