First of all, I don’t want to take undue credit, so I want to point out that this is not my algorithm. It is based on http://www.cs.utexas.edu/users/flame/pubs/blis2_toms_rev3.pdf
https://github.com/flame/blis