W.r.t. the interfaces, I think we could move the detection of the uplo tags to
the free functions, which would let us offer both interfaces: with and without
the uplo/diag arguments. We could then end up with low-level drivers that
support both type-decorated matrices and true triangular matrices. I.e.,
something like
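One way to picture this is the rough C++ sketch below. It is only an illustration of the idea, not code from the library: every name in it (`Matrix`, `Triangular`, `trtri_driver`, the `Uplo`/`Diag` enums) is hypothetical, and the driver is a plain unblocked in-place triangular inverse chosen for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// All names here are hypothetical; they only illustrate the proposal.
enum class Uplo { Upper, Lower };
enum class Diag { NonUnit, Unit };

// Minimal dense, row-major, square matrix.
struct Matrix {
    int n;
    std::vector<double> a;
    explicit Matrix(int n_) : n(n_), a(static_cast<std::size_t>(n_) * n_, 0.0) {}
    double& operator()(int i, int j) { return a[static_cast<std::size_t>(i) * n + j]; }
};

// "Type-decorated" matrix: uplo and diag are carried by the type.
template <Uplo U, Diag D>
struct Triangular {
    Matrix& m;
};

// Low-level driver: unblocked in-place inverse of a triangular matrix.
// Returns 0 on success, or j+1 if the j-th diagonal entry is exactly zero.
inline int trtri_driver(Uplo uplo, Diag diag, Matrix& A) {
    const int n = A.n;
    const bool unit = (diag == Diag::Unit);
    if (!unit)
        for (int j = 0; j < n; ++j)
            if (A(j, j) == 0.0) return j + 1;
    if (uplo == Uplo::Upper) {
        // Column by column; the leading j-by-j block is already inverted.
        for (int j = 0; j < n; ++j) {
            double ajj = -1.0;
            if (!unit) { A(j, j) = 1.0 / A(j, j); ajj = -A(j, j); }
            for (int i = 0; i < j; ++i) {
                double s = unit ? A(i, j) : A(i, i) * A(i, j);
                for (int k = i + 1; k < j; ++k) s += A(i, k) * A(k, j);
                A(i, j) = s * ajj;
            }
        }
    } else {
        // Last column first; the trailing block is already inverted.
        for (int j = n - 1; j >= 0; --j) {
            double ajj = -1.0;
            if (!unit) { A(j, j) = 1.0 / A(j, j); ajj = -A(j, j); }
            for (int i = n - 1; i > j; --i) {
                double s = unit ? A(i, j) : A(i, i) * A(i, j);
                for (int k = j + 1; k < i; ++k) s += A(i, k) * A(k, j);
                A(i, j) = s * ajj;
            }
        }
    }
    return 0;
}

// Interface 1: explicit uplo/diag arguments on a plain matrix.
inline int trtri(Uplo uplo, Diag diag, Matrix& A) {
    return trtri_driver(uplo, diag, A);
}

// Interface 2: the free function detects the tags from the decorated type.
template <Uplo U, Diag D>
int trtri(Triangular<U, D> A) {
    return trtri_driver(U, D, A.m);
}
```

With this layout, `trtri(Uplo::Upper, Diag::NonUnit, A)` and `trtri(Triangular<Uplo::Upper, Diag::NonUnit>{A})` reach the same driver, so the older call style keeps working while the decorated types are phased in.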

That all sounds great to me. Until this and the triangular matrix traits are ready, can we revert to the older interface for trtri so I can still code with it?