From: Alfredo Correa (alfredo.correa_at_[hidden])
Date: 2024-09-20 20:34:10


Hi Artyom,

Thank you for noticing the proposal and for the feedback.

> It can be excellent to have something like standard MD arrays like numpy is
> today for python, but...
>

Yes, this is one of the goals.

> 1. You haven't mentioned even once OpenCV which today is the de-facto
> standard for numerical computing in today's C++ world.

I do scientific computing with large arrays, and nobody uses OpenCV.
It would be fair if you had said that it is the de facto standard for image
processing, which I don't do.

Second, Multi is not a numerical library specifically; it is about the
logic and semantics of multidimensional arrays and containers, regardless
of the element type.
Having said that, the library should still be very performant with
numerical data, and for that, I delegate that responsibility to the
algorithms it uses behind the scenes, which are very customizable.
Unfortunately, people read "arrays" and immediately jump to "numerics" and
"linear algebra".

From the introduction: "The library's primary concern is with the storage
and logic structure of data;
it doesn't make algebraic or geometric assumptions about the arrays and
their elements.
(It is still a good building block for implementing mathematical
algorithms, such as representing algebraic dense matrices in the 2D case.)
In this sense, it is instead a building block to implement algorithms to
represent mathematical operations, specifically on numeric data."

> While it isn't
> perfect - it actually has similar features in terms of array layout - and
> what is more important it has a library of high performance algorithms to
> process the data. You must talk at least about interoperability between
> cv::Mat and this library
>

Multi is not an algorithms library, but it is designed to play well with
existing algorithms and to make interfacing with numerical libraries as
easy as possible.

> 2. While actual ndarry handling is nice, it is basically a tip of an
> iceberg. You need a decent set of high performance algo combined with it.
>

There is a clear separation of concerns here.
The Multi library deals with the data structure. When it needs an
algorithm to fulfill its semantics, it uses the best *existing* algorithm
it can, taking the data structure into account.
The Multi library doesn't provide algorithms; it uses the algorithms that
are provided to it via different mechanisms.

>
> Looks like there is a basic BLAS but the typical library has way-way more
> to make it useful.
>

This is an add-on that is separate from the main library.
I cannot even guarantee that it will ship with the proposed version of the
library.

What do you mean by the "typical library" in "the typical library has
way-way more to make it useful"?

I agree, promising all of linear algebra is infinite work, like
reimplementing MATLAB or Mathematica, but BLAS has a finite number of
functions.
The philosophy of the BLAS adaptor in particular (again, an optional
component) is to interface to what BLAS offers and nothing more.
It is more of a model of how to interface with a legacy library using the
features of Multi.

> I looked over the code and didn't find include to cblas... Did I miss
> something? Also I see request to link cblas/openblas or so.
>

Yes, you missed that depending on cblas would tie the application to
cblas, and not all BLAS implementations can be used through cblas.
BLAS is used through the ABI directly
(basically, I have my own version of the cblas header as an implementation
detail).
Also, the BLAS interface is the same for cuBLAS, which is another story.
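
To illustrate what "through the ABI directly" can mean, here is a minimal
sketch, not the adaptor's actual code. It declares a Fortran-style
dot-product prototype by hand instead of including cblas.h. The names
`demo_ddot_` and `dot` are hypothetical, and a stand-in definition is
included only so that the sketch links without any BLAS installed. In a
real build one would declare the vendor's symbol (typically `ddot_`) and
link against the BLAS of choice.

```cpp
#include <cassert>
#include <vector>

// Hand-written prototype following the Fortran BLAS convention:
// trailing underscore in the symbol name, every argument passed by address.
// `demo_ddot_` is a hypothetical name; a real BLAS typically exports `ddot_`.
extern "C" double demo_ddot_(int const* n,
                             double const* x, int const* incx,
                             double const* y, int const* incy);

// Stand-in definition so this sketch links without a BLAS library.
// Assumption: a real build would omit this and link BLAS instead.
// Only non-negative increments are handled here.
extern "C" double demo_ddot_(int const* n,
                             double const* x, int const* incx,
                             double const* y, int const* incy) {
    double sum = 0.0;
    for (int i = 0; i != *n; ++i) {
        sum += x[i * (*incx)] * y[i * (*incy)];
    }
    return sum;
}

// Thin C++ wrapper: takes values and forwards their addresses,
// which is what the Fortran calling convention expects.
inline double dot(int n, double const* x, int incx,
                  double const* y, int incy) {
    return demo_ddot_(&n, x, &incx, y, &incy);
}
```

The increments are what let the same routine operate on strided data, for
example a column of a row-major matrix, without copying it first.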

> I did breef look into implementation and it seems to be lacking of
> vectorization (use of intrinsics/vector operators) or have I missed?
>

You missed that this is a generic, not specifically numerical, library.
The other thing to take into account is that vectorization/parallelization
is still provided by the external algorithms the library uses internally.

For example, when dealing with GPU or OpenMP arrays, the library uses
Thrust algorithms if they are available, which are parallel.

Vectorization is enabled by passing execution policies, or through
whatever mechanism the underlying algorithms use, as long as they conform
to a certain syntax.
(There are some commits coming regarding the passing of execution
policies.)
Remember that I don't use many algorithms internally, only those that are
necessary to implement the semantics of the array types, assignment, move
assignment, construction and destruction.
The rest of the operations generate views mostly, so they don't involve
computation or algorithms.
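
A minimal sketch of why view-generating operations involve no computation,
assuming a hypothetical `view2d` type (this is not Multi's implementation
or API): a view is just a pointer plus extents and strides, so transposing
only swaps the bookkeeping and never touches the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical non-owning 2D view: a pointer plus extents and strides.
// It stores no elements of its own.
template<class T>
struct view2d {
    T* base;
    std::size_t rows, cols;
    std::size_t row_stride, col_stride;

    // Element access is pure index arithmetic on the original buffer.
    T& operator()(std::size_t i, std::size_t j) const {
        return base[i * row_stride + j * col_stride];
    }

    // Transposition swaps extents and strides; no element is read or written.
    view2d transposed() const {
        return {base, cols, rows, col_stride, row_stride};
    }
};

// Builds a row-major view over an existing buffer (hypothetical helper).
template<class T>
view2d<T> make_view(std::vector<T>& data, std::size_t rows, std::size_t cols) {
    return {data.data(), rows, cols, cols, 1};
}
```

Writing through the transposed view modifies the original buffer, which is
what distinguishes a view from a copy.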

For the rest, as a user, you have to use algorithms, and I can help if you
find that algorithms do not perform as you expect.

> 3. Finally I do want to see comparison with stuff like OpenCV, Eigen and
> even uBlas (I know it isn't good)
>

Multi is for multidimensional arrays, not specifically 2D numerical arrays
(i.e. matrices).
I would be interested in improving the code if someone shows that
operations in these libraries perform better.
I wouldn't want the documentation to be a comparison with so many libraries
with which there is only very partial overlap.
In particular, I don't like to compare with frameworks that lock you into
using their own facilities.
I consider Eigen, OpenCV, Kokkos, and PETSc to be frameworks.
Eigen is listed in one column of the comparison with other libraries:
https://gitlab.com/correaa/boost-multi#appendix-comparison-to-other-array-libraries-mdspan-boostmultiarray-etc
I don't feel confident adding an OpenCV column because I don't have
experience with it, but feel free to help me add a library by answering
the points of each row in the comparison table.

> 4. If you do implement many high performance algorithms this library by any
> means shouldn't be header only.
>

I don't implement any high performance algorithms here; I just make it
possible to interface with them, and I use them internally when needed to
fulfill the semantics of a container.

> What is not clear to me is why should I use one over some existing solution
> like OpenCV?
>

- Because sometimes your elements are not numerical types.
- Because sometimes you want to specify your element types as template
parameters, not as OpenCV-encoded types.
- Because sometimes you want arbitrary dimensionality (e.g. 2D, 3D, 6D) to
be compile-time.
- Because sometimes you want to apply generic algorithms to your arrays
(STL, std::sort, std::rotate, std::ranges, boost::algorithms, serialization).
- Because sometimes you want to implement functions that are oblivious to
the actual dimensionality of your array (e.g. simultaneously view a 3D
array of elements as a 2D array of something, for abstraction).
- Because sometimes you want to control allocations, use fancy pointers,
parameterized memory strategies, polymorphic allocators.
- Because sometimes you want precise value semantics.
- Because sometimes you want to give guarantees in the presence of
exceptions (the library doesn't throw exceptions, to be clear).
- Because sometimes you want to work with subblocks of an array in the same
way you work with the whole array.
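
A few of the points above (element type and dimensionality as template
parameters, non-numerical elements, generic algorithms, value semantics)
can be sketched with the standard library alone. The `ndarray` class below
is hypothetical and deliberately tiny; it is not Multi's API, only an
illustration of the kind of interface these points refer to.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical owning array: element type T and dimensionality D are
// template parameters, so both are fixed at compile time.
template<class T, std::size_t D>
class ndarray {
    std::array<std::size_t, D> extents_;
    std::vector<T> data_;  // value semantics: copying copies the elements

public:
    static constexpr std::size_t dimensionality = D;

    explicit ndarray(std::array<std::size_t, D> extents)
    : extents_(extents), data_(count(extents)) {}

    // Total number of elements for the given extents.
    static std::size_t count(std::array<std::size_t, D> const& e) {
        std::size_t n = 1;
        for (auto x : e) { n *= x; }
        return n;
    }

    // Row-major element access from a D-dimensional index.
    T& operator[](std::array<std::size_t, D> const& idx) {
        std::size_t offset = 0;
        for (std::size_t d = 0; d != D; ++d) {
            offset = offset * extents_[d] + idx[d];
        }
        return data_[offset];
    }

    // Flat iteration over all elements, so standard algorithms apply.
    auto begin() { return data_.begin(); }
    auto end()   { return data_.end(); }
};
```

With this, `std::sort(a.begin(), a.end())` works on an
`ndarray<std::string, 2>` just as it would on any standard container, and
copying `a` yields an independent array.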

Thanks,
Alfredo

