From: Alfredo Correa (alfredo.correa_at_[hidden])
Date: 2024-09-16 20:07:32


Hi Andrzej, (and all),

Thank you for taking the time to write your answer and for the quick
first impressions.

On Mon, Sep 16, 2024 at 7:17 AM Andrzej Krzemienski <akrzemi1_at_[hidden]>
wrote:

>
>
> On Wed, Apr 26, 2023 at 23:38, Alfredo Correa via Boost <boost_at_[hidden]>
> wrote:
>
>>
>> The library is available here,
>> https://gitlab.com/correaa/boost-multi.
>>
> Hi Alfredo,
> Thank you for sharing your library. This has been more than a year now,
> and I am sorry for the delayed response. Thank you for reminding us of it
> in the slack channel. From this, I gather that the game is still afoot.
>

Yes, it is.
No problem about the delayed response.

> I personally never needed to manipulate big multidimensional arrays, so I
> cannot immediately appreciate the usefulness of the library. I need a good
> introductory part. When I read the high-level description, I immediately
> think, "it is the same as std::mdspan". The docs say that it is different
> from the std::mdspan, but then I think, "no, it is the same as std::mdspan".
>

In my experience, manipulating (big) multidimensional arrays boils down to
three things:

1) managing allocations carefully;
2) resolving the tension between 1D access and an n-dimensional space
(supporting logical index access, but also fusing loops when performance
demands it);
3) a good separation between value and reference semantics, to avoid
unnecessary copies when possible and to ensure true value semantics when
needed. Well-defined semantics in generic settings avoid the need for
"defensive" copies.

None of this is directly tackled by std::mdspan.
std::mdarray is newer, and I haven't had time to experiment with it.
My understanding is that mdarray doesn't tackle these problems either,
except 1) partially, since it is going to be a container adaptor (it will
rely on an underlying container).

> From the comparison table, I gather that Multi offers both the container
> and the views (sort of references), and that std::mdspan is only a view. Am
> I right?
>

Yes.

> The docs say that Multi provides value semantics, but I guess it is not a
> fair statement.
>

Multi provides value semantics that no other library provided so far IMO,
and that is a fair statement.

> I guess (and correct me if I got this wrong) that the container is
> value-semantic, but the views are not.
>
> That is the nature of the views, that you want them *not* to have value
> semantics.
>

I don't like to use the term "views" because it can mean many things,
especially since ranges and spans abuse the term.
It means so many things that even the term "owning view" is used now,
which is the opposite of your definition ("the nature of the views, that you
want them *not* to have value semantics").
In my opinion, the current use of "view" has no well-defined reference
semantics, no well-defined constness propagation, and no well-defined
lifetime. Accessing individual elements can return basically anything:
l-values, r-values, or proxies.
It seems that "view" nowadays means anything that is not strictly a
container (value) but is related to one.
For this reason, I stopped using the term "view" for my library.
I tend to use the terms "subarrays" and "reference objects" (not necessarily
language references) instead.

> A fair comparison, should compare Multi's views to std::mdspan.
>

I touch on this in the section "Substitutability with standard vector and
span".
With respect to semantics, mdspan is the same as span.
In a few words, "Multi's views" are proper references (as much as the
language allows), whereas std::mdspan is a mix of things (which has lately
been accepted as good enough under the "view" wording).

> Now, the Standard library has also a proposal in flight to add a container
> for multi-dimensional arrays: std::mdarray:
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1684r2.html
> Could you also include it in your comparison? And I would expect that
> std::mdspan and std::mdarray are treated as one in this case.
>

Yes, I could do that.
I can add mdarray in the same column as mdspan and bump its specified
requirement to C++26. (Multi is C++17.)
Take into account that mdarray is very recent; I have only had access to an
experimental implementation of it.

> I read that Multi's types have an STL-compatible interface (range).
>

Yes, because it provides random-access iterators, through begin() and end(),
that traverse the multidimensional structure.
The access can be recursive (A2D.begin(), A2D.end(), A2D.begin()->begin(),
etc.) or flattened (A2D.elements().begin(), A2D.elements().end()).

I will add this explicitly early in the documentation.

> But this is far from obvious what it means in the context of
> multi-dimensional arrays. The range/iterator interface was tailored for
> one-dimensional data structures. There is no obvious generalization to
> multiple dimensions.
>

There are two "canonical" generalizations to multiple dimensions; the
library handles both in two clearly distinct ways, recursive and flattened.

The first consists in regarding a multidimensional array as *nested* 1D
ranges, where the order of nesting corresponds to the ordering of the indices.

In this view, given a multidimensional object A (of dimensionality larger
than one), A[0], A[1], A[2], ... is a one-dimensional sequence of ranges of
lower dimension than A.
If done right, all algorithms that work on 1D ranges should work on the
range A.begin() ... A.end().

If you write a function that is agnostic of the ultimate (true) dimension
of A, you are writing dimension-generic code.

The second is to see the whole multidimensional object as a 1D range of all
the "terminal" (zero-dimensional) elements, that is, an unravelled version
of the array.

Both generalizations are useful. The first is accessed through indices or
iterators: A[i], A.begin(), A.end().
The interesting thing is that A itself is regarded as a 1D object by
algorithms that expect that, for example the std::ranges algorithms.
The other generalization is accessed through the .elements() member:
A.elements() gives all the elements across all dimensions as a linear
range.
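
As a rough illustration of the two generalizations, here is a sketch that
uses only standard containers, not Multi itself. The names array2d,
sum_nested and flattened are made up for this example, and the flattened
form is materialized as a copy only to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Stand-in for a 2D array (NOT Multi's type): a 1D range of 1D ranges.
using array2d = std::vector<std::vector<int>>;

// Nested generalization: the object is a 1D range of rows, and each row is
// again a 1D range, so ordinary 1D algorithms apply at every level.
inline int sum_nested(array2d const& a) {
    int total = 0;
    for (auto const& row : a) {
        total += std::accumulate(row.begin(), row.end(), 0);
    }
    return total;
}

// Flattened generalization: all terminal elements as one linear range,
// in index order (copied here; an assumption made for brevity).
inline std::vector<int> flattened(array2d const& a) {
    std::size_t count = 0;
    for (auto const& row : a) { count += row.size(); }

    std::vector<int> out;
    out.reserve(count);
    for (auto const& row : a) {
        out.insert(out.end(), row.begin(), row.end());
    }
    assert(out.size() == count);  // each terminal element appears exactly once
    return out;
}
```

For a = {{1, 2, 3}, {4, 5, 6}}, sum_nested(a) gives 21 and flattened(a)
holds 1, 2, 3, 4, 5, 6 in that order.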

> I wouldn't even expect a multi-dimensional array to give me an STL
> interface (whatever that means).
>

And yet it does.
You know what it means now.

Imagine it: you have a multidimensional object, and all the algorithms of
the STL and std::ranges, plus (if you wrote your generic functions carefully)
all your functions that deal with 1D random-access containers, just work!

> Maybe, you mean that the library offers a view where you can see the entire
> multidimensional array as a long string of values? This would make sense,
> but if it is the case, I expect the introduction to say exactly this.
>

Presenting the multidimensional array as a long string is something that
fundamentally breaks the abstraction of the multidimensional object, so I
put off mentioning it.

It is mentioned in the "comparison table" in the row "flattening of arrays".

I am going to add this distinction more prominently.

> In the case of std::mdspan, it has been said that it has been tailored to
> efficiently represent both huge datasets as well as tiny 4x4 matrices. I am
> not sure if this is the case, but I request that the docs for Multi say
> what use case they have been designed and optimized for.
>

In this sense it is designed like std::vector: it is optimized for the
large-n case.
It is not optimized (amortized) for insertions or push_backs, because of the
nature of multidimensional arrays, because of space and time efficiency
constraints, and to maintain symmetry among subdimensions.

Except for the fact that it wasn't programmed for compile-time dimensions
(as mdspan was), the small-n case shouldn't be bad either.
Also, there is no small-array optimization.
Since the library is very good at interfacing with allocators, it gives the
option of using a stack-based allocator for small arrays.

The other optimized case is on the dimensionality, in the sense that it is
generic.
Dimensionality is handled recursively.

> Does the library only represent dense matrices, or can it also represent
> sparse data?
>

Who talked about "matrices"? :)
(yes, I mistakenly wrote it once in the documentation)

The point is that the terms 'matrices' (and 'tensors') carry semantic
meaning, such as algebraic operations, related to linear algebra (and
geometry).
If someone wants to implement matrices using Multi they are welcome to, and
of course, as you said, multi::array<Scalar, 2> from Multi is an
obvious candidate for implementing dense matrices.

From the intro paragraph:
"The library's primary concern is with the storage and logic structure of
data; it doesn't make algebraic or geometric assumptions about the arrays
and their elements. In this sense, it is instead a building block to
implement algorithms to represent mathematical operations, specifically on
numeric data. Although most of the examples use numeric elements for
conciseness, the library is designed to hold general types (e.g.
non-numeric, non-trivial types, like std::string, other containers or, in
general, user-defined value-types.)"

> The term "stride-based". It is not clear to me what it means.
>

It refers to the main data-structure layout that the library supports.
Ultimately, it says that the elements of any Multi object are located at
base + i1*stride1 + i2*stride2 + i3*stride3 + ...
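
To make the formula concrete, here is a minimal sketch of the address
computation in plain C++. The layout struct and its members are hypothetical
names invented for this illustration; they are not Multi's internal types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical illustration of a stride-based layout (not Multi's code).
template <std::size_t D>
struct layout {
    std::array<std::ptrdiff_t, D> extents;  // number of elements per dimension
    std::array<std::ptrdiff_t, D> strides;  // step in the flat buffer per dimension

    // offset = i1*stride1 + i2*stride2 + ...; the element lives at base[offset].
    std::ptrdiff_t offset(std::array<std::ptrdiff_t, D> const& idx) const {
        std::ptrdiff_t off = 0;
        for (std::size_t d = 0; d != D; ++d) {
            // Out-of-range indices are a logic error, checked by assertion only.
            assert(0 <= idx[d] && idx[d] < extents[d]);
            off += idx[d] * strides[d];
        }
        return off;
    }
};
```

With a flat buffer of 6 elements, layout<2>{{2, 3}, {3, 1}} describes a
row-major 2x3 arrangement, and layout<2>{{3, 2}, {1, 3}} describes its 3x2
transpose over the same memory: only the strides change, the data does not
move.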

mdspan/mdarray give (completely?) general layouts, but at the price of
no iterators and fewer complexity guarantees.

If there is a better name for it, please let me know.

> I cannot see from the introduction if this library will throw exceptions.
>

The library itself does not throw.
In general, it provides the basic exception guarantee.

Operations that allocate may throw exceptions from the allocator, which is
provided by other libraries.

Logic errors when using the library result in UB, or in assertion failures
where possible.

I will make this clear in the documentation.

> The comparison between Multi and std::mdspan in the row "const-propagation
> semantics" is unfair. I guess you are comparing Multi's container with a
> view.
>

Not in particular; I am comparing all aspects of Multi.
All aspects of Multi should propagate constness (modulo bugs).

> A view is not expected to propagate constness.
>

(it seems that nothing is expected from "views" after all, so everything is
allowed)

Multi subarrays and iterators propagate constness,
IMO in the way that ranges should have propagated constness.
I chose to propagate constness because it makes const useful.
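
To show what the distinction means in code, here is a small sketch with two
hypothetical handle types, shallow_ref and propagating_ref, written only for
this illustration (neither is Multi's implementation). The first behaves
like a raw pointer or std::span<int>: a const handle still hands out mutable
elements. The second makes the elements read-only when the handle is const.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Pointer-like handle: const on the handle does NOT reach the elements.
struct shallow_ref {
    int* data;
    std::size_t size;
    int& operator[](std::size_t i) const { return data[i]; }
};

// Reference-like handle: const on the handle makes the elements read-only.
struct propagating_ref {
    int* data;
    std::size_t size;
    int&       operator[](std::size_t i)       { return data[i]; }
    int const& operator[](std::size_t i) const { return data[i]; }
};

// The difference is visible in the types returned through a const handle.
static_assert(std::is_same_v<decltype(std::declval<shallow_ref const&>()[0]), int&>);
static_assert(std::is_same_v<decltype(std::declval<propagating_ref const&>()[0]), int const&>);

// Writing through a const shallow handle compiles and modifies the data.
inline int write_through_const_handle() {
    int buf[3] = {1, 2, 3};
    shallow_ref const r{buf, 3};
    r[0] = 42;  // allowed: constness stops at the handle
    assert(buf[0] == 42);
    return buf[0];
}
```

The same assignment through a `propagating_ref const` would be rejected by
the compiler, which is the sense in which propagation makes const useful.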

> So this is the very initial feedback, I hope it helps.
>
>
It helps a lot actually, I appreciate your time and effort.
I will proceed to improve the documentation based on your comments.

Thank you,
Alfredo

