[boost] Re: [multi] Formal Review Begins

7 Mar 2026

      ...
A couple of good questions were posed today on Reddit:
A follow-up from both parties on Reddit:

The statement that there should be "no expected overhead" seems incorrect to me. Am I missing something?

Consider references to a dynamic two-dimensional object, the sort of thing that gets copied around a lot.

```c++
#include <cstddef>
#include <mdspan>                 // C++23
#include <boost/multi/array.hpp>

using M = std::mdspan<double, std::extents<std::size_t, std::dynamic_extent, std::dynamic_extent>>;
using R = boost::multi::array_ref<double, 2>;
```

I measure:

```text
sizeof(M) = 24
sizeof(R) = 72
M trivially copyable: true
R trivially copyable: false
```

You can confirm this here: https://godbolt.org/z/n95Ws9KW5

So there is overhead: R is 3x bigger than M. Surely there will also be runtime overhead from copying them around, including from host to GPU, and probably more register pressure.

I think this example reflects the common case well. If the dimensions are known at compile time, the advantage of mdspan is greater; if the layout is strided, the advantage is smaller. So dynamic and contiguous is both the common situation and a representative example of the extra overhead.

edit: I measure the size of `decltype(std::declval<R&>().begin())` to be 64 bytes; I was thinking that in some cases the iterator gets passed instead of the array_ref. A bit smaller, but not by much.

And again, Alfredo's response:

0) These are good points, but the original question was whether there is a cost to pay for using typed GPU pointers instead of raw pointers, and the answer is still no.

1) The new question is about the size of the reference object. Yes, Multi's array references occupy more stack bytes than span; this is because they are more general and, in principle, they can hold padded data, for example (which is going to be implemented in a future version). These extra bytes may not be reflected in practice, because array references are never on the heap and the compiler is able to optimize these structures a lot. (mdspan shouldn't be on the heap either, IMO, but I digress.)
Yes, AFAIK it can add extra bytes across compilation units, or when passing to GPU kernels (which I think is your point), but then the question is whether you really want to pass array references to kernels. My opinion is no: you "pass" arrays in a different way, which is documented. array_refs are not copy constructible, so it won't work even if you try (well, there is a hack, but I don't recommend it). In summary, array references live on the stack and can be heavily optimized; they are not meant to be passed as kernel arguments.

2) Array references are not copy constructible; this is by design, to keep value and reference semantics clearly separated. So an array reference is not trivially copy constructible simply because it is not copy constructible, not because it does something strange. And of course array references are not trivially assignable, because assignment is deep (actual code needs to be executed), not shallow like the reseating of span or mdspan. This is again to maintain the separation between values and references. These properties are documented.

Matt Borland