From: Alfredo Correa (alfredo.correa_at_[hidden])
Date: 2024-09-25 08:55:37


 Hi Andrzej,

On Tue, Sep 24, 2024 at 11:32 PM Andrzej Krzemienski <akrzemi1_at_[hidden]>
wrote:

>
>
> On Wed, 25 Sep 2024 at 04:27, Alfredo Correa via Boost <boost_at_[hidden]>
> wrote:
>
>>
>> > > Second, Multi is not a numerical library specifically; it is about the
>> > > logic and semantics of multidimensional arrays and containers,
>> > > regardless of the element type.
>> > > ...
>> >
>> >
>> > See, this is exactly the problem. Why would I need something like that
>> > if I need to go to all the 3rd party libraries to actually use one
>> > efficiently?
>> >
>>
>> The same reason some of us use the standard library containers, or ranges,
>> etc., even if they don't "do everything".
>>
>
> So, Artyom says that storage and element access alone is insufficient to
> warrant the existence of a library. Alfredo says something opposite.
>

If that is the point of the discussion and I didn’t realize it, that would
be a very fair point from Artyom.

> Alfredo, what would help here is if you demonstrated that your library has
> users, and have the users say why they chose it, given that they have to
> get the algorithms from elsewhere. Maybe you are such a user?
>

I don’t think it would help, but I want to be transparent, and this is a good
time to do an assessment:

Project users (list in the docs)

https://github.com/QMCPACK/qmcpack (294 stars, 137 forks, 36 watch)

https://github.com/llnl/inq (23 stars, 4 forks, 10 watch; 22 stars, 16
forks in gitlab)

(Disclaimer: I am or was involved in the projects above.)

Another AFQMC (auxiliary-field quantum Monte Carlo) simulation code at the
Flatiron Institute, which is still a private repository but will soon be
open source according to the authors (I am not involved).

Other users:

6 stars, 2 watching in the GitHub repo
13 stars, 5 forks in the GitLab repo
about 100 issues by associated developers, and
I tracked about 2 or 3 issue openers who work in independent groups
(quite advanced users, as far as I can tell).

cpplang Slack #boost-multi channel: 18 members.

(sorry if there is double counting, this is the best I can do)

Most of the praise, from colleagues at least, is about the flexibility with
allocations, in particular the ability to separate allocations from array
lifetime (i.e., memory pools), which solves 50% of the problems of value
semantics with big arrays, especially on the GPU.
Another point of praise was how straightforward it was to incrementally
rewrite old code.
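
To make the point about separating allocation from array lifetime concrete,
here is a minimal sketch. It deliberately does not use Multi's own API: it
assumes C++17 and uses std::pmr::vector as a stand-in for an
allocator-aware array, and the names sum_with_scratch and run_many are
hypothetical. The idea is only that temporaries with value semantics can be
created and destroyed repeatedly while their storage comes from a buffer
that is obtained once.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Hypothetical helper: builds a scratch array 0, 1, ..., n-1 whose storage
// is taken from `pool`, and returns the sum of its elements.
double sum_with_scratch(std::pmr::memory_resource* pool, std::size_t n) {
    assert(pool != nullptr);
    std::pmr::vector<double> scratch(n, 0.0, pool);  // storage comes from the pool
    std::iota(scratch.begin(), scratch.end(), 0.0);
    return std::accumulate(scratch.begin(), scratch.end(), 0.0);
}  // scratch dies here; the memory behind the pool outlives it

// Hypothetical driver: the buffer is obtained once and reused by every
// iteration, so the scratch arrays do not go to the heap each time.
double run_many(std::size_t repetitions, std::size_t n) {
    std::vector<std::byte> buffer(n * sizeof(double) + 64);  // 64 bytes of slack
    double total = 0.0;
    for (std::size_t i = 0; i != repetitions; ++i) {
        // A fresh monotonic resource over the same buffer "rewinds" the pool.
        std::pmr::monotonic_buffer_resource pool(buffer.data(), buffer.size());
        total += sum_with_scratch(&pool, n);
    }
    return total;
}
```

Calling run_many(3, 4) creates and destroys three scratch arrays, all backed
by the same buffer. If the buffer were too small, the monotonic resource
would fall back to its upstream resource (the default one) instead of
failing.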

>
> The rest of my post is "academic", as I do not have experience in the
> field.
>
> Having only the storage and access abstraction would be preferred over a
> framework, if for your particular use case you have to employ two domains,
> like image-processing and generic AI/ML, and you want two sets of
> algorithms applied to the very same data. Then there may be no single
> framework that satisfies your need, and you may need to make two frameworks
> interoperate.
>

Yep, that is another problematic aspect of frameworks. In general the
combinatorial explosion of tools necessary to make

> Next, the analogy to STL alone is not good enough, I think. It is on you
> to demonstrate that the idea of generic programming also applies to *real
> life* usages of big multidimensional arrays. STL itself has been criticised
> that because it is generic, it cannot be optimized for particular types. (
> https://www.youtube.com/watch?v=FJJTYQYB1JQ&ab_channel=CppCon)
>

Fair enough. The STL is opt-in: internal algorithms may be forwarded to the
STL, and so far I haven’t needed to use other libraries, except for GPUs.
But I am open to discussing this more and improving it if necessary.

The STL is not perfect. In fact, I found a couple of obvious problems with
how traits are used in the STL, and maybe the library doesn’t *optimally* fit
some concepts in the STL regarding iterator categories. The main pain
points, if you are curious, are that std::copy_n is not usable with
broadcasted arrays (I opened a GCC lib bug with A. O’Dwyer regarding this)
and that std::sort makes some overgeneralized assumptions about the
trade-off between a copy (which allocates and can throw) plus n moves versus
n swaps.

Incidentally, this is related to the video link you sent; it is exactly
the point I ran into. I believe that 1) Multi iterators can fall into a new
category of iterators, for which I don't have a name yet, perhaps between
random access and bidirectional, to handle broadcasted arrays; and 2)
std::sort (and other STL algorithms) is not customizable enough, at least in
the place I found, where a "rotation-by-1" is implemented as a copy to a
temporary and n copies instead of n swaps, even if n swaps would be better
for some element types and "row" sizes (and would be noexcept). This is the
infinite customization that Andrei talks about at
https://youtu.be/FJJTYQYB1JQ?t=4475 (thank you for reminding me of this
talk).
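
The "rotation-by-1" trade-off can be sketched as follows. This is not the
standard library's actual implementation and does not use Multi; Row is a
plain std::vector<double> standing in for an array row, and the function
names are hypothetical. Both variants move the last row to the front; they
differ only in whether a temporary copy is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Row = std::vector<double>;  // stand-in for a "row" of a 2D array

// Variant A: copy the last row into a temporary (the copy may allocate and
// throw), shift the remaining rows up by one with moves, then put it in front.
void rotate_right_by_one_with_temporary(std::vector<Row>& rows) {
    if (rows.size() < 2) return;
    Row saved = rows.back();  // the copy
    std::move_backward(rows.begin(), rows.end() - 1, rows.end());
    rows.front() = std::move(saved);
}

// Variant B: bubble the last row to the front with adjacent swaps;
// no temporary row is created.
void rotate_right_by_one_with_swaps(std::vector<Row>& rows) {
    for (std::size_t i = rows.size(); i > 1; --i) {
        std::swap(rows[i - 1], rows[i - 2]);
    }
}

// Hypothetical check that both variants produce the same order of rows.
bool variants_agree(std::vector<Row> rows) {
    std::vector<Row> other = rows;
    rotate_right_by_one_with_temporary(rows);
    rotate_right_by_one_with_swaps(other);
    assert(rows.size() == other.size());  // a rotation never changes the count
    return rows == other;
}
```

With std::vector rows a swap only exchanges internal pointers, so the
difference is small here. For rows that are views into a larger array, a
swap has to exchange the elements themselves and a copy needs fresh
storage, which is where the choice between the two variants starts to
matter.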

It would be a fun, if academic, exercise to see whether I can apply Andrei's
super-duper sort to array rows.
I might discover something new as well.
Notice that even though Andrei criticizes the STL, he at least still uses
iterators, which is enough for me.
I would welcome any introspection mechanism that he would need for this.

(There are other aspects that make the library useful, such as the
flattening of arbitrary subarrays, which effectively does a kind of
loop fusion and makes the use of the STL even more attractive.)
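
As a rough illustration of what flattening buys, consider the simplest
case only: a contiguous block of whole rows in row-major storage. The
sketch below does not use Multi's API and does not cover arbitrary
subarrays; Matrix and both functions are hypothetical names. It only shows
how viewing a 2D block as one 1D range turns nested loops into a single
call to an STL algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical row-major 2D storage standing in for a real array type.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;  // element (i, j) lives at data[i * cols + j]
};

// Nested loops over rows [first, last) and all columns.
double sum_rows_nested(const Matrix& m, std::size_t first, std::size_t last) {
    assert(first <= last && last <= m.rows);
    double total = 0.0;
    for (std::size_t i = first; i != last; ++i) {
        for (std::size_t j = 0; j != m.cols; ++j) {
            total += m.data[i * m.cols + j];
        }
    }
    return total;
}

// Flattened view: the same rows form one contiguous 1D range, so a single
// STL algorithm replaces the two loops.
double sum_rows_flat(const Matrix& m, std::size_t first, std::size_t last) {
    assert(first <= last && last <= m.rows);
    auto begin = m.data.begin() + static_cast<std::ptrdiff_t>(first * m.cols);
    auto end = m.data.begin() + static_cast<std::ptrdiff_t>(last * m.cols);
    return std::accumulate(begin, end, 0.0);
}
```

Both functions visit the same elements in the same order; the flattened
version just exposes them through a single pair of iterators.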

> In the context of big multi-dimensional arrays, we are talking about heavy
> computations. And maybe the data structures not optimized for specific use
> cases are simply disqualified from the outset.
>

yes

> Please, treat it as a hint on how to communicate your ideas to people in
> this forum, in order to convince them.
>

Absolutely, I appreciate it.

> Regards,
> &rzej;
>

