Boost :


From: Artyom Beilis (artyom.beilis_at_[hidden])
Date: 2024-09-25 08:26:15

Next message: Alfredo Correa: "Re: [Multi] Proposal"
Previous message: Andrzej Krzemienski: "Re: [Multi] Proposal"
In reply to: Alfredo Correa: "Re: [Multi] Proposal"

> Again, this boils down to comparing apples with oranges: frameworks vs. components.
>
> I don't have experience with OpenCV. I have seen code using OpenCV; it seems to have a lot of features: you can load images, render windows, and do some array computations.

OpenCV itself is divided into many components: core, image I/O,
image processing, and many more. There are even header-only parts
suitable for pure SIMD handling.

>>
>> But if you don't provide algorithms, maybe I'd better take a
>> library/framework that does.
>
>
> Exactly, that is the appeal of frameworks.
> Frameworks are great if they do all that you need and no less.
>
> The moment you need to do something that the framework has not contemplated you are totally on your own, without much help.
>

There is nothing wrong with having a good component that can interface
with existing systems like CBLAS, OpenCV's Mat (the way np.ndarray
cooperates with cv2), and so on, especially if you can provide a "better"
and more suitable interface for the somewhat aged cv2.

For example, Boost.Locale wraps the horrible ICU API.

BUT it looks like it creates neither a useful framework nor a useful core
library, _because_ it does not provide the basic functionality expected
of a library designed with numeric processing in mind.

>> There are plenty of numpy-like arrays around. Usually they are called tensors...
>
>
> I agree, and almost all of them are frameworks.
>
> I don't think any of them works with existing STL algorithms and iterators.
> If you are not interested in this, this is not for you.

OK, very good point. Most STL algorithms aren't that suitable
for numeric computations. You wouldn't store vectors in a std::list
and run addition on them (even though you can), because it would perform
horribly.

More about it below.

>> >> I took a brief look at the implementation and it seems to lack
>> >> vectorization (use of intrinsics/vector operators), or have I missed it?
>> >
>> >
>> > You missed that this is a generic, not specifically numerical, library.
>>
>>
>> But you are making a numpy-like library...
>
>
> You keep bringing up numpy.
> Some users of the library see the analogy with numpy because of the ease of use, and I appreciate that, but I don't mention numerics or numpy in the documentation, except in one or two places with very clear context.
>
> If, for you, numpy implies numerics, then you are using the wrong analogy and point of comparison.

90% of your use cases are related to numeric computations, and for a VERY
good reason!
So while it can be nice to have a std::string tensor (though I'm not sure what for),
I don't see an actual need for this outside numeric computations.
(But I may be wrong.)

>>
>> otherwise you wouldn't be
>> interfacing with cblas.
>
>
> Would you feel better about it if I remove the BLAS adaptor?
> This is a completely optional component.
>
> This is to make the library more immediately useful without converting it into a framework.
> The interface with BLAS is there strictly to be able to use BLAS through the Multi facilities in a functional (immutable) context that is friendly, for example, to STL algorithms that take lambdas.

First, you can use lambdas in a tensor context; it is actually done quite
extensively in libraries like PyTorch, but a little bit differently.

You run something like this (pseudocode):

    run_algo_parallel(a, b, [](auto a_section, auto b_section, auto range) {
        for (auto i : range) {
            b_section[i] += a_section[i] * 2;
        }
    });

It allows running over a sub-range, using SIMD where possible, and so on,
even parallelizing. It is quite a different approach, suited to the numerical field.
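A minimal C++17 sketch of what the helper called in the pseudocode could look like. Everything here is an assumption for illustration: the name `run_algo_parallel` comes from the pseudocode, `Range` is a hypothetical half-open index range, the data is assumed to be contiguous 1-D, and plain `std::thread` stands in for whatever scheduler a real library would use. Each worker gets a disjoint chunk, so the simple inner loop it runs is safe to parallelize and is a natural candidate for compiler auto-vectorization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical half-open index range [begin, end) handed to one worker.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical helper mirroring the pseudocode: splits [0, a.size()) into
// contiguous chunks and calls body(a, b, chunk) for each chunk on its own
// thread. Chunks never overlap, so writes to distinct elements of b don't race.
template <typename A, typename B, typename Body>
void run_algo_parallel(const A& a, B& b, Body body, unsigned threads = 0) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    if (threads == 0)
        threads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (n + threads - 1) / threads;
    std::vector<std::thread> workers;
    for (std::size_t start = 0; start < n; start += chunk) {
        const Range r{start, std::min(n, start + chunk)};
        workers.emplace_back([&a, &b, &body, r] { body(a, b, r); });
    }
    for (std::thread& w : workers)
        w.join();  // all chunks finish before we return
}
```

A production version would more likely reuse a thread pool than spawn threads per call; the point of the sketch is only that the lambda sees a range, not a single element.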

I myself need to run lots of generic algorithms on multidimensional tensors
with generic ranges in dlprimitives and in the PyTorch OpenCL backend I work on.

So something like broadcasting on the CPU over a dynamic number of dimensions,
so that I can run a generic lambda, would be HIGHLY useful, but you need an
interface that supports it.
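One common way to support broadcasting with a run-time number of dimensions (the approach NumPy documents) is to re-express each input as a view with the output's rank, giving every stretched dimension a stride of 0 so the same element is reused. The sketch below is an assumption-laden illustration, not any library's API: `broadcast_strides` is a hypothetical helper, strides are counted in elements, and shapes are aligned from the right as in NumPy.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape = std::vector<std::size_t>;
using Strides = std::vector<std::ptrdiff_t>;

// Hypothetical helper: returns strides that let an input of shape `in_shape`
// be walked with the (larger or equal) shape `out_shape`. Missing leading
// dimensions and stretched size-1 dimensions get stride 0.
Strides broadcast_strides(const Shape& in_shape, const Strides& in_strides,
                          const Shape& out_shape) {
    assert(in_shape.size() == in_strides.size());
    if (in_shape.size() > out_shape.size())
        throw std::invalid_argument("input has more dimensions than output");
    Strides result(out_shape.size(), 0);
    const std::size_t offset = out_shape.size() - in_shape.size();
    for (std::size_t i = 0; i < in_shape.size(); ++i) {
        const std::size_t o = offset + i;
        if (in_shape[i] == out_shape[o])
            result[o] = in_strides[i];
        else if (in_shape[i] == 1)
            result[o] = 0;  // stretched dimension: stay on the same element
        else
            throw std::invalid_argument("shapes are not broadcast-compatible");
    }
    return result;
}
```

With stride-0 views, a single generic loop over the output shape can feed every input to the user's lambda without materializing broadcast copies.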

>> you direct it to the numeric computations.
>
> I don't direct it to the numerical computation; I clearly say in the introduction that this is not a numerical library.

Once again, the main use case is numeric: the vast majority of the samples are
numeric, and the integrations are numeric. There is one example with strings.

If it walks like a duck, quacks like a duck...

That is my observation, and I think it is quite a reasonable one, especially since
numerics is the main use case of the strided/multidimensional arrays out there.

> SIMD parallelization is more difficult to apply in general because it depends on data being contiguous in at least one dimension, which is only realized by very specific layouts
> (this is the reason BLAS matrices always have at least one stride = 1).
> This is not the general case, when you manipulate arrays dynamically.
> But I agree it still can be applied with some effort.

There is a VERY good reason that BLAS, and actually virtually any
library, is optimized for at least one stride = 1... Otherwise you'll
destroy your cache. There is a reason why matrices, and even sparse
matrices, are implemented as contiguous arrays in memory.
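To make the stride = 1 point concrete, here is a small sketch of a 2-D strided operation with a contiguous fast path. `View2D` and `scale_2d` are hypothetical names; strides are in elements. When the innermost stride is 1 the inner loop touches consecutive memory (cache-line friendly and a good auto-vectorization candidate); otherwise each access may land on a different cache line.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2-D strided view: element (i, j) is data[i*stride0 + j*stride1].
struct View2D {
    float* data;
    std::size_t rows, cols;
    std::ptrdiff_t stride0, stride1;  // in elements, not bytes
};

// Multiplies every element of `v` by `k`.
void scale_2d(View2D v, float k) {
    assert(v.data != nullptr || v.rows == 0 || v.cols == 0);
    for (std::size_t i = 0; i < v.rows; ++i) {
        float* row = v.data + static_cast<std::ptrdiff_t>(i) * v.stride0;
        if (v.stride1 == 1) {
            // Contiguous inner dimension: consecutive loads and stores.
            for (std::size_t j = 0; j < v.cols; ++j)
                row[j] *= k;
        } else {
            // Non-unit stride: scattered accesses, poor cache and SIMD use.
            for (std::size_t j = 0; j < v.cols; ++j)
                row[static_cast<std::ptrdiff_t>(j) * v.stride1] *= k;
        }
    }
}
```

The same buffer viewed as its transpose (strides swapped) still works, but only through the slow branch, which is why libraries prefer to pick a loop order that puts the unit stride innermost.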

> - External deps?

OpenCV has a wide range of libraries; some come with no dependencies, some
with a few. I use OpenCV compiled on Android (core, imgproc and imgio) with
very basic dependencies. Of course, for image I/O I do need libpng/libjpeg,
but that is an optional component.

> - Arbitrary number of dims (e.g. 11 dimensions)
Yes
> - Non-owning view of data (e.g. manipulate view memory provided by others)
Yes
> - Compile-time dim size

Don't think so

> - Array values (owning data) (e.g. can I put arrays inside a std::list?)

Not clear what you mean. Whether you can put a cv::Mat into a std::list?
Yes, but in general cv::Mat is reference counted.

> - Value semantic (Regular) (can I assign, swap, with expected Stepanov regularity results)

cv::Mat is reference counted for

> - Move semantics (e.g. will this copy data arr2 = std::move(arr1) ?)

Don't remember

> - const-propagation semantics (e.g. Mat const arr; ... is arr really read-only)

Don't think so

> - Element initialization (e.g. can the arrays be initialized at declaration e.g. Mat const arr = {1.0, 2.0, ...})

Depends: https://stackoverflow.com/questions/44965940/cvmat-initialization-using-array

> - References w/no-rebinding (e.g. can I name a subblock of an array, e.g. `auto sub = subblock of arr`? does it have reference semantics (no copy)?)

Yes

> - Element access (e.g. how easy is it to access a specific element, e.g. in 4 dimensions `arr( 3, 4, 5, 6)`)
> - Partial element access, (e.g. take n-th column or n-th row of a 2D array)

If I understand you correctly, yes to both.

> - Subarray views (e.g. generate a "view" 2D subblock of a 2D array)
yes

> - Subarray with lower dim (e.g. generate a "view" nD subblock of a mD array, where n < m).

yes

> - Subarray w/well def layout (e.g. access the layout of a subblock, if subblocks can be referred to?)

Not sure I understand what you mean.

> - Recursive subarray (e.g. can subblock "views" of subblock "views" be temporaries)
If I understand you correctly, yes.

> - Custom Allocators (e.g. thrust::device_allocator, boost::interprocess::allocator)

cv::Mat has custom allocator support.

> - PMR Allocators (e.g. use std::pmr::monotonic_buffer_resource)

Not sure what you mean.

> - Fancy pointers / references (e.g. use memory not represented by raw pointers, e.g. thrust::device_pointer, boost::interprocess::offset_ptr)

Don't think so. But in the end you have a pointer to specific memory,
even with boost::interprocess (although the pointer may differ between
processes). Not familiar with Thrust.

> - Stride-based Layout (e.g. supports strides layout, element, and can give this information to low level libraries)

cv::Mat has strided layout

> - Fortran-ordering (e.g. left-index is the fast index in memory)

= strided layout (i.e. transpose)

> - Zig-zag / Tiles / Hilbert ordering / (e.g. fancy layouts beyond strides)

Not familiar with that so can't say.

> - Arbitrary layout (e.g. can data be laid out arbitrarily in memory in a user-defined way, not strides, not zig-zag)

Not sure it is even relevant for numerical processing... but I don't think so.

> - Flattening of elements (e.g. any facilities to look at the elements in a flattened way beyond simply giving a pointer to the first element, which will not work for subblocks)

Not sure what you mean.

> - Iterators (e.g. have the array, in any useful sense, .begin and .end?)

AFAIR yes, although I rarely use them myself.

> - Multidimensional iterators (cursors) (e.g. auto h = arr_subarray.home(); h gives access to elements but is light as a pointer)

Not sure what you mean.

> - STL algorithms or Ranges (e.g. would it work with `std::sort`, `std::copy`, `std::reduce`, `std::rotate`, or with any ranges algorithm)

Not sure; I haven't worked with it, because for numerical data you usually
use specific approaches that are more efficient than generic algorithms.
There are many standard algorithms there that are numerically aware
and more suitable.

> - Compatibility with Boost (e.g. put arrays in Boost containers, use Boost Serialization, Boost interprocess, Boost algorithms)
> - Compatibility with Thrust or GPUs (e.g. can the array elements be in the gpu, and use the array through its interface without segfaulting, or use thrust::device_pointer memory)

Not clear what you mean. Just for the record, there is cv::cuda and
OpenCL support in OpenCV, etc.

> - Used in production (e.g. major users or industries)

OpenCV is used almost everywhere in the industry.
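The answer on Fortran ordering above ("strided layout, i.e. transpose") can be checked with a few lines of arithmetic. In this sketch (helper names are hypothetical) the same `offset` function serves both layouts; only the stride vectors differ, and a column-major 3x2 array addresses exactly the same memory as the transpose of a row-major 2x3 buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;
using Strides = std::vector<std::ptrdiff_t>;

// C order: the rightmost index varies fastest in memory.
Strides row_major_strides(const Shape& shape) {
    Strides s(shape.size());
    std::ptrdiff_t step = 1;
    for (std::size_t i = shape.size(); i-- > 0;) {
        s[i] = step;
        step *= static_cast<std::ptrdiff_t>(shape[i]);
    }
    return s;
}

// Fortran order: the leftmost index varies fastest in memory.
Strides column_major_strides(const Shape& shape) {
    Strides s(shape.size());
    std::ptrdiff_t step = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        s[i] = step;
        step *= static_cast<std::ptrdiff_t>(shape[i]);
    }
    return s;
}

// Element offset of a multi-index; identical for any stride-based layout.
std::ptrdiff_t offset(const Strides& s, const std::vector<std::size_t>& idx) {
    assert(s.size() == idx.size());
    std::ptrdiff_t off = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        off += static_cast<std::ptrdiff_t>(idx[i]) * s[i];
    return off;
}
```

For example, `column_major_strides({3, 2})` gives `{1, 3}`, which is `row_major_strides({2, 3})` (that is, `{3, 1}`) with the axes swapped.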

>> While it is nice to have an abstraction, either keep it basic
>> or go the full way; you are stuck somewhere
>> in between std::vector++ and something like OpenCV.
>
>
> This is a fair assessment. If you see resemblances with OpenCV, that is welcome but accidental, since it is a library that I don't use.
> Sorry if you are disappointed this library doesn't do (directly at least) things that OpenCV does;
> the library definitely can do other things that OpenCV can't (and you are not interested in), and even if OpenCV can do them, it will do them with an interface that is not within the scope of the goals of my library.
>

I don't expect it to do everything OpenCV does, since for the things that
OpenCV does we have OpenCV. I don't see a reason to reinvent the wheel,
especially since it works very, very well.

However, if you try to create a good multi-array interface, make sure it
improves on the interfaces of existing libraries (OpenCV, for example),
interoperates with them, and brings real value, especially since the main
use case is numerical computations.

I would like to see an interface that is useful in numerical context to
run generic computations.

Here is an example of a real use case I would love to see.

I have several dynamic arrays and I want to run a pointwise or reduction
operation over them efficiently, automatically doing broadcasting
and/or reduction... Here is an example I use in my project for the GPU
case:

https://github.com/artyom-beilis/pytorch_dlprim/blob/master/src/pointwise_ops.cpp#L1525

        dlprim::core::pointwise_operation({x,buf,dy},{dx},{},
                    R"xxx(
                    int is_negative = x0 < 0;
                    dtype maxd = is_negative ? 1.0f: 0.0f;
                    dtype s = is_negative ? 1.0f: -1.0f;
                    y0 = (maxd - s * (x1 / ((dtype)(1) + x1))) * x2;
                    )xxx",
                    getExecutionContext(self));

This is dynamically generated GPU code. But let's extend something like
that to the following cases:

- input arrays should broadcast and have a dynamic number of dimensions
- the computation should use SIMD on the input values when possible.

Consider improving this:
https://answers.opencv.org/question/22115/best-way-to-apply-a-function-to-each-element-of-mat/

Especially in the case where you don't know the number of dimensions in advance.
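A CPU sketch of the use case described above, for when the rank is only known at run time: an "odometer" loop walks a dynamic shape and applies a lambda to two strided inputs and one output. This is not the dlprimitives API; `NdView` and `pointwise_nd` are hypothetical, broadcasting is assumed to be already expressed as stride-0 dimensions, and there is no SIMD. The innermost dimension is where a contiguous, vectorized fast path would go.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;
using Strides = std::vector<std::ptrdiff_t>;

// Hypothetical strided view; a stride of 0 broadcasts that dimension.
struct NdView {
    float* data;
    Strides strides;  // in elements
};

// Applies y = f(a, b) element-wise over `shape`, whose rank is only known at
// run time. A counter vector ("odometer") replaces nested for-loops.
template <typename F>
void pointwise_nd(const Shape& shape, const NdView& a, const NdView& b,
                  const NdView& y, F f) {
    const std::size_t rank = shape.size();
    assert(a.strides.size() == rank && b.strides.size() == rank &&
           y.strides.size() == rank);
    for (std::size_t extent : shape)
        if (extent == 0) return;  // empty array: nothing to do
    std::vector<std::size_t> idx(rank, 0);
    std::ptrdiff_t oa = 0, ob = 0, oy = 0;  // running element offsets
    for (;;) {
        y.data[oy] = f(a.data[oa], b.data[ob]);
        std::size_t d = rank;
        for (;;) {  // increment the multi-index, last dimension fastest
            if (d == 0) return;  // carried past the outermost dimension: done
            --d;
            if (++idx[d] < shape[d]) {
                oa += a.strides[d];
                ob += b.strides[d];
                oy += y.strides[d];
                break;
            }
            // Dimension d wrapped around: rewind its offsets and carry.
            const auto back = static_cast<std::ptrdiff_t>(shape[d] - 1);
            oa -= a.strides[d] * back;
            ob -= b.strides[d] * back;
            oy -= y.strides[d] * back;
            idx[d] = 0;
        }
    }
}
```

For instance, adding a length-3 row to every row of a 2x3 matrix only needs the row's view to carry strides `{0, 1}`.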

Artyom

> I don't doubt this.
>
> I think it is good that both libraries are different, I don't need all the bells and whistles that OpenCV has and it is undeniably a heavy dependency.

It actually isn't; it depends on the components you select. To be fair,
it is way smaller, easier to build, and more stable than Boost :-)

>> >> What is not clear to me is why should I use one over some existing solution
>> >> like OpenCV?
>> >
>> >
>> > - Because sometimes your elements are not numerical types.
>>
>>
>> Yeahhh... don't buy it. Sorry :-)
>
>
> Yeah, but I do need a 100x100 array of std::strings and a 20x10 array of std::tuples.
> I would love to know if OpenCV can store those.

Then give me a real-world use case for strided storage of std::strings...

>> > - Because sometimes you want to specify your element types as template parameters not as OpenCV encoded types.
>>

Fair. I would like to see something like float16 and bfloat16 in OpenCV,
or another float format of your choice, for example.

>>
>> To make your code's compilation slower and more horrible? Actually
>> OpenCV supports templated accessors.
>
> This is the perennial discussion between header-only and pre-compiled libraries.

Having said all that, Boost went way too far down the "header-only path".
I remember, about 10 years ago, when I used Boost.Asio in an industrial
project, I needed to put all the network objects into separate classes
hidden behind a pimpl to get reasonable compilation times, and still the
networking part, which was less than 1% of the code, took 50% of the
compilation time.

Actually, nowadays Asio can be compiled as a separate library (although
I haven't used Asio for a long time).

While header-only template classes are required, algorithms should be in
.cpp files unless they are truly generic.

As a rule of thumb: if you can put it in a .cpp file, put it there.
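The pimpl approach described above can be sketched as follows. `Connection` is a hypothetical class, the header/source split is shown in one listing with comments marking which part would live where, and a `std::string` stands in for the heavy (for example, networking) state. Only the `.cpp` file would include the expensive headers; clients see a forward-declared `Impl` and a `std::unique_ptr`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// ---- connection.h: what every client includes (no heavy headers) ----
class Connection {
public:
    explicit Connection(std::string host);
    ~Connection();  // defined below, where Impl is a complete type
    Connection(Connection&&) noexcept;
    Connection& operator=(Connection&&) noexcept;
    std::string host() const;

private:
    struct Impl;  // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// ---- connection.cpp: the only file that would include the heavy library ----
struct Connection::Impl {
    std::string host;  // stand-in for sockets, resolvers, io contexts, ...
};

Connection::Connection(std::string host)
    : impl_(std::make_unique<Impl>(Impl{std::move(host)})) {}
Connection::~Connection() = default;
Connection::Connection(Connection&&) noexcept = default;
Connection& Connection::operator=(Connection&&) noexcept = default;

std::string Connection::host() const {
    assert(impl_ && "use of a moved-from Connection");
    return impl_->host;
}
```

The special members are defaulted out of line on purpose: `std::unique_ptr<Impl>` needs the complete `Impl` type wherever it is destroyed.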

>>
>>
>> OpenCV supports custom allocators (actually something I use
>> right now, exactly to monitor memory consumption)
>
>
> That is great, can you point me to an example?
> From what I quickly see online, openCV gives its own allocators.
> I am not interested in non-standard allocators for this library.

Here is my latest example, which I use to monitor memory use. It just
passes through to the standard allocation but counts the memory
used:

https://github.com/artyom-beilis/OpenLiveStacker/blob/main/src/allocator.cpp

So I can discard incoming data when I am overloaded.
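The linked allocator.cpp plugs into OpenCV's own allocator interface, which is not reproduced here. As a hedged illustration of the same pass-through-and-count idea in the form a standard-allocator-aware container accepts, here is a minimal sketch; `CountingAllocator` and `g_allocated_bytes` are hypothetical names, and an atomic counter is used so several threads can allocate concurrently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Bytes currently handed out by all CountingAllocator instances (C++17).
inline std::atomic<std::size_t> g_allocated_bytes{0};

// Hypothetical pass-through allocator: forwards to std::allocator<T> and
// keeps a running total that the application can poll to shed load.
template <typename T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        T* p = std::allocator<T>{}.allocate(n);  // may throw std::bad_alloc
        g_allocated_bytes += n * sizeof(T);
        return p;
    }
    void deallocate(T* p, std::size_t n) noexcept {
        assert(g_allocated_bytes >= n * sizeof(T));
        g_allocated_bytes -= n * sizeof(T);
        std::allocator<T>{}.deallocate(p, n);
    }
};

// Stateless allocators: any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }
```

A producer thread could then check `g_allocated_bytes` against a budget before accepting a new frame, which is the "discard incoming data" policy described above.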

> Can it take allocators that do not return raw pointer types such as GPU?

See, a GPU is a very different beast... For example, with OpenCL you don't
have pointer arithmetic on the host; instead you can create sub-buffers.
In OpenCL you don't really have pointers: you have memory regions you
can't even access directly from the host. CUDA allows pointer arithmetic.

On some GPUs, like Intel, AMD APUs or Arm Mali, you can share memory
between the CPU and GPU (because it is physically shared), but it works on
memory regions, not specific pointers, so it is a different beast: no
pointer arithmetic, and access requires mapping/unmapping.

> and if not, what if you allocate a raw pointer with cudaMalloc, how much of the library can you still use?
>

You can't dereference CUDA pointers on the host, only pass them around.

>>
>> I'm not sure it is a great idea for huge arrays/matrices. Virtually
>> ALL tensor libraries have reference semantics for a very good reason.
>
>
> Sometimes you need it, sometimes you don't.
> If you don't need it, don't make copies, don't pass by value; that is the right thing to do, and allowing the option is a priority of the library.

OK... let's put it this way: in tensor, Mat and other libraries the
default is a reference and copying is explicit (you can still copy).

If you make it the other way around, like std::vector, you'll make the
library virtually useless, or at least highly inconvenient, for the main
use case (numeric computations).
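A minimal sketch of the convention described above (handles share storage, deep copies are explicit), which is how cv::Mat behaves with its `clone()`. `SharedArray` is a hypothetical 1-D type built on `std::shared_ptr` purely for illustration; it is not the Multi or OpenCV implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical array with reference semantics: copying the handle shares the
// buffer (cheap, like a reference-counted matrix header); clone() deep-copies.
class SharedArray {
public:
    explicit SharedArray(std::size_t n, float value = 0.0f)
        : buf_(std::make_shared<std::vector<float>>(n, value)) {}

    float& operator[](std::size_t i) {
        assert(i < buf_->size());
        return (*buf_)[i];
    }
    float operator[](std::size_t i) const {
        assert(i < buf_->size());
        return (*buf_)[i];
    }
    std::size_t size() const { return buf_->size(); }

    // The only way to duplicate elements: explicit and visible at call sites.
    SharedArray clone() const {
        SharedArray copy(0);
        copy.buf_ = std::make_shared<std::vector<float>>(*buf_);
        return copy;
    }

private:
    std::shared_ptr<std::vector<float>> buf_;
};
```

With this design, passing a large tensor by value costs a reference-count increment, and an accidental O(n) copy cannot happen silently.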

> The "raw" broadcast operation in the library is an operation that cannot fail.
> It is how the user of the broadcast object uses it that can create problems down the road if he/she doesn't know what he/she is doing.
> The user may choose algorithms that throw when sizes are incompatible, or just assert or just continue.
> Who am I to choose for them?

OK... that just makes it less useful, because matrix/tensor/ndarray
dimensions are dynamic and known only at run time.
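Because shapes arrive at run time, something has to validate them before a broadcast computation runs. A sketch of that check under NumPy's broadcasting rules (shapes aligned from the right; each size must match or be 1), computing the common output shape of several inputs and throwing on a mismatch. `broadcast_shape` is a hypothetical helper, not part of any library discussed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape = std::vector<std::size_t>;

// Hypothetical helper: common broadcast shape of several run-time shapes.
// Throws std::invalid_argument when two sizes differ and neither is 1.
Shape broadcast_shape(const std::vector<Shape>& shapes) {
    std::size_t rank = 0;
    for (const Shape& s : shapes)
        rank = std::max(rank, s.size());
    Shape out(rank, 1);
    for (const Shape& s : shapes) {
        const std::size_t offset = rank - s.size();  // right-align dimensions
        for (std::size_t i = 0; i < s.size(); ++i) {
            std::size_t& o = out[offset + i];
            if (s[i] == o || s[i] == 1)
                continue;  // equal, or this input stretches along the dimension
            if (o != 1)
                throw std::invalid_argument("incompatible shapes for broadcasting");
            o = s[i];  // earlier inputs had 1 here; adopt this size
        }
    }
    return out;
}
```

Shapes (8, 1, 6, 1) and (7, 1, 5) combine to (8, 7, 6, 5), the example used in NumPy's documentation, while (2, 3) and (4, 3) are rejected.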

> I think you don't agree with the scope of the library, which would be fine.

I think that, in its current scope, the library isn't that useful for the
primary use case; that is my point.

Best Regards,
Artyom


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk