Boost logo

Glas :

Re: [glas] introducing the storage concept, a first interface specification

From: Toon Knapen (toon.knapen_at_[hidden])
Date: 2005-01-18 07:46:15


Patrick Kowalzick wrote:
> Dear Toon,
>
> I hope I do not miss the point, but I do not really get the reason why
> to distinguish so many storage_types (heap/stack/disk/...). So I try to
> summarize what I understood (but as well regarding matrices and only
> dense cases):
>
> To make algorithms work, we need only a very slim interface for
> cointainers with constructors, element access and size functions. But to
> make the functions efficient and fast we need more information.
>
> IMO the most important points for the storage concepts are:
> 1.) How are the elements stored ?
> 2.) How is the data initialized ?
> 3.) Is the vector resizable ?
>
> to 1.)
> For an effective algorithm it is not necessary that the data is
> "globaly" consecutive. It might be enough that parts of the data are
> "localy" consecutive. E.g. two matrices which are stored in several
> vectors (rows or columns) can be added very fast iterating along the
> vectors but not across the vectors (but only due to the cache).

True. That was also (part of) the idea behind compound_storage. But
suppose your matrix uses several vectors to store its elements, you
still want to define _where_ these vectors are stored.

For maximum flexibility you might want to store them on the heap (this
would allow resizing, see later). But for performance reasons you might
want them on the stack. This is less flexible however because the size
is limited. When treating very large vectors that are too big to store
in memory, you will have to store (part of) your vector in a file.
_These_ kind of _optimizations_ are the most important idea behind the
storage.

> If the
> storage order (row_order or column_order) in both matrices is different
> the algorithm shall run along the vectors of the one and try to cache
> the corresponding vector of the other matrix (for our computer
> architectures). The problem here depends more on the row/colun_order
> than on "global" consecutive memory. Perhaps there might be a flag
> "force_caching" for disk/compund, but not an own concept?

The issues concerning row-major and column-major storage of a matrix are
indeed important. I have not yet brought this up to make the matters not
too complex from the first mail onward. But the issue is certainly
related. However a matrix being row-major or column-major only defines
_how_ its elements are _linearized_ into the storage. But before further
discussing this 'linearisation', we first need agreement _into_ _what_
they will be linearized into. And this is the storage.

>
> IMO the "Interaction with BLAS and LAPACK" is a special case (indeed
> very important) and does not only need consecutive_in_memory but also
> column_order for matrices.

you are right. On the other hand for some operations we can play with
the 'transpose flag' (or how should I call this character that is passed
on to BLAS) to achieve operations on row-major matrices. So it depends
on the algorithm.

>
> to 2.)
> If data is initialized (constructor or resize) or not, might be
> sensitive to performance (if I remember right the discussions we had in
> the uBLAS ML). If a goal of GLAS is to give the user the freedom to
> chose a cointainer (via traits/adapters?) there shall be an information
> about this or it must be defined what happens. Perhaps there might be a
> difference between the constructors and resize?

Thanks for bringing this up. I think this is an important issue.

First of all, we want to avoid that creating a container takes linear
time due to the one-by-one construction of all elements in the container
when the container is being constructed. This is generally wasted time
because default-initialising all elements in the container is generally
not what we want. Generally we will assign other values to all of the
elements in the container after construction.
Therefor the current storage concept has a 'size constructor' instead of
a 'default fill constructor' (as required by the Sequence concept for
instance). This 'size constructor' does _not_ require the elements to be
default-initialised after construction.

About resize now. I'm not sure if we need a resize function in our
storage. Currently I have thus not made a distinction between the 'size'
and the 'capacity'. This simplifies things for instance and maybe allows
better optimizations. Of course this is less flexible. So I decided not
to mention it yet and to bring it up for discussion later ... which is
now thus '-). Any arguments pro and contra are welcome.

> to 3.)
> IMO, only if there is an information if a vector is resizable there
> might exist a compile-time check if an addition between two
> not-resizable vectors is possible. I do not have an example (and it
> might not exist with expressions), but I can also imagine that some
> algorithms might need resize and can not always work with a constructor.

I think we should respect the dimensions (or size) of the vectors and
therefore I think operands first need to be properly dimensioned before
using them in an operation. Thus adding 2 vectors requires that both
vectors have the same dimension and thus respecting the laws of vector
fiels.