Boost logo

Boost :

From: John Maddock (John_Maddock_at_[hidden])
Date: 2002-01-17 07:46:32


Multi-array review result.
~~~~~~~~~~~~~~~~~~~

Multi-arra is accepted into boost, with the condition that the main issues
listed below are fixed prior to release.

Multi-array generated a lot of varied comment, with not everyone unanimous
that it should be accepted in its current form. For this reason I have
summarised all the main comments below, and divided them into three
sections - defects (that should be fixed), requests (that would ideally
should be honoured, but which may not be reasonable or possible to do so),
and non-defects (with rationale why they're not considered a defect). This
is not necessarily the final word; anyone with a contrary view should speak
up now!

Multi-array defects:
~~~~~~~~~~~~~~~

From Douglas Gregor:
  - I dislike the "almost" conformance to RandomAccessIterator; not because

there is an inherent problem with not having a true reference, but because
"almost" modeling to a concept doesn't really give us any information about

the concepts that are modeled. I'd much prefer to see a separate concept or

set of concepts enumerated that it does fit, and within those concepts the
differences from RandomAccessIterator can be explained. Jeremy's new
iterator
categories would be a reasonable set of concepts to use: then the
multi_array::iterator types would model RandomAccessTraversalIterator,
ReadableIterator, and (for non-const) WritableIterator.

  - At several points, const_iterator is said to model OutputIterator. This

should be InputIterator.

Main documentation (index.html):
  - I dislike the phrase "or fatal as under MSVC." Compiler bugs may be
obnoxious, but we shouldn't be taking shots at vendors within
documentation.

MutableMultiArray Concept documentation:
  - index_range and extent_range have the exact same documentation string,
which can lead to confusion (it did for me).
  
  - The "Post-condition" column need not exist unless it is going to be
used.

  - The origin() function does return the address of "a[0][0]...[0]",
unless
all of the indices are zero. It returns the address of "a(a.index_bases())"

(?).

multi_array class documentation:
- What happens when the dimensions don't agree when the assignment operator

is called? Undefined behavior? Exception thrown?
  - The assign(values) form (that assigns to the multi-array from elements
in
the sequence [values, values + num_elements())) strikes me as being _very_
dangerous, because it would be very easy to overrun your input buffer. I
would (strongly) suggest removing this form.
  - Does assign(first, last) require that std::distance(first, last) equal
num_elements()? I believe that it should, because a partial assignment is
not
an assignment, and generally it signifies a user error if one tries to
assign
to a multiarray a different number of elements than the multiarray can
contain.
  - The operator()(indices) and data() member function descriptions make
that
same mistake as in the origin() function, mentioned above.

       *********

From Toon
documentation:
in the index.html, you have specified anchors that contain a ':'. please
remove these because e.g. konquerer can not interpret them correctly. So
  I suggest <a name="sec:dimensions"></a> would become <a
name="dimensions"></a>

        *********

From Jens Maurer:
index.html:
 - introductory paragraph: It sounds confusing to me. First, you say
it's "equivalent" to a "vector of vectors", then you say you can't
resize a multi_array. I believe resizing is a major feature of
std::vector. A more careful wording might be in order.

 - The links to array_traits and boost::array are empty (and thus broken).
More links are broken elsewhere.

 - Move the concepts section way up to just after the rationale. It's
more standard-like and (IMO) easier to grasp a library top-down.

 - I support merging the two concepts documents.

 - I would appreciate if the example code (particular in the
docs) would refrain from using the token
"array", since it
might be confused with Boost.Array's template class.
"marray" seems ok.

 - In the example code, the loop exit conditions should read
   "n < 3" instead of "n != 3". The former is, IMO, more
customary and doesn't break if "n" happens to be 10 at the
beginning of the loop.

 - The longish "associated types" listing is redundant
with the concepts documentation and should be removed from
index.html.

 - "Each array can be assigned to from any of the others," is
certainly incorrect for const_multi_array_ref.

 - "Array View and Subarray Type Generators": The first two
sentences are misleading. I know of no situation in ISO C++ where
member templates cannot be used. If you want to say that
some broken compilers have a problem, then say so straightforwardly,
e.g. "The additional template keyword ... might be considered
obfuscating. Also, some compilers cannot handle member templates."

 - "Specifying Array Dimensions": It appears to be dangerous
to blindly retrieve N elements from the collection. I'd rather
have it retrieve from begin() to end() with the requirement
that size() == N.

MutableMultiArray.html:
 - "It attempts to refine RandomAccessContainer" is much too vague
to be useful. Just start out saying what it *does* refine
(you don't say that at all), and have a list of differences
(the "not quite") with RandomAccessContainer at the end.
Also, the "motivation" is mostly redundant with index.html and
could be either moved there or removed altogether.

 - Please reorder the various tables ("associated types" etc.) so that
the number of forward references is minimized (".. the same as element").

 - NumDims is never introduced, just used. Please introduce it before
the tables start, e.g. "In the following tables, X is a type that
models MutableMultiArray; NumDims is an integral constant expression
with value > 0, ...". Similar for the "expression" table: Lots of
variables used without defining them.

 - size_type and difference_type need more explanation
(or refer to the std lib concept that you refine).

 - As said elsewhere, the iterators that you provide need
better definition. "almost" is a no-no in reference documention.

 - I don't like the index and index_gen names. They don't convey
the information that they are types, not values. Usually, "index"
would be a variable name in my mindset. index_gen could have
"range" in it (index_range_type?)

- Shapes, Strides, Index Bases returns const something*, which
appears fishy to me. Wouldn't an iterator-based iterface more
in the spirit of C++? begin_shapes(), end_shapes() etc.

 - "Collection" is not a concept defined in the C++ standard.
So you should fix the link, and make sure it points to somewhere
within boost.

 - What's "indices"? It appears to be a library-provided
variable. So you should say "boost::indices" so that people
know where to find it.

 - rend(): No iterator may point to before the first element.
You may want to replace the verbiage with the C++ expression
whose value is returned. Similar for rbegin().

 - origin(): Why does it return a pointer, and not a reference?

 - The complexity guarantees should be integrated into the
tables. size() being at most linear with MultiArray's size
sounds frightening, and vague (number of elements?).

 - The invariants could say more about the relationship
of size(), strides(), shape(), index_bases() etc. Also,
how the a[idx] indexing relates to the a(index_list) indexing
method.

Code

----
 - Needs attention to the fact that classes having dependent base
classes don't have the members of the base class visible for
unqualified lookup.
 - The "global" status/regression.cfg will likely only contain one
test for this lib.  So you may want to define a all_test.cpp (or
somesuch) that #include's all supposedly working test source files.
(The separate regression.cfg with the detailed tests is nice and
should be kept as well.)
 - Please augment the fail_xxx.cpp tests with a comment that
says where and why it should fail.  Oh, and if it fails
early because assignment is not allowed for a const object,
there's no need to have all the "verify" stuff in there.
- It silently adds a boost/algorithm.hpp with some SGI STL code.
I'd say that an improved algorithm library needs a separate
review.  Having a boost/multi_array/detail/algorithm.hpp is fine,
though.
 - libs/multi_array/example contains only one directory,
multi_array.  This seems unnecessary and redundant.
*********
From Ralf:
At the top of multi_array_ref.html it says:
    multi_array_ref does not own the data passed to it.
Then:
multi_array_ref& operator=(const multi_array_ref& x)
        Assignment operator. This performs a deep copy of the other
        array. Note that array dimensions must agree.
Who owns these data? The returned multi_array_ref?
Do some multi_array_ref own data?
Should the sentence at the top be qualified? E.g.
   multi_array_ref does not own the data passed to it
   in the constructor.
*********
From Toon:
Performance :
Along the same line ; The last few weeks there was a discussion about 
the performance and e.g. Ralf W. Grosse-Kunstleven suggested an indexing 
operation that was much more performant compared to the usual list of 
square bracket indices.
As I assume multi-array will be used in numerics, it would also be nice 
to have performance comparisons between multi-array and e.g. blitz++, 
MTL, plain C arrays and fortran.
*********
From John Maddock:
In the introduction you say:
"iterator 
A type that dereferences to return objects of type reference. If NumDims ==
1, then this is a type that models Random Access Iterator. Otherwise it
models both Input Iterator and Output Iterator (it almost models Random
Access Iterator, but reference is not a true reference type). "
What are the differences between the iterator type and a random access
iterator? Why can't it be a true random access iterator? I assume that it's
because the iterator returns a temporary proxy object, rather than a
reference, if so say so.
The following doesn't immediately make sense:
"The second method involves passing the constructor an extent_gen object,
specifying the matrix dimensions. By default, the library constructs a
global extent_gen object boost::extents. In case of concern about memory
used by these objects, defining BOOST_MULTI_ARRAY_NO_GENERATORS before
including the library header inhibits its construction. "
I assume that boost::extents is a global that can be used to avoid the
construction of a temporary extent_gen object? If so say so.  This looks to
me to be non-thread safe as well (so say so), maybe this isn't such a good
idea anyway?  Does constructing a temporary extent_gen really add so much
overhead, or is it just syntactically poor?  If the latter then maybe there
is a better way somehow?
Under element access you have two methods:
"The second method involves passing a Collection of indices to operator().
N indices will be retrieved from the Collection for the N dimensions of the
container. "
Is this more efficient that the first, or just a syntactic alternative?
In the intro the link to array traits in the sentence "With the help of
array_traits, you can even extract " is broken, in fact there appears to be
no docs for this class? If so that really needs fixing.
In multi_array.html the heading:
"template <
   typename Element,
   std::size_t NumDims,,
   typename Allocator = std::allocator<Element>
  >
multi_array"
Should be a valid forward declaration, it looks odd as is.
The same in multi_array_ref.html and const_multi_array_ref.html
I'd also like to be able to get quickly to the reference pages (maybe from
the table of contents), as it is you have to read down and then follow a
link in the text.
I like test_cases.html, maybe a similar page for the examples would be
useful.
Performance
~~~~~~~~~
A lot has been said about this elsewhere, I would only say that I would
like to see some performance comparisons in the test programs, and some
indication in the docs on how to get the most out of the library - how do
the different access methods compare - using operator[], compared to an
index collection, compared to enumerating iterators, compared to C-arrary
access etc.  Does access time increase with the number of dimensions etc? 
For the future, I would like to see some work on improving the performance
of operator[], but that's an extra feature request not a review requirement
- we don't even know if it's possible at present.
***********************************************************
Desirable changes:
~~~~~~~~~~~~~~
From Douglas Gregor:
- I think that the MutableMultiArray and ConstMultiArray concept documents 
should be merged into a single MultiArray concept document, with comments 
describing the (relatively few) differences. This is just a matter of 
preference, of course.
multi_array class documentation:
  - Overall: I would like to see Standardese documentation (i.e., 
Requires/Effects/Postconditions/Returns/Throws/Complexity/Rationale)
Code comments: 
  - Why is it that the concept checks are commented out of the source
(i.e., 
in multi_array.hpp)?
  - There were some problems compiling with Comeau's strict mode, but I
will 
contain the author directly with those fixes. They aren't important enough
to 
warrant mailing list traffic.
*********
From Toon
Complexity guarantees:
I would like to see more elaborate complexity guarantees : what is the 
complexity of indixing in x dimensions using a list of square 
bracket-indices, what about the complexity guarantees of a matrix-view.
Relation with MTL and ublas :
Two other matrix-related projects are going on and I think it would be 
good too check what the relation is of multi-array to these before it is 
accepted.
I think it is good that there are sometimes overlapping libraries in 
development as this provides you with invaluable feedback on 
best-practices too achieve a specific goal. But the whole of libraries 
that are accepted inside boost after a review should be consistent and 
should have minimal overlapping while having maximal reused inbetween 
libraries.
*********
From Jens Maurer
- I'm in favor of chaning the class documentation to the
usual layout of the C++ standard.  It's more concise, more
precise, simply better.
MutableMultiArray.html:
- I'd merge the "expressions valid" and "expression semantics"
tables and omit the post-condition column if unused.
The "Name" column is largely redundant.
 - "if a is mutable": Is this standard-speak?  Shouldn't it be
"const" and "non-const"?
*********
From Petr Ovchenkov:
For number of dimentions there are some very important cases,
when performance is a key issue:
   N = 2
   N = 3
   N = 4 (may be)
The performance for this dimentions can be near (or better) then
FORTRAN code^# if we roll out many internal loops with recursive
templates, and, may be, with specialized algorithms.
---------------------------------------------
  ^# [as shown by Todd Veldhuizen in Blitz++]
The absent of specialization for this two-three dimentions in
multiarray librarias (as mtl, as ublas) is a significant drawback,
IMO.
I think, that interfaces of specialization for N=2,..,4 should be
the same (or very similar) as for common case.
From Joerg Walter:
OTOH, would it technically be possible at all to provide a proxy free 
implementation of an interface like
 
operator () (int i)
operator () (int i, int j)
operator () (int i, int j, int k)
 
for low rank multi_arrays?
****
From Larry Evans (on axis reversing):
I'd  recommend providing a reverse(Axis i) for reversing each axis
individually instead of using  only the general_storage_order constructor.
Also, this should be easy to do.  For example, section 6.1.4 of the book
says
that reverse(Axis i) results in a new "stepper" (indicated below with '
marks)
defined in terms of the old stepper as:
             t'[i] = s[i] - t[i] - 1
             d'[i] = - d[i]
Then, on p. 86, the offset of the initial value of the array is given by:
             alpha = t[0]*e[0] + t[1]*e[1] + ... + t[nrank-1]*e[nrank-1]
and the offset at arbitrary index is given with the help of gamma,
which is defined on p. 86 as:
              gamma[i]  =  sum_{q[j]=i}(d[j]*e[j])
(The actual nation used was the greek sigma indicating summation).
I think what Budd meant was:
               gamma[i] = d[q[i]]*e[q[i]]
Then the offset at arbitrary index, a[0]...a[nrank-1] is:
                offset = alpha + sum_{i=0}^{nrank-1}(a[i]*gamma[i])
I think the t's can be eliminated by storing the alpha with the
stepper.  Anyway,  the following is what I coded as Stepper::reverse(int
aAxis):
            { startVar()+=(shape(aAxis)-1)*stride(aAxis)
             ; strideVar(aAxis)=-stride(aAxis)
             ;}
startVar() == alpha.  The others should be self-explanatory.
Stepper::transpose(void) was coded as:
      {
      ; int j
      ; int m=rank()
      ; int* tconfig= new int[m]
     ; for(j=0; j<m; j++)
       {
       ; tconfig[m-(j+1)]=config()[j][Axis]
       ;}
     ; for(j=0; j<m; j++)
       {
       ; configVar()[j][Axis]=tconfig[j]
       ;}
      ; delete[] tconfig
      ;}
where config()[j][Axis] == q[j].
*********
From John Maddock:
You say that: "Each array class provides constructors that accept a storage
ordering parameter. "
Is this the best design choice - could element access be made more
efficient if the storage order were a template parameter?  Are there any
situations where you would need to access the same array but with different
storage orders, I can't think of any, but maybe you can?  This is a
question not a criticism BTW :-)
The following is a minor point, but I would have preferred the array
concepts to use the same table format as the standard, using three
sections: expression, return type, and description.
*********
From Daryle Walker:
The author may want to #include Boost files as <boost/XXX.hpp> instead
"boost/XXX.hpp".  You may not agree with the use of the former form, but we
should be consistent.
You may want to add routines to let a function (object) be applied to each
element.  But maybe that can be simulated with std::transform and the array
iterators.  I agree with others that full arithmetic functions should be
delayed to other types that use these types as implementation, since there
are several directions to proceed.  I also agree that a public algorithm
library is too orthogonal (and potentially large) to be sneaked in through
this library, especially if it carries STL-copyrighted stuff.
**********************************************************
Non-defects:
~~~~~~~~~
From Gary Powell:
After a quick look at the code, I'm not seeing the full suite of
operators, that libraries like MTL and others have. (Expression template
stuff.) Is this because multi-array is designed for a different set of
problems?
Rationale: Jeremy Siek replies:
"The multi_array fits in very nicely with MTL 3. The container of
containers flavor of interface is the same for MTL 3 matrices, as is the
way array views work.
There is no overlap between multi_array and MTL 3. multi_array is a multi
dimensional container, not a numeric (linear algebra) entity. However,
multi_array will be an important building block in creating various
numeric entities. These numeric entities (including the element-wise
operators that everyone is interested in) will come along as separate
libraries as people implement them. There may need to be some small
changes to multi_array to accomodate them, but the changes should only
need to be additions.
Ron's approach of first submitting a multi_array without the numeric
operators is the right approach because:
1) it is easier for the Boost membership to review one small component
  at a time. ("small" here is a relative term)
2) There are many non-numeric uses for multi-dimensional containers."
*********
From Gary Powell:
  While the runtime speed may be a problem (C array vs multi-array) what
are
the chances of future compilers being able to generate more optimal code?
My past experience has been that speed is everything but of course that
came
with lots of implementation bugs that could have been avoided if we had a
strongly typed multi array to work with. Hence my dilemma.
Rationale: at least one compiler (KCC) produces optimal code with
multi-array, however this probably does need more research.
*********
From Ralf:
At the moment I have serious doubts about the usefulness
of a multi-dimensional array package that does not include
memory management with shared pointers of some kind (as
e.g. Blitz++ does). Am I missing something?
Rationale: This is a valid concern, but I don't think we can hold Ron
hostage to it.  We need to do more work on policy classes and memory
management in general (cf smart pointers).
- John Maddock
http://ourworld.compuserve.com/homepages/john_maddock/

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk