Boost logo

Boost :

From: Janek Kozicki (janek_listy_at_[hidden])
Date: 2006-06-09 20:17:33


Hello,

I'm posting this review a bit after the deadline, and I'm very sorry about
that. This is caused by the fact that I didn't know that all mailing list
members can write a review. When I learned about that, I had only about 2 days
left, but still I wanted to perform a deep analysis of what is in pqs.

---------------------------------------------------
    * Do you think the library should be accepted as a Boost library?

    Yes, I think it should. The scientific community, and industry are in desperate
need of such library, rejecting it just because it's not perfect would be a
very bad decision. It's not uncommon that boost libraries evolve once included,
I believe that this will be the case here. Moreover including it will greatly
increase the amount of feedback received and in result the library will reach
the "perfect" state much faster. After performing checks (described below) I'm
sure that the library is ready for inclusion.

    Note: below I often refer to yade ( http://yade.berlios.de/ ) which is a
framework for performing scientific simulations. Yade is my current project,
and a part of my PhD. I have done tests of using pqs inside yade, performed
speed benchmarks and unit tests.

---------------------------------------------------
    * What is your evaluation of the documentation?

    Before trying pqs I've read almost whole documentation (skipped few pages).
So those comments and suggestions below were written before I actually tried using pqs:

    As stated by other reviewers the introduction is a bit offtopic. Summary given
by review manager fits here best. In "Definition of terms" an UML diagram would
be nice, or at least some of those diagrams that are automatically generated by
doxygen. Later I discovered that in html version there is a diagram, but not in
pdf version. I wonder if any other reviewer would complain less if he saw this
diagram in pdf. Downloading pdf was much easier than unpacking the library and
opening index.html from it ;)
    
    I'd prefer a different name for coherent_unit and incoherent_unit, for example
si_unit and non_si_unit.
    
    Several terms ( 'dimensional analysis', 'dimensionally_equivalent', 'dimensional_math',
'dimensionless', 'dimensionful', 'physical_quantity', 'physical_quantity_system',
'rational_number', 'unit_output_symbol', 'unit_symbol' )
do not appear on the Entity Relationship Diagram. Therefore their
meaning is for me unclear: classes, methods or concepts? If they are concepts
then underscore ("_") shouldn't be used in their name. Method names can be
written in the ERD diagram inside rectangle of a given class, separated by
horizontal line (UML notation), so that it would be possible to look up methods
in the diagram.
    
    Also a reference-style listing of all classes and methods in the library would
be nice (this kind of documentation is usually auto generated by doxygen). This
listing would have references to correct places in synopsis. It would be also
*very* useful if at the end of the doc, there was a referenced list of all
examples in the manual, and also a list of all pictures.

    In Rationale:Clarity section the example with PQ_Box should be written full
'ready to compile' (because it is the first longer example that appears in the
manual).
    
    Also IMHO Rationale section should be put before Definition of terms. Usually
definition of terms is put at the beginning, but usually it is much shorter. A
causal reader who doesn't know what he should read, will read from top to
bottom - and in this respect it's more important that he reads Rationale first.
Because if he reads Definitions first - he may lose his enthusiasm due to fact
that definitions are _always_ boring (no matter how hard you try to make them
interesting).

    I prefer "per": m_per_s, etc...

    In pdf page 3, missing space: "The keyword typeofis used".
    
    In pdf page 5, missing space: "distinguish between different but dimensionally_equivalentabstract_quantities."

    In pdf page 18 there is a reference to "Example 3a", but all examples are
unlabelled, so I can only detect from the context of the sentence which example
is actually referred. It would be useful to label all examples. (and make a full
list of all examples at the end).

    In pdf page 23 comment is too long to fit on the page.
   
    In pdf page 28 - mistaken underline

    In pdf page 32 - I prefer synopsis written in valid C++, for example simple
copy'n'paste from file t1_quantity.hpp would work. In fact I had difficulty in
understanding line "if IsDimensionless<abstract_quantity>", and lines following
it, until I guessed that it's some kind of Cpp pseudocode.

    In pdf page 51 - mistaken underline.
    
    Word "useage" hurts my eyes. Heh, and vim's spellchecker underlines it as a bad word ;)

---------------------------------------------------
    * What is your evaluation of the implementation?

    More constants, including mathematical ones, would be nice.

    I don't like the separation of "three_d" and "two_d" - where quaternions should
be put then - "four_d" ? I've gone through similar problems in my own design
(just browse older revisions of yade in subversion repository
http://tinyurl.com/lgk8t ), and finally I've found it most convenient to put
vector (both 2d and 3d), quaternion and matrix inside "math". It was simply too
cumbersome to type everywhere all those lenghty namespace names (and I used
various variants, including 2d / 3d distinction). Therefore I strongly suggest
to use simply namespace "math" and put inside vector2d, vector3d, quaternion
and matrix. Extra namespaces increase fragmentation.
    Also please kindly consider following exampls, what name is more descriptive and shorter?

    three_d::vect
    math::vector3d

    OK, my favourite name is one letter longer, but still - "math" says a lot
more that "three_d", so this name is more descriptive, IMHO.
    Please be verbose, please rename vect to vector3d and vector2d, when putting them into
single namespace. Also I prefer name quaternion, over quat.
    I'm sorry about elaborating so much about this topic - that's because it's
close to my heart. In fact I can adapt to any library layout which Andy choses,
but what others think?

    I believe that in matrices there is no need to support both row-column and
column-row notation explicitly, but rather to keep internal compatibility
thorough whole library. Based on that I'd strongly suggest to rename rc_matrix
into matrix, and never implement cr_matrix. There is an interesting reasoning about
this problem inside panda3d software manual (given time I'll find it later).

    It's nice to see timer included. (but missing from docs?)

    It would be nice if t1_quantity was renamed to some descriptive word.
Perhaps a discussion on the mailing list would help to discover a better word.

    I did not delve as much as I would want into the implementation of t1_quantity,
due to lack of time. I noticed that there are two similar files, length.hpp and
length_.hpp, but there is no force_.hpp, or any other file with _ at the end.
This puzzles me a bit.
    

---------------------------------------------------
    * What is your evaluation of the design?
    
    The design as shown on the Entity Relationship Diagram ( /libs/pqs/doc/html/pqs/pqs_erm.html )
is so clear, that I can hardly imagine making a better design. It must be noted
however that I've never written my own units library.

    I'm not sure if fixing number of dimensions to exactly 7 is good decision,
perhaps it could be changed at later date, after the library is included to boost.
Those extra dimensions would be non-si of course, but could be useful for some
people (storage capacity of a harddrive comes to mind, or unit of information
measured with libraries of congress ;).

    It's good that Andy plans to implement vector, matrix and quaternions
operations, that will perform necessary unit checking during computation (all
at compile time!). My concern however is that Finite Element Method uses large
matrices (for example with 1000 columns), and I'm sure that users will prefer
to stick with ublas and all functionality provided by libraries they are
already using. Instead of more general (and perhaps impossible) solution that
would allow use of pqs with any other math library, I recommend to investigate
tight integration with boost::numeric::ublas. I'm sure that people responsible
for that library will gladly accept the idea, and accept any patches related to
this. Basically this will make boost::ublas a ,,sort of'' part of pqs.
    Other solution would be to reimplement such huge matrices in pqs, but how far
Andy could go?

---------------------------------------------------
    * With what compiler did you try the library?

    gcc version 3.3.5 (Debian 1:3.3.5-13)

    I have compared compilation times with and without pqs. Whole yade consists
of over 20000 lines of C++ and it compiles in 10minutes on my AMD 4400+ (when
I'm too lazy to use distcc which can reduce this time to 3 minutes - ideal when
I'm in hurry). Because pqs is just in one small module (about 100 lines of
code) I measured the time for compiling only this module. I took 5 measures,
and computed average:

    without pqs: 3.8 seconds
    with pqs : 4.8 seconds

    That gives about 21% increase. It would mean that if I used pqs in whole
yade 10 minutes could possibly become 12 (or 4 minutes with distcc). not much
difference I think. Full recompiles do not happen often, anyways.

---------------------------------------------------
    * Did you try to use the library? Did you have any problems?

    First I've compiled some of the examples without any problems at all.
When working with the first example I discovered that it would be nice if
include path for headers was shorter, like following:

#include <boost/units/length.hpp> // to get length
#include <boost/units/io/length.hpp> // or sth similar to get IO for length. (input.hpp is OK too)

    When working with the examples I decided to add quaternions to pqs, because
in yade quaternion calculations play a crucial role. I used three_d::vect as a
base source file. I was really pleasently surprised how easy adding them turned
out to be! I was afraid that some unexpected obstacles will stop me on my way
(which will cause this review to be send after the deadline of 9.june (oops
it's after the deadline now!)). I've sent this quaternion implementation to
Andy privately, I hope that it will be included. It lacks many commonly used
methods, but the base foundation for it is prepared, which I consider a success
of pqs design. I should stress again, that I've virtually done nothing in
effort that those quaternions will support units - those units were supported
right away. Only thing I didn't know (but didn't consider important - as I
spent less than 2h to complete the task) - was how to define a return type
of squared unit (used by method squared_magnitude() ).

    Then I started to adapt yade to pqs. I decided to use units only in one
module - that one where the actual calculation loop for selected simulation
resides (the most crucial part of whole simulation). Yade is capable of
modelling various models, so I picked the most causal one - simple DEM
simulation: spheres falling onto flat plane below: http://tinyurl.com/hjy9g .
Pqs was used only in simulation loop, so I expected extra slowdown due to
converting data to/from physical quantities at the start/end of each loop.
However it was simpler than converting all the data structures in yade. During
the process I needed a unit to express a spring stiffness (hooke's law F=kx,
where k is [N/m] and x is [m]). But because Fred Bertsch has just now posted an
email that "review ends" I decided not to add a new unit N_div_m, but rather I
used Pa that was multiplied during calculations by 1 meter constant (so maybe
adding new units should be made simpler). During the process of adapting yade
to use pqs I was impressed in overall how easy it did come by. As an indicator:
on the first time when my modified code compiled without errors it was running
correctly. I was really surprised, and this shows the power of units.
Especially when I recall how much time I've spent more than year ago to debug
this code! This gives me a clear picture of how much time can be saved with pqs
library. How often your code, dear reader, ran correctly after the first compile? ;)

    First thing I did after sucessfully compiling was to perform a unit test
simulation and compare the simulation state. I did this by running the
simulation for 1000 iterations (all the spheres dropped down to the flat plane
below and formed a pile), and saving spheres coordinates and their orientation
every 50th iteration. I used type double for whole calculation, and
boost::lexical_cast<std::string>(double) to save coordinates with 16 decimal places.

    So I had 20 files from each run, which I could compare between pqs and non-pqs
version. Comparing them showed that in 100th iteration about 20 spheres (out of
1000) had coordinates differing on the last decimal place. In 150th iteration
all spheres' coordinates differed on average on last 2 decimal places. In 200th
it was 3 last decimal places. The difference was growing during time. In 500th
it was last 8 decimal places. Due to the chaotic nature of this simulation,
every error will accumulate and grow. So I can conclude that in the simulation
loop with pqs there appeared a small chance of having different last digit than
expected. And this error accumulated over time. I'm not sure if this error is
caused by pqs, or maybe by extra assignments and conversions. Personally I
think that it's because the order in which components of the equation were
added and multiplied was changed.

    After performing above checks (aka. unit test) I did speed test, to benchmark
pqs. Results are not in favor of pqs, but then I should remind that they are
skewed due to extra constructor calls in pqs version. Those constructor calls
would not be here if I converted whole yade's data structure to pqs. So perhaps
I should not make those results public? Since I have no time now to consider whether
to make them public or not, here they are:

    without pqs (three runs):
Computation time: 370.939 seconds
Computation time: 382.615 seconds
Computation time: 360.418 seconds

    with pqs - and extra constructor calls, which would be avoided if I converted whole data structure! (three runs):
Computation time: 415.724 seconds
Computation time: 423.181 seconds
Computation time: 421.799 seconds

    that makes a difference of roughly 11% - 13%

    Previously when working with this code I've seen a speed difference of ~2% just
because of one extra constructor call in this loop. While I performed this
benchmark with good intentions I still cannot say whether pqs is slower or not.
This question will be answered when I fully convert yade to pqs. It's possible
that I will discover something new with kcachegrind profiler - but due to the
lack of time (review deadline) I cannot post profiler results now. If anybody
wants to see the code I used it's here: http://tinyurl.com/r83tr , while the
original code is here: http://tinyurl.com/ouhbc . Quick glance at the code
shows 21 extra assignments of double, and 9 extra construtcor calls.

    Also I should note here, that during my work with yade, every time I made a
design decision which favored clean design over calculation speed I was later
rewarded. The reward was, that clean design allowed unexpected optimizations in
different pieces of code, which at the end resulted in faster calculation
speed. I have reasons to believe that this will be also the case with pqs.

    During this adaptation of yade to pqs I encountered only one small obstacle:
operator *= for vect was missing. I had to write { vect1 = vect1 * some_scalar;};

    Also vect has a function magnitude(), please add squared_magnitude(). I tried to
add it myself, but got lost in defining a squared return type.

---------------------------------------------------
    * How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

    As much as a human can, given just 48 hours. Sounds a bit funny - that's
because earlier I was not aware that any boost mailing list member can perform
a review. So maybe it qualifies even as an in-depth study, but I'm not sure.
    
---------------------------------------------------
    * Are you knowledgeable about the problem domain?
    
    Instead of claiming good (or bad knowledge) I should rather describe my
background, which should let you decide whether I'm knowledgeable:
    
    Physics : I'm working at a technical university, doing a PhD which focuses
on modelling behavior of concrete. I'm familiar with Finite Element Method,
with Discrete approach, with various material models (viscoelastic,
hypoplastic, cohesive, etc). I've had experience with modelling granular
materials and cloth. Currently I'm developing (mentioned earlier) a framework
for performing physical simulatios - http://yade.berlios.de/ . I have 16
english-language publications (9 of them are just conference materials, 6 of
them are in scientific journals, and 1 is in renowed scientific journal
"Granular Matter"). No more details, as I feel now like boasting ;)

    C++ : I have over 10 years experience with programming in C++ (mostly
guided by my brother Bronek who is more active on this mailing list ;). As my
achievements in this field I'd like to list serialization library in yade.
After thorough testing we decided that boost::serialization does not qualify
for our needs - we needed not-on-the-fly serialization (important design
decision), and better support for various formats - xml easier to read by
human, binary, text, etc. We even developed a backend which allows to
serialize classes (their registered components) into QT dialog window (well,
currently unfinished, as containers are left out, but soon this will be fixed).
Another would be a multimethods library in yade, I made it as a "dreaded
excercise left for the reader" - those words Andrei Alexandrescu used at the
end of his chapter about multimethods in "Modern C++ Design". In this
implementation recursive templates allow theoretically unlimited number of
dimensions (at this moment only 3 are used), and up to 15 arguments in
multimethod call. This library focuses on speed, so a callback matrix is used.
By looking at this source code (in subversion repository of yade
http://tinyurl.com/hxbr7 ) you can judge my familiarity with C++. (I'm more
proud of multimethods in terms of template programming, so look there first,
serialization is still waiting for some cleaning ;)

---------------------------------------------------
    * What is your evaluation of the potential usefulness of the library?

    For people working in science and engineering this library is extremely
useful. There is a bad need for such library in the scientific community.
I had a chance to envision myself how pqs reduces the chance of making stupid
mistake in calculations. Once my code compiled without errors it was correct! If
I recall now how much time I've spent debugging this code more than year ago, I
see that pqs allows to save huge amounts of time.

---------------------------------------------------
    * Other random comment:

    I feel there is a need for another discussions how to call this library in
boost. Most people do not like "pqs", some people (including me) prefer name
"units", but others have strong and reasonable arguments against name "units"
(but I'm not sure if anything else was proposed).

-- 
Janek Kozicki                                                         |

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk