
From: Oleg Abrosimov (olegabr_at_[hidden])
Date: 2006-10-09 06:38:28
Tom Brinkman writes:
> The review of Generic Image Library (GIL) begins today,
> October 5, 2006, and continues through October 15, 2006.
> [...]
> Review questions:
>
> Please always explicitly state in your review, whether you think the
> library should be accepted into Boost.
In the current state: no. But I hope that it will be accepted next time.
>
> You might want to comment on the following questions:
>
>  What is your evaluation of the design?
Very promising, but I have some concerns that I'd like to see addressed, or simply resolved if they turn out to be the result of my misunderstanding.
>  What is your evaluation of the implementation?
Have not seen the details.
>  What is your evaluation of the documentation?
Not good, really.
It consists of two parts: the tutorial and the design guide.
The first is too superficial, and the second is superficial overall yet too detailed in places. It is easier to say what the documentation of this (or any other) generic library should contain:
1) a description of the domain covered (equivalent to the abstract mathematical model)
2) introductory examples of simple library usages
3) definitions of concepts _with_ examples of their implementations and of adapting existing types to model each concept
This section covers the important question of how one can extend the library. It is too hard to harvest this information from the current concept definitions in the GIL docs.
>  What is your evaluation of the potential usefulness of the library?
This library is very useful for me in particular, and it could be made much more useful (see below).
>  Did you try to use the library? With what compiler? Did you have
> any problems?
No, I've not used it yet.
>  How much effort did you put into your evaluation? A glance? A quick
> reading? Indepth study?
An in-depth reading of the tutorial, a quick reading of the design guide, and a lot of thinking.
>  Are you knowledgeable about the problem domain?
I think so. My PhD was in electron tomography, where I implemented 3D reconstruction algorithms on series of 2D images (the result is 3D). Currently, my job is related to OCR and other pattern recognition software development that is tightly coupled with imaging.
Below are constructive (I hope) criticisms of and suggestions for GIL.
First of all, GIL claims to apply generic programming techniques to achieve its goals. The generic part is well understood by now:
1) define concepts
2) provide implementations that model the concepts defined
But what about the programming part of the term "generic programming"? What is programming all about? This question has many answers; I have one that is close to the topic: "Programming is the implementation of abstract models in a form that a computer can understand."
An "abstract model" here is the most abstract kind: as in mathematics, we define abstract objects and operations on them, then investigate the "world" created by such a choice of objects and operations.
It would be true to say that every good program or library has a very well-defined abstract model at its foundation.
Now I can formulate my main dissatisfaction with GIL:
It seems that it has no well-defined abstract model behind it.
This leads to a not-so-clear design, IMO.
Now I'll try to sketch out the model that GIL can have behind it.
It is relatively easy. What is an Image in the first place? (I mean the abstract image here.)
Mathematically speaking, an image is a vector function from an N-dimensional space into an M-dimensional space.
In other words, we have a set of points in one N-dimensional space (let us call it S(N)) and a corresponding set of points in another, M-dimensional space (let us call it S'(M), where the prime marks the target space).
In short form the same can be formulated as:
Image := S(N) -> S'(M)
where ":=" means "by definition"
For example, for the 2D RGB image N == 2 and M == 3.
To complete this definition, we need to add that each dimension in both spaces is discretized and bounded to a finite interval.
For example, it is [0,255] for one byte channel in GIL terms.
This definition can be used as is to define the Image and View concepts for GIL.
Having defined the abstract object we can play with, the Image, we can define Transformations on such objects, and different kinds of them.
The most general transformation can be expressed like this:
Transform := {Img1} -> {Img2}
where Img1 := S(N1) -> S'(M1) and
Img2 := S(N2) -> S'(M2) and
{} denotes a set in the mathematical sense, so {Img1} is a set of images
In other words, all attributes can change: the dimensions of both the source and target spaces (S and S'), and the number of images in the source and destination image sets ({Img1} and {Img2} can differ).
The simplest case is when
N1 == N2 and
M1 == M2 and
{Img1} == {Img2} == 1
An example is the std::transform algorithm (N1 == N2 == 1).
It also has a two-sequences overload that can be used in the case when {Img1} == 2.
The reference to the STL is not by chance. The part of the STL that defines and operates on Random Access Sequences has the same abstraction behind it as is formulated here, but only for the case when N is identically 1.
Wait a minute! The STL operates on objects of arbitrary type T, not on linear spaces of dimension M as in our formulation here. True, but there is a one-to-one relation between the two: each class object can be seen as the set of values it holds, and this set of values can be seen as a set of coordinates in our M-dimensional space.
Note that even for the case N == 1 the STL doesn't cover all possible kinds of transformations, which is really sad; GIL should be better in this respect. Moreover, GIL can (and maybe should) supersede the STL in the Random Access Sequence transformation business.
I want to claim that GIL should provide primitives to express any kind of Transformation possible.
OK, now that we have the abstraction, let us see how it can be expressed in C++ in a generic fashion.
There is nothing to invent here: the STL and GIL are already there.
In the STL, the Image object is represented by std::vector<T>, std::deque<T>, std::basic_string<T>, and T* for any type T.
The transformation in STL is represented in a form of generic algorithms like std::transform, std::sort, std::copy, std::remove_if, etc.
These algorithms do not operate on images directly; an additional layer of iterators is added, and the algorithms operate on pairs of iterators that represent iterator ranges. The goal of an iterator is to abstract away the type of the container it originates from, which allows us to implement each algorithm once, for all existing and future container types.
The real reason to provide this additional "indirection" layer is to support T* containers.
Nevertheless, it opens up a new way to express the Transformation: through a special iterator adapter like boost::transform_iterator. It can do exactly what std::transform does (the N == 1 version), but in a _lazy_ way: it doesn't compute its results immediately, but can be invoked later to do the job.
This means that in the STL we have two ways to express the Transformation:
the _strict_ way, by algorithms,
and the _lazy_ way, by special iterator adapters.
Is it good or bad? I think that having two ways to do the same thing is generally bad.
The lazy way is superior to the strict one, because the latter can be implemented by simply invoking the lazy one in a well-defined manner:
strict := apply(lazy);
GIL has a View concept that plays the same role as an iterator range does in the STL. This means that GIL inherits from the STL the same duality problem in expressing the Transformation concept.
I claim that GIL should remove the algorithm concept completely and leave only View, with a new meaning: it is no longer only a lightweight Image representation, but a representation of the Transformation abstraction as well.
A View would be invoked when, and only when, it is assigned to an Image:
Image img = some_view; // all computation made here
Given such a TransformationView concept, it is interesting to explore some of the new possibilities it opens up. For example, one can define mpl::vector<view_type1, view_type2, view_type3, ...> and apply some metaprogramming to construct the final view type to be invoked. And if View construction can be separated from its lazy application, then Views can be treated as function objects that can be grouped into a tuple and manipulated in some nontrivial way in response to runtime user input. I see a huge new field to explore here...
Here you may object that a View already is a Transformation in GIL. Yes, but from the docs I got the impression that its potential is not clearly understood by the GIL authors. I claim that the TransformationView should become the central part of GIL; almost all GIL programming, use, and extension should be centered around the View concept.
Next, a very important issue: laziness produces overhead without some kind of memoization technique. I claim that GIL should have one built in.
The GIL docs currently contain examples of suboptimal algorithms, suboptimal precisely because of this absence of memoization support.
I mean the examples where one needs to apply nontrivial views at each step in order to increase performance, but the result:
1) is suboptimal
2) looks ugly
3) wastes memory
The same examples could be rewritten by hand, without GIL, in a much more efficient way. That is not a good conclusion for a library that claims: "Performance. Speed has been instrumental to the design of the library. The generic algorithms provided in the library are comparable in speed to hand-coding the algorithm for a specific image type."
I understand that general support for memoization is a hard task, but I claim that it can be done with a little help from the transformation (View, in current GIL) writer.
My final claim is about the representativeness of the concept implementations shipped with GIL. I claim they are insufficient. For example, sub-byte channels should be implemented to ensure that GIL's concepts are good enough to handle them. In particular, I'm very interested to see an implementation of a 1bpp color space, which is very common in OCR development.
I can provide an example use of such functionality that the GIL authors could use to illustrate its abilities:
Suppose we have a 1bpp image and want to find all horizontal lines that are black (bit set to 0). A line is defined by its first and last positions (x1, x2).
The example code should define a new image type that consists of rows of lines (x1, x2), like std::vector< std::vector<line> >, and a new View type that can produce such an image from a normal 1bpp image.
One more idea for improving GIL:
It is typically enough to use N- and/or (N-1)-dimensional Views to implement a Transformation of an N-dimensional View. Only in rare cases is there a need for (N-2)-, (N-3)-, ..., 1-, or 0-dimensional slices, such as 0-dimensional pixels for 2D images.
For example, your x_gradient algorithm can be implemented as a Transformation View by using a 1D SubView that effectively touches only two points:
1) define the 1D kernel View, like so:
View1D v1d = {-1, 0, 1}; // Note: pseudocode
2) multiply the current View, defined by the enclosing view and the current location, by v1d
3) use an accumulation_view on the result to calculate the final gradient
The vast majority of image processing algorithms convolve an image with some KxL kernel; the technique above can be used to express them all in a universal way.
The idea here is that the programming model for GIL should be as follows:
1) if there are enough predefined views, express your program as a composition of views.
2) if you are forced to write your own view, do it in terms of higher-dimensional (N and/or N-1) subviews; use lower-dimensional ones only if absolutely necessary.
The reason is efficiency in the first place: such a highly abstract transformation implementation could be optimized effectively by the library internally, for example by parallelizing it; or, in the case of convolution, the library can provide a higher-order convolution_view that would be implemented through FFT transforms internally.
I hope that my comments will help the authors make GIL an order of magnitude better than the STL is now.
Best regards,
Oleg Abrosimov.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk