Subject: Re: [boost] Gauging interest in new library re: multiscale/multigranular containers
From: nathaniel_at_[hidden]
Date: 2011-06-07 13:29:36


Thanks for the two comments (so far). I am happy to provide a more detailed
discussion but thought it appropriate to keep my original posting shortish
if possible.

For a little backstory, although I have been working on code related to my
original posting for a while, I decided to formally submit a "gauging
interest" posting after some emails to Barry Smith, the head of the National Center
for Ontological Research, who sent me a stack of research papers with new
data structures and algorithms relevant to "multiscale" analysis. This
gave me a range of functions or algorithms to implement on top of those I
was already working on. There's going to be a conference on biomedical
ontology in about six weeks' time, so I decided to assign myself the task
of completing at least part of a library in preparation for that. NCOR
seems to be working on a "partition theory" which takes a top-down approach
to scale. I'll base my comments on "A Theory of Granular Partitions":
http://ontology.buffalo.edu/smith/articles/partitions.pdf. Partitions are
defined as collections of "cells" which can be nested, including one
"maximal cell". For lack of a more elegant term, suppose a collection
class is called cell_set, as in

template<typename CELL_Type> class cell_set;

If this class is to model the theory then any instance will need a "maximal
cell" or "root" in which all others are contained, which could be
autogenerated with a member function. Moreover, it needs a "subcell"
relation and an "insert" function, which would probably take an existing
cell as the parent of any new cell, defaulting to the root. I think a
variation on this class should allow multiple parents, but for now I'll
assume, along with the above paper, that parenthood is unique. The paper
also defines a "parthood" relation and a "mereological product" such that
c * c' represents the set of subcells shared by two cells (which, given
unique parenthood, must be either c, c', or null). They define two notions
of summation, one of which simply groups all subcells of c + c' into a set,
and another which looks for the least cell actually defined in the
partition that contains all of those subcells.

I could mention further structures which they define (and which could be
member or namespace-scoped functions), but you get the idea. As a
practical tool, this construction seems to be used as a kind of sorting
machine. In the simplest case, there is one class of "target objects"
which are to be fit into the set of cells (they use the term "projection"
for a cell "projecting" onto one or more of those objects). So cell_set
then takes two type parameters as in

template<typename CELL_Type, typename TARGET_Type> class cell_set;

the idea being that any cell_set instance would have a "location" function
which takes a TARGET_Type value or collection and "locates" it within a
cell in cell_set (defaulting to root if nothing else works). Users would
have to provide a boolean "fit" function testing whether a TARGET_Type
object fits into a CELL_Type object.
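
To make this concrete, here is a rough sketch of the sort of interface I
have in mind; every name here is a placeholder rather than settled API, and
a real version would need proper copy semantics:

#include <vector>
#include <cstddef>

template<typename CELL_Type, typename TARGET_Type>
class cell_set
{
public:
    struct cell
    {
        CELL_Type value;
        cell* parent;                  // unique parenthood, per the paper
        std::vector<cell*> children;
    };

    // The maximal cell ("root") is autogenerated on construction.
    cell_set() { root_ = new_cell(CELL_Type(), 0); }

    ~cell_set()
    {
        for (std::size_t i = 0; i < cells_.size(); ++i) delete cells_[i];
    }

    cell* root() { return root_; }

    // Insert a new cell; the parent defaults to the maximal cell.
    cell* insert(const CELL_Type& v, cell* parent = 0)
    {
        return new_cell(v, parent ? parent : root_);
    }

    // Mereological product: with unique parenthood the shared subcells
    // of c1 and c2 are represented by c1, c2, or nothing.
    cell* product(cell* c1, cell* c2)
    {
        if (is_subcell(c1, c2)) return c1;
        if (is_subcell(c2, c1)) return c2;
        return 0;
    }

    // "Locate" a target in the most specific fitting cell, defaulting to
    // root; Fit is the user-supplied predicate
    // bool(const TARGET_Type&, const CELL_Type&).
    template<typename Fit>
    cell* locate(const TARGET_Type& t, Fit fit)
    {
        return locate_from(root_, t, fit);
    }

private:
    bool is_subcell(cell* c, cell* ancestor)
    {
        for (; c; c = c->parent)
            if (c == ancestor) return true;
        return false;
    }

    template<typename Fit>
    cell* locate_from(cell* c, const TARGET_Type& t, Fit fit)
    {
        for (std::size_t i = 0; i < c->children.size(); ++i)
            if (fit(t, c->children[i]->value))
                return locate_from(c->children[i], t, fit);
        return c;   // no child fits: this cell is the location
    }

    cell* new_cell(const CELL_Type& v, cell* parent)
    {
        cell* c = new cell;
        c->value = v;
        c->parent = parent;
        if (parent) parent->children.push_back(c);
        cells_.push_back(c);
        return c;
    }

    cell* root_;
    std::vector<cell*> cells_;   // owns all cells
};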

Scale can be defined on cell_set instances by assigning the largest scale
to root and stipulating that subcells are smaller-scaled than their parent
cells. A default implementation could assign scale 1 to root, define
"depth" as the number of parent cells between a given cell and root, assign
scale 0 to the "deepest" cell, and, if D is the max depth, assign scale to
any other cell c using 1 - depth(c)/D. More nuanced implementations could
consider the number of distinct subcells which a cell contains, with the
idea that cells with more "inner structure" should be assigned larger
scale. Scale can also be used to filter cell_sets into more maximal or
more minimal cells, providing a kind of granular query machine. Imagine,
for example, that a store is building a database for inventory which fits
into a collection of nestable categories. Scale filtering might come into
play if the store wants to temporarily "reassign" products to new
categories to prevent categories from being listed on its website with too
few products, or, conversely, if the store wants to spotlight categories
for some sort of promotion, but only ones with no subcategories.
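
As a sketch, the default metric just described is a one-liner once depth
and max depth are known (assuming D > 0, i.e. at least one non-root cell):

#include <cstddef>

// Root has depth 0 and scale 1; the deepest cell has depth D and scale 0.
double default_scale(std::size_t depth, std::size_t max_depth)
{
    return 1.0 - static_cast<double>(depth) / static_cast<double>(max_depth);
}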

Now, personally, I find that parent uniqueness is too restrictive and does
not really fit real-world situations (for the same reasons that single
inheritance is problematic...). Easing this restriction makes cell_sets
more complex in a number of ways. They end up being graphs rather than
trees, with parenthood one edge-type among others. I'm not saying that
"partition theory" is the last word on multiscale modeling; I think another
useful class would be some sort of multi_graph<CELL_Type> which shares many
features with cell_set but allows multiple parents and perhaps allows the
maximal cell to be undefined. This latter class would be more "bottom up":
instead of introducing new subcells as partitions of existing cells, new
cells would be constructed as "mergers" of existing cells. In addition,
edge types could be classified based on whether c <> c' (for some link type
<>) implies that c is a subcell of c', or that c and c' are both potential
subcells of some parent. These differences give rise to different kinds of
"walks", or constraints on walks, e.g., only follow edges to parents, or to
children, or to siblings. Of course, cell_set as I described it could be
considered a special case of this more general multi_graph structure, but
if multi_graph is to inherit cell_set's features it should support the
construction of a "sort machine" for one or more target types; that is,
provided a user-defined function "fitting" objects "into" vertices in the
multi_graph, it should provide location functions for single objects or
collections of objects of the proper type(s). Scale and granularity
functions would work much as described for cell_sets, although
graph-theoretic measures such as vertex degree can be used to refine scale
quantification for cells.
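
A sketch of the kind of constrained walk I mean, with edge types reduced to
an enum (all names here are illustrative; the visited set guards against
the cycles that multiple parents can introduce):

#include <vector>
#include <cstddef>

enum edge_kind { to_parent, to_child, to_sibling };

struct edge   { std::size_t target; edge_kind kind; };
struct vertex { std::vector<edge> out; };

// Depth-first walk that follows only edges of the requested kind.
// The caller must preallocate seen with g.size() false entries.
template<typename Visitor>
void constrained_walk(const std::vector<vertex>& g, std::size_t start,
                      edge_kind allowed, Visitor visit,
                      std::vector<bool>& seen)
{
    if (seen[start]) return;
    seen[start] = true;
    visit(start);
    const std::vector<edge>& out = g[start].out;
    for (std::size_t i = 0; i < out.size(); ++i)
        if (out[i].kind == allowed)
            constrained_walk(g, out[i].target, allowed, visit, seen);
}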

I am particularly interested in using multi_graphs so defined for modeling
GUIs. I've read some interesting work by RedWhale software, in Palo Alto,
concerning algorithms for matching data elements to UI controls, which they
call "interactors". For example, numeric data field would be mapped to
spinners, list boxes, edit fields, scrollers, radio buttons, etc., based on
their range of allowed values. Assume that the links between data
components and interactors are represented by a multigraphs, so that for
one edge type <v>, c <v> c' implies that c is a data field and c' is an
interactor type which will visually represent c in a GUI. Suppose moreover
that c <s> c' implies that c and c' are both interactors but c' is a
higher-scales control, such as a notebook or dialog box; and that c <c> c'
implies that c and c' are data components with c contained inside c'. The
visual clarity of a GUI can be analyzed by studying how these three edge
types interact in the graph, and any patterns which emerge could be used to
refine algorithms to automatically generate GUIs based on data models.
This could be extended by also introducing user-generated-events and
application tasks, such as saving updated data to disk, so that c <t> c'
could mean that c is a event meaning that the user wants to perform task c'
(e.g., save to disk).
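
I don't have RedWhale's actual algorithms at hand, so the following is only
a toy illustration of the mapping idea, with invented thresholds and
interactor names:

#include <string>

// Pick an interactor for a numeric field from the size of its value range.
std::string choose_interactor(long range_size)
{
    if (range_size <= 5)   return "radio_buttons"; // few values: show them all
    if (range_size <= 30)  return "list_box";      // moderate: scrollable list
    if (range_size <= 500) return "spinner";       // large but bounded
    return "edit_field";                           // effectively unbounded
}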

Finally, to respond to a couple of points raised. For argument's sake,
suppose I develop a single cell_graph class which incorporates both the
partition and multigraph functionality I've described. (Actually, I'd
probably want a small group of related classes rather than one monolith,
but, again, this is for argument's sake.) Certainly this could be
implemented in terms of existing STL containers, but the choice of
which container to use is affected by how the cell_graph itself is used.
As far as I can tell, for optimal memory usage cell_graph should work on
values rather than pointers, but the values should never be copied, if
possible, so that pointers into the cell_graph remain valid indefinitely.
On the other hand, there are occasions when it is useful to use cell_graph
more as a vector than as a list. I won't go into the precise memory layout
I've been using because I have not worked out all the details yet, but I
think the choice of something nonstandard is justified. Conversely, even
if (say) cell_graph<CELL_Type, PROJECTION_Type> is implemented in terms of
standard STL containers behind the scenes, there is enough extra
functionality to implement that it makes sense to develop it as a separate
library. There are actually several other type parameters which I have not
mentioned (although they take defaults), so using a "bare" vector or set
would yield very unwieldy code; better to hide as much as possible behind
typedefs, partial specializations, or inheritance:

template<typename VERTEX_Type, typename CONNECTOR_Type,
         typename NODE_WRAP_Type, typename NODE_SERVICER_Type>
void
TierBlend::Route_Matrix_Compiler<VERTEX_Type, CONNECTOR_Type,
                                 NODE_WRAP_Type, NODE_SERVICER_Type>
::compile(NODE_SERVICER_Type* servicer) ...

Stuff like this should be library code, not user code.
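
For instance, the library could choose defaults for the extra parameters
and expose a short typedef; everything named here is invented for
illustration:

template<typename T> struct default_connector {};
template<typename T> struct default_node_wrap {};

template<typename CELL_Type,
         typename TARGET_Type,
         typename CONNECTOR_Type = default_connector<CELL_Type>,
         typename NODE_WRAP_Type = default_node_wrap<CELL_Type> >
class cell_graph { /* ... */ };

// Users see only the two parameters that matter to them.
typedef cell_graph<int, double> simple_cell_graph;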

Finally, there is no single meaning to "scale" or "granularity". A scale
metric can be induced on a partition or "partition-graph" based solely on
nesting (e.g. subcells inside parent cells) but in real world situations
there is no reason why "sibling" cells should have the same scale. The
point is merely that classes representing these structures should provide
for a scale measure associated with cells, graph vertices, and/or
subgraphs, and that scale should also serve as a possible constraint on or
characterization of edge types (e.g. it could be stipulated that c <> c'
implies c < c' for edge type <> and parthood relation <). Similarly,
"granularity" could be
associated with a given filter, query, or operation which specifies a range
of scales on which the operation is to be performed.
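
As a sketch, that stipulation could even be checked mechanically; this
reuses the vertex/edge types from the walk sketch above, and assumes the
scale vector is indexed by vertex id:

// True if every edge of the given kind goes from a smaller-scale cell
// to a strictly larger-scale one.
bool respects_scale(const std::vector<vertex>& g,
                    const std::vector<double>& scale,
                    edge_kind kind)
{
    for (std::size_t v = 0; v < g.size(); ++v)
        for (std::size_t i = 0; i < g[v].out.size(); ++i)
        {
            const edge& e = g[v].out[i];
            if (e.kind == kind && !(scale[v] < scale[e.target]))
                return false;
        }
    return true;
}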

Thanks again for the comments.

----------------------------------------
From: "Edward Diener" <eldiener_at_[hidden]>
Sent: Sunday, June 05, 2011 8:18 AM
To: boost_at_[hidden]
Subject: Re: [boost] Gauging interest in new library re:
multiscale/multigranular containers

On 6/4/2011 4:55 PM, nathaniel_at_[hidden] wrote:
>
> Hi: I would like to submit a library which would allow functionality
> related to scale and granularity to be performed on STL-like containers.
> Until recently I was in a doctoral program specializing in the philosophy
> of science, and I am especially interested in multiscale data modeling and
> its applications to e.g. GUI design, code generation, decision procedures,
> and DSL implementation. For example, one of my projects involves a DSL
> implemented through multiscale graphs in lieu of conventional syntax trees.
> I used to study at Buffalo, which houses the National Center for
> Ontological Research -- headed by the quite broad-ranging scholar Barry
> Smith -- and has done pioneering work in biomedical ontology and also in
> ontological models related to complex or "vague" aggregative structures.
> In these structures collections of objects of some base type T can be
> defined as aggregates but with varying degrees of looseness or "individual
> coherence"; and this aggregation can be iterated to produce multiscaled
> structures which can be queried or represented at different granularities.
> There are a number of analytic procedures which can be implemented on such
> structures, some overlapping with graph-theoretic or statistical methods.
> These structures also possess a lot of internal structure above and beyond
> their underlying list of values, so they are suited to memory-optimized C++
> code rather than higher-level languages like Java. However, I am not aware
> of generic libraries which really incorporate these kinds of data
> structures and their analysis in a systematic way, although I have come
> across non-template libraries which implement some similar features. What
> I propose is a library which could be used in a fashion similar to
> std::vector or std::set but which supports additional functions along the
> lines of, e.g., building multigraphs out of T-type nodes; partitions of
> T-sets with functions to define or measure e.g. intrapartition relevance;
> defining (or extending) a scale metric on T-objects, T-sets, T-arrays, or
> other "T-data structures" and using it for rescaling, scale-filtering, and
> other scale-related transformations; defining iterators or "walks" on
> T-collections which take scale and granularity into account; etc. In a
> number of cases the algorithms for these kinds of operations have been
> described in theoretical form or even already implemented in languages like
> Scheme.

You generally need to define "scale" and "granularity" in mathematical
terms for computer programmers, or else you will only interest scientists
in your particular area.

Also, would it not be possible to create a class which consists of an STL
container and some other objects which represent "scale" and
"granularity", and add the necessary functionality to that class, rather
than attempting to re-engineer some STL container?

