Re: [Axiom-developer] documentation and the crystal
From: root
Subject: Re: [Axiom-developer] documentation and the crystal
Date: Wed, 31 Dec 2003 02:53:05 -0500
>a) To me it's not even clear how to structure mathematics. One can
>build mathematics on sets, hence the Bourbaki approach, or even better (in
>my eyes, but equivalent in formal strength) on the *function first*
>principle. What to choose for Axiom? (In fact the set approach is built in.)
Well, I am assuming that we're only trying to structure "computational"
mathematics. That is, we're only trying to figure out how to organize
the pieces of mathematics we can compute. The NIST organization in the
U.S. government did a similar organization of information for numerical
mathematics years ago. Thus you can now find a specific index to classify
a numeric routine that will do Runge-Kutta integration on a certain class
of functions. A second index will find routines that do Simpson integration.
Since we've limited the mathematics to computational forms we should be
able to collect algorithms and classify them just as NIST has already done.
Thus we would find different classifications for Clifford-algebra algorithms
vs Hopf-algebra algorithms.
>b) I do not see how you can automatically assign semantics to data
>structures etc. I think, one has at least to have a sort of *semantic
>typing* during the documentation of axiom (code). Hence every piece of
>documentation should come with a semantic type (or multiple such types)
>which finally allow to put a direction into the crystal looking glass.
Indeed assigning semantics is hard. Three approaches leap to mind.
The first is by keyword assignment in pamphlet files. This is weak but
easy (although time consuming). The second is to use the compiler to
try to parse the mathematical expressions in the tex file and assign
meaning (types) to the symbols. This is hard but might be helped
along if we use OpenMath representations of the mathematics. I'm not
sure if OpenMath is strong enough to handle general mathematical
expressions. The third is to use a chart-parser and semantic network
software to try to read and classify the mathematics.
I did an effort similar to the third case while at IBM. We built a
system called "Susan" which read English-language email, parsed it
using a chart-parser, constructed a semantic "concept", classified it,
and used the nearest-neighbor concepts to help direct further parsing.
Eventually the whole email and its paragraphs, sentences, and phrases
became concepts. Once the email was classified (e.g. does it set up a
meeting? does it require an answer? does it have a deadline? is it from
someone important?) the email was assigned to a "basket". So email with
a "deadline" went into a "tickle file" that would begin reminding you
of the deadline several days in advance.
We could build a Susan-like system that could handle some portion of
the mathematics because we have the advantage of working in a limited
domain. Since we know the domain we can prepopulate the semantic
network with concepts. These concepts can then be used to direct the
chart-parser toward a correct parse of a sentence. This would be great
fun but falls under the "real research" category and thus will never
get funded :-)
>c) Algorithms should be plain and readable and not only be available in
>code form (if even this way)
I absolutely agree. Indeed I would hope to see several versions of
explanation for algorithms. My current "best practices" example is
"PRIMES is in P" by Agrawal et al.
(http://www.cse.iitk.ac.in/news/primality.pdf). They have the closest
thing to a literate program that embodies the kind of information I
hope to see. They show the theory, the pseudocode, the bounding
conditions, the proof, a complexity analysis and references. I only
wish they had reduced the pseudocode to real code and published the
original tex. Other than that I feel they've "set the standard" for a
good computational mathematics paper.
>d) To me it would be much more natural to look at the documentation like a
>big database and ask SQL like questions. EG:
>
>> select from AXIOM algorithm where domain has commutative;
>
>or such. And then get a sort of (semantic? web?) document which allows one to
>go deeper into an algorithm, e.g. there should be links to all faces of the
>crystal which make sense to look at.
I agree that a large database will eventually underlie the whole of the
system. I'm not sure that a relational database is an entirely useful
model. Yes, it is complete but it is a very awkward way to structure
mathematical questions. A semantic network concept would allow you to
"fill in" as much as you know about something (e.g. domain, commutative),
"classify it", and then find all of the concepts that it dominates.
Some of these concepts would be the domains you want.
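
As a rough illustration of the "fill in, classify, find what it dominates"
idea, here is a minimal sketch in Python. It assumes each concept is just a
set of features and that a partial description dominates every concept
carrying all of its features. The concept names and features are made up for
the example and are not taken from Axiom's algebra.

```python
# Hypothetical concept table: each concept is described by a set of features.
# Names and features are illustrative only, not Axiom's actual categories.
CONCEPTS = {
    "Integer": frozenset({"domain", "ring", "commutative", "ordered"}),
    "Fraction": frozenset({"domain", "ring", "commutative", "field"}),
    "SquareMatrix": frozenset({"domain", "ring"}),
    "gcd": frozenset({"algorithm", "commutative"}),
}


def dominated_by(partial, concepts=CONCEPTS):
    """Return the names of all concepts that carry every feature in `partial`.

    The partial description "dominates" a concept when its feature set is a
    subset of that concept's feature set.
    """
    wanted = frozenset(partial)
    return sorted(name for name, feats in concepts.items() if wanted <= feats)


if __name__ == "__main__":
    # Fill in what we know (domain, commutative) and see what it dominates.
    print(dominated_by({"domain", "commutative"}))
```

The less you fill in, the more concepts the description dominates; an empty
description dominates everything in the table.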
>e) The system will not be smarter than its designers / users. I do not
>see how an automated method will derive anything beyond mere syntactical
>sorting. To be frank I have no idea how to reach the above needs.
We're not trying to make it smarter; we're trying to make it useful.
At the moment Axiom is just a huge pile that is nearly inert. It
includes primitive query functions but little else. If you are going
to use a large system (scale Axiom by 100x) you need to be able to
move around it in much more fluid ways. As a researcher you sometimes
wander into a library and browse thru books that might be related to
some half-formed idea. Crystal is an attempt to bring "wandering" into
the computer age.
Crystal should do at least 3 things. First, it should richly classify
and cross-connect the various sources of information (source code, an
algorithm, proofs of that algorithm, a complexity analysis, pointers
to related algorithms, the domain of application of the algorithm,
required inputs and their types, explanations of the theory behind the
algorithm, published results using the algorithm, domains which use
the algorithm, boundary test cases, examples of its use, etc. each
of which could be a kind of facet).
Second, Crystal should watch how you wander and work to tailor its
answers to your interests. If I am looking at information on
cryptography and group theory I don't want to know anything about the
weakly related concepts (e.g. the probabilities in threat
matrices). I would like the system to remember what I've looked at
(so I can find it again), to remember how I got there (so I can start
from some point and go off in a different path), and to "suggest"
(nearest neighbor) things that are related but I didn't know (or
forgot) to look for. And I'd like a facet automatically created which
is related to the interest and possibly other facets related to the
details. After all, if I'm trying to write a paper I'm usually piling
up a bunch of related things I need as background. Computers are good
at keeping track of things and we need to organize the tracks.
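
A minimal sketch of the "remember the trail and suggest neighbors" behavior,
in Python. The link table, the `Trail` class, and the `min_votes` threshold
are all assumptions invented for the illustration; a real Crystal would draw
its links from the classified sources rather than a hand-written table.

```python
from collections import Counter

# Hypothetical link table between topics; purely illustrative.
LINKS = {
    "cryptography": {"group theory", "number theory", "threat matrices"},
    "group theory": {"cryptography", "number theory", "representation theory"},
    "number theory": {"cryptography", "group theory", "primality testing"},
    "threat matrices": {"cryptography", "probability"},
}


class Trail:
    """Remembers what was visited, in order, and suggests unvisited neighbors."""

    def __init__(self, links):
        self.links = links
        self.visited = []  # ordered, so any earlier point can be revisited

    def visit(self, topic):
        self.visited.append(topic)

    def suggest(self, min_votes=1):
        """Rank unvisited topics by how many visited topics link to them."""
        seen = set(self.visited)
        votes = Counter()
        for topic in self.visited:
            for neighbor in self.links.get(topic, ()):
                if neighbor not in seen:
                    votes[neighbor] += 1
        # Most votes first; ties broken alphabetically for stable output.
        ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, count in ranked if count >= min_votes]


if __name__ == "__main__":
    trail = Trail(LINKS)
    trail.visit("cryptography")
    trail.visit("group theory")
    print(trail.suggest(min_votes=2))
```

Raising `min_votes` is one crude way to drop the weakly related topics: only
things linked from several places on the trail survive.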
Third, it should be capable of finding things in the stream of
computational mathematics literature that I would find "interesting".
I'm assuming that computational mathematics will eventually be online
and organized in some computer reachable way. Hopefully as literate
programs. So Crystal ought to be able to constantly be looking thru
all of the recently published papers for anything that fits my idea of
interesting. Trivially this involves a keyword search of papers that
get published but could be much more complex. In this way I can build
up a local library of references that make my Crystal searches more
interesting. And since they are literate programs I'm also building
up the mathematics for my areas of interest.
We're capable of doing portions of the first idea with the current
tools but it requires people who write code to make an effort to
make their work more accessible to the machine. We don't even have
a standard outline of what should be in a literate program paper yet.
We need to build a few papers, build some technology to exploit the
papers, rebuild the papers and the technology, etc. until we get to
the point where we find the system useful.
>f) As a probably manageable project, it would be of utmost importance for
>me (and other mathematically interested, who are stupid programmers) to
>have just the functionality described above for the algebra lattice. I had
>such an email with David Mentre, where some of the needs were discussed.
In the very short term (6 months?) we're likely to have the algebra
lattice available. David has discussed various ways of displaying such
a beast (it has 1100+ nodes at the top level). I'm looking at ways of
building a computational math classification based on test cases from
several different computer algebra systems (CATS). The combination of
the machine-built algebra lattice and the hand-built classification
system should give us a simple prototype to play with. We need a
simple browser tool. I looked at Leo which is an open source literate
browser but haven't decided how to leverage it yet. We need a backend
(database/semantic network). If I can remember the insight from KROPS
perhaps I can reproduce it.
>g) I do not think that *graph* is the best thing to have. One would need a
>sort of *matroid*, i.e. a mathematically sound structure which keeps track of
>all sorts of *dependencies* (assuming that independent objects/data are
>unrelated). Matroids allow you to keep track of a minimal set of relations
>between objects etc. (You may think of a combinatorial geometry, where the
>points/lines/... may or may not be related.)
> It might be foolish just to have nodes and edges, but one has to
>have an n-dimensional structure of nodes, edges, faces, volumes, ... where
>the semantic meaning could be attached to the *dimensionality* of the
>object. This would allow one to trace up and down into the complexity of the
>system if e.g. code has the smallest dimensionality (say 1 dim), algorithms
>a higher (say 3-dim (to leave space for future enhancements)) and
>documentation of algorithms or a proof it works etc. even higher such
>dimensionality. A *browser* would offer to display or hide the complexity
>away from a user or show it to him.
I like the idea of dimensionality (faces, volumes). I don't believe I've
ever seen such an idea applied to a semantic network before. In a semantic
network "concepts" (data structures) tend to be very near each other
(counting links) if they are "similar". Semantic networks tend to have
"clusters" of concepts that express a particular "idea". These clusters
form because an idea has many similar ways of being expressed so there
are many links. Different "ideas" will have relatively fewer links
between them. The closest connection to dimensionality is probably that
a "volume" can be an "open ball" formed around a cluster of concepts.
If you measure the number of links that cross the boundary of the
volume and minimize that number you can count the number of "concepts"
inside the "volume". Thus you can have "big ideas" (a really rich
cluster of concepts) and "small ideas" (a small cluster).
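
A minimal sketch of the "volume" measurement, in Python. It assumes the
network is given as a plain list of undirected links and grows a volume
greedily from a seed, adding a concept only when that lowers the number of
links crossing the boundary. The greedy rule, the function names, and the toy
network are assumptions for illustration; this is not a claim about how any
real semantic network software does it.

```python
def boundary_links(edges, volume):
    """Count links with exactly one endpoint inside the volume."""
    inside = set(volume)
    return sum(1 for a, b in edges if (a in inside) != (b in inside))


def grow_volume(edges, seed):
    """Greedily add the concept that most reduces the boundary count.

    Stops as soon as no single addition lowers the count, so the result is a
    local minimum, not necessarily the best possible volume.
    """
    volume = set(seed)
    nodes = {n for edge in edges for n in edge}
    while True:
        current = boundary_links(edges, volume)
        best = None
        for node in sorted(nodes - volume):  # sorted for repeatable results
            cost = boundary_links(edges, volume | {node})
            if cost < current and (best is None or cost < best[0]):
                best = (cost, node)
        if best is None:
            return volume
        volume.add(best[1])


if __name__ == "__main__":
    # Toy network: two tight clusters (a, b, c) and (d, e, f), one link apart.
    edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
    idea = grow_volume(edges, {"a", "b"})
    print(sorted(idea), len(idea), boundary_links(edges, idea))
```

The size of the returned set is the size of the "idea", and the remaining
boundary count says how loosely it is tied to the rest of the network.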
I'm unfamiliar with the matroid idea. What is it? An n-dimensional
spreadsheet?
> Say a novice user will be confused to see all details but needs
>quite practical help and may be even examples (like in the book)
I expect novice users to spend most of their time in the examples facets
so we should probably be sure they work :-)
>Sorry for not being as structured as you were, but I had no time to think
>over Christmas ;-))
Every so often I try to look toward the 30 year horizon to decide if we're
building for the future. I was too sick to type over Christmas so I had a
lot of time on my hands to think. Now that I'm back to typing I'm trying
to "ground" the ideas in working code.
Shortly we'll all have access to several terabytes of disk, a terabyte of
storage, and a terahertz of cpu cycles connected to gigabyte bandwidth.
You'll be able to put virtually every paper ever published on a local
disk drive and search them in seconds. Imagine what will be possible
in 30 years. Libraries are doomed, books will be electronic. Browsing
will be incredibly painful if we don't get creative.
t