Thursday, March 10, 2005

Ontologies and the Design Process

I've been thinking a lot about ontologies and the role they play in developing a canon of thought governing Model-Driven Architectures and power tools for advanced pattern-based software composition and analysis.

Patterns and algorithms can be modeled as functions

Functions exist within ontologies

Composability of functions depends on the orthogonality of the ontology, which can be measured by the average height of a model. (Actually, I think there is an effect here which tracks in the transformed data of the model, not just its architectural height, since models could be infinitely recursive. The question might be: what is the transformation space of data made possible by all possible compositions of models? Is this a derivative ontology, or just an nth-order derivative solution space?)

Ontologies can be sparse or freely composable, or any degree in between. Sparsely composable ontologies can be analyzed for chokepoints.

Any collection of models can be analyzed to discover an ontology that represents the expressive range of compositions which can be constructed from the models.

Ontologies can be compared by examining their axes of freedom and the cardinality of the axes.
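
To make that last point a little more concrete, here is a minimal sketch in Python of one way it might look: an ontology reduced to a set of named axes of freedom, each with a cardinality, and a simple comparison between two of them. All of the axis names and numbers are invented for illustration.

    # A minimal sketch of an ontology reduced to named axes of freedom, each
    # with a cardinality, plus a simple comparison. Names/numbers are invented.
    from math import prod

    def compare(a, b):
        """Compare two ontologies by their axes and the cardinality of the axes."""
        return {
            "shared_axes": sorted(a.keys() & b.keys()),
            "only_in_a": sorted(a.keys() - b.keys()),
            "only_in_b": sorted(b.keys() - a.keys()),
            # Rough expressive range: product of the axis cardinalities.
            "range_a": prod(a.values()),
            "range_b": prod(b.values()),
        }

    gui_ontology = {"widget_kind": 12, "layout": 4, "event": 20}
    workflow_ontology = {"activity": 15, "event": 8, "role": 6}

    print(compare(gui_ontology, workflow_ontology))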

-----------------------------------------------------------

A good synonym for ontology might be paradigm. An ontology is a framework within which concept models can be constructed. A paradigm is a point of view, or way of thinking about things.

-----------------------------------------------------------

My current curiosity about ontologies as an instrument for implementing pattern-based design and model-model transformation is related to some thinking I was doing back in 1994. At the time I wasn’t familiar with the term ontology, but I was concerned with the information density of what I thought of as design vocabularies.

What I was thinking about then was a view of how the invention process worked, from idea genesis thru implementation. I visualized initial genesis as a single thing: an inspiration, or a collision of several thoughts to create The Idea. This idea is complete, in that it encompasses the entire concept, but it is maximally vague. It is so abstract that many critical facets of the idea are unspecified, and the whole is so divorced from the real world as not to be directly implementable.

From this prototypical idea, a tree of specialization descends for an arbitrary number of levels as the idea is broken down and refined, and each portion is defined with greater precision, until finally a number of leaf nodes are reached which exist as tangible, real-world implementations of their respective pieces of the idea.

Taken together, this set of leaves is equivalent to the original concept in terms of information content, but with a much higher degree of clarity. Individually, however, each leaf contains almost no information. The information:concreteness ratio of the original Idea node has been reversed.
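
Here is a rough sketch of that tree, just to pin the picture down: each node carries a completeness value (how much of the original idea it spans) and a concreteness value. The names and numbers are invented; the point is the reversal of the information:concreteness ratio between the root and the leaves.

    # Sketch of the idea genesis tree: each node records how much of the
    # original idea it covers and how concrete it is. Values are invented.
    from dataclasses import dataclass, field

    @dataclass
    class Node:
        name: str
        completeness: float   # fraction of the whole idea this node covers
        concreteness: float   # 0.0 = pure abstraction, 1.0 = implemented
        children: list = field(default_factory=list)

    def leaves(node):
        if not node.children:
            return [node]
        return [leaf for child in node.children for leaf in leaves(child)]

    idea = Node("The Idea", 1.0, 0.0, [
        Node("storage", 0.4, 0.5, [Node("file format", 0.2, 1.0),
                                   Node("index layout", 0.2, 1.0)]),
        Node("interface", 0.6, 0.5, [Node("command parser", 0.3, 1.0),
                                     Node("report writer", 0.3, 1.0)]),
    ])

    # The leaves jointly cover the whole idea, but each covers very little.
    print(sum(leaf.completeness for leaf in leaves(idea)))  # 1.0 in total
    print(max(leaf.completeness for leaf in leaves(idea)))  # 0.3 at most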

At the time I was thinking about this model, I was interested in exploring the differences between portability and repurposeability. People talk vaguely about reusability in software, but if inspected in light of the above model of idea genesis, reusability can be seen to have two different aspects. I borrowed the terms portability and repurposeability to differentiate these aspects.

Portability describes how well something can be taken from one context and used in another. In terms of our idea genesis tree, portability is a measure of how much of the top of the tree can be moved from one set of leaves to another, different set of leaves. From this it is easy to see why little benefit is gained in many porting scenarios: an entire idea is moved from one context to another, but only the top few layers can actually be moved, and the rest of the layers and all the leaves must be reimplemented.

Repurposeability describes how much benefit is gained when something built for one task is used to perform a different task. In terms of our idea genesis tree, this can be visualized by how much of the bottom of a tree can be moved from one idea tree to another. Since nodes further down the tree carry less content and more concreteness, it is easy to see that the larger the fragments that can be repurposed, the greater the benefit. If only leaf nodes can be repurposed, little is gained even though there are a large number of them. However, if several major sub-branches can be repurposed, significant gains can be realized because some actual content is being effectively reused.
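
A toy example might make the contrast clearer. Here the tree is just (name, children) pairs, and the two counts stand in for what is carried along when the abstract top is ported versus when a concrete sub-branch is repurposed. The shape and names are made up.

    # Toy contrast between porting the top of an idea tree and repurposing a
    # sub-branch. The tree is (name, children) pairs; shape/names are made up.
    tree = ("The Idea", [
        ("storage", [("file format", []), ("index layout", [])]),
        ("interface", [("command parser", []), ("report writer", [])]),
    ])

    def count_top(node, levels):
        """Nodes carried along when only the top `levels` of the tree are ported."""
        name, children = node
        if levels == 0:
            return 0
        return 1 + sum(count_top(child, levels - 1) for child in children)

    def count_subtree(node):
        """Nodes gained when a whole sub-branch is repurposed."""
        name, children = node
        return 1 + sum(count_subtree(child) for child in children)

    total = count_subtree(tree)
    # Porting carries only the abstract top; repurposing lifts concrete leaves too.
    print("ported (top 2 levels):", count_top(tree, 2), "of", total)
    print("repurposed 'storage' branch:", count_subtree(tree[1][0]), "of", total)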

I was thinking in Shannon terms: specificity and entropy. My core notion was that the root concept node of the tree had maximum completeness but minimum specificity. Zero entropy, but maximum vagueness. Conversely, a leaf node was concrete but highly entropic.

In my visualization of the tree, however, I left out an important factor which is one of the major drivers in the development of design techniques: redundancy. The decomposition of an arbitrary idea into a concrete implementation almost always involves a high degree of redundancy in the tree. Repurposing is the technique most commonly applied to attempt to fold the tree and reduce the number of redundant branches. Developing the ability to measure redundancy in a particular idea genesis tree will give us a way to understand the effectiveness of repurposing and compare it to its cost.
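
As a crude first cut at such a measurement, one could count structurally identical sub-branches in a toy tree; those duplicates are exactly what folding (repurposing) would eliminate. Again, the tree content here is invented.

    # Crude redundancy measure: count structurally identical sub-branches in a
    # toy idea tree. These duplicates are what folding would eliminate.
    from collections import Counter

    tree = ("report tool", [
        ("import", [("parse", []), ("check ranges", [])]),
        ("export", [
            ("csv writer", [("parse", []), ("check ranges", [])]),
            ("pdf writer", [("parse", []), ("check ranges", [])]),
        ]),
    ])

    def record_subtrees(node, seen):
        """Return a hashable key for `node` and tally every subtree in `seen`."""
        name, children = node
        key = (name, tuple(record_subtrees(child, seen) for child in children))
        seen[key] += 1
        return key

    seen = Counter()
    record_subtrees(tree, seen)

    # Sub-branches that are duplicates of one already present elsewhere.
    redundant = sum(count - 1 for count in seen.values() if count > 1)
    print("redundant sub-branches:", redundant)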

In the idea genesis tree model, each level is an ontology. Design is an exercise of expressing an idea in an ontology whose conceptual distance from the idea is very small and then mapping the idea thru one or more intermediate ontologies to arrive at a representation which is concrete (implemented).

Design typically proceeds as a combination of two major activities: requirement specification and construction. Requirement specification is essentially the act of bounding the solution space of the design. Construction is the composition of elements in some ontology to model a portion of the solution.

It’s interesting to view these two activities in light of the model of idea genesis. We can see more clearly how they interrelate and how we might create automated tools to assist with the process.

For starters, it is seldom the case that design and requirements are specified in the same ontology, so determining whether a given design satisfies the requirements is more art than science. If requirements were given in the same ontology as the solution, or could be transformed to be so, we could perform a rigorous comparison.

Historically, the problem with systems that attempt to provide ontologies for formal specification and validation of design against specification is that any ontology sufficiently orthogonal and comprehensive to specify an arbitrary design is highly entropic. It lies very far down on the idea genesis tree and the activity of specification becomes as laborious and fraught with risk as solution construction itself; essentially doubling the time required to complete a given project.

If we could establish transformational relationships between a number of ontologies, we could specify requirements in the ontologies which provide the highest leverage and then normalize them all thru factoring and transformation into a single ontology (or domain-partitioned set of ontologies) shared in common with the solution for actual comparative validation. This would give a much higher-leverage approach.
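
A skeletal sketch of the idea, with all ontology terms and mappings invented: requirements stated in two different ontologies are transformed into a shared ontology, and validation against the design becomes a straightforward set comparison.

    # Requirements stated in two hypothetical ontologies are normalized into a
    # shared ontology and compared with the design. All terms are invented.
    ui_to_common = {"screen": "view", "button": "action"}
    workflow_to_common = {"step": "action", "form": "view"}

    def normalize(terms, mapping):
        """Transform a set of requirement terms into the common ontology."""
        return {mapping.get(term, term) for term in terms}

    requirements = (normalize({"screen", "button"}, ui_to_common)
                    | normalize({"step", "form"}, workflow_to_common))

    design = {"view", "action", "report"}

    # With both sides in one ontology, validation is a rigorous set comparison.
    print("unsatisfied requirements:", requirements - design)
    print("unrequired design elements:", design - requirements)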

In order to be able to do this, we need to be able to talk about some properties of ontologies. Concreteness (how far from implementation) we’ve already talked about, but we also need expressiveness. Expressiveness could be broken down into several properties that are easier to measure directly: conceptual distance, axes of freedom, sparseness, portability, and repurposeability.
