Database or Ontology?
Posted: 2014-09-29 15:06
I read somewhere that one of the hallmarks of creative people is their continual desire to make lists of everything. Sigmund Freud would say something rather less complimentary, I suspect, but lists are very useful, especially when you go shopping, whether it's for chips or for new ideas. But like all tools, lists can be as limiting as they are enlightening. The interface between biology and technology is ripe for lists, because few people know a lot about both of these topics. But there are lists… and lists…
 
In this short article I compare the database (the most commonly found type of list) with the ontology (a tool for describing the relationships between things). Ontologies are widely used in biology and medicine. The largest one is the Gene Ontology, the product of an international research community, which has had an enormous influence on bioinformatics and molecular genetics. This has led to the Human Phenotype Ontology, which is used in medical diagnostics and in the computational analysis of phenotypes. Ultimately an ontology can support sophisticated numerical analysis using Bayesian statistics and semantic similarity. I am using the ontological approach to develop a computational reasoning tool for biomimetics, which could be developed into a diagnostic tool and, possibly, an AI kernel for biomimetic robotics… eventually.
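
To give a flavour of what "semantic similarity" means in practice, here is a minimal Python sketch of one of the simplest graph-based measures: the Jaccard index of two terms' ancestor sets in an is-a hierarchy. The terms, the hierarchy and the names `IS_A`, `ancestors` and `jaccard_similarity` are hypothetical, invented for illustration; the measures used with the Gene Ontology in practice (for example those based on information content) are more refined.

```python
# Hypothetical toy is-a hierarchy: each term maps to its direct parents.
IS_A: dict[str, list[str]] = {
    "structure": [],
    "surface": ["structure"],
    "adhesive surface": ["surface"],
    "gecko foot pad": ["adhesive surface"],
    "insect foot pad": ["adhesive surface"],
    "self-cleaning surface": ["surface"],
    "lotus leaf": ["self-cleaning surface"],
}

def ancestors(term: str) -> set[str]:
    """Return the term plus every term reachable by following is-a links upward."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in IS_A[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def jaccard_similarity(a: str, b: str) -> float:
    """Shared ancestors divided by all ancestors of either term (0.0 to 1.0)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)

print(jaccard_similarity("gecko foot pad", "insect foot pad"))  # 0.6
print(jaccard_similarity("gecko foot pad", "lotus leaf"))       # 0.3333333333333333
```

Two foot pads that share the adhesive-surface branch score 0.6, while a foot pad and a leaf that meet only at "surface" score about 0.33.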
 
In essence, creativity relies largely on finding new relationships between bits of knowledge that you already have. So what you want is not just a list, but a list that makes connections between the items on it. A database can do this to a limited extent, recording relationships hierarchically by arranging the data into branching tree structures. But a database does not deal well with partial data, because it relies on closed-world logic: if something is not stated to exist, it is assumed not to exist, which in turn means that the database assumes the data available is complete. It also means that a database will not suggest relationships that might exist but for which it does not have complete information. By contrast, an ontology uses open-world logic, which makes no assumptions about data that has not been presented, and this of course makes it easier to add information. Probably more important, an ontology is a tool for arranging and interrelating ideas and meanings in a network (rather than a tree) using formal semantics. These relationships can be applied to embedded data, so that the instances which might appear in a database can be interrogated, arranged and rearranged in a variety of ways. This plasticity and openness to change are difficult to implement in a database, which requires its purpose and scope to be known and defined before it is populated with information. Thus a database always has to be started from scratch, and its structure is rarely reusable. An ontology is inherently reusable: not only can the relationships between the ideas be refined easily, but the data can be stripped out and replaced.
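
The difference between the two kinds of logic can be sketched in a few lines of Python. This is a toy model, not how any particular database engine or OWL reasoner is implemented, and the facts and the names `closed_world_query` and `open_world_query` are hypothetical. The closed-world query treats anything unrecorded as false; the open-world query returns false only for facts explicitly recorded as false, and otherwise answers "unknown".

```python
from enum import Enum

Fact = tuple[str, str, str]  # (subject, property, value)

class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"

# Hypothetical facts that have been recorded as true.
ASSERTED: set[Fact] = {
    ("lotus leaf", "has_property", "self-cleaning"),
    ("gecko foot pad", "has_property", "dry adhesion"),
}
# Hypothetical facts that have been explicitly recorded as false.
DENIED: set[Fact] = {
    ("lotus leaf", "has_property", "dry adhesion"),
}

def closed_world_query(fact: Fact) -> Answer:
    """Database-style: whatever is not stored is taken to be false."""
    return Answer.TRUE if fact in ASSERTED else Answer.FALSE

def open_world_query(fact: Fact) -> Answer:
    """Ontology-style: a missing fact is merely unknown unless it is explicitly denied."""
    if fact in ASSERTED:
        return Answer.TRUE
    if fact in DENIED:
        return Answer.FALSE
    return Answer.UNKNOWN

query = ("gecko foot pad", "has_property", "self-cleaning")
print(closed_world_query(query).value)  # false
print(open_world_query(query).value)    # unknown
```

The closed-world answer silently rules the relationship out; the open-world answer leaves it open, which is what allows new information to be added later without contradicting anything.
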
This reusability makes it easy to share ontological structures, many of which are available on the internet (the Gene Ontology is the prime example). Sharing leads to widespread standardisation of expression and to the integration of ideas and data at a community level, and it makes joining such a community attractive, since much of the basic effort of providing a framework for integrating data will already have been done. The most useful aspect of an ontology is that a theorem-proving reasoner can derive new information from the implicit relationships embedded in the ontology during its construction. This is possible because an ontology has more semantic power and freedom of expression than a database. Freedom can be dangerous, so the integrity of the ontology has to be checked, but tools are available to do this. Ultimately, it is easy to build and use an ontology with relatively little experience in computing.
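
As a minimal sketch of the kind of inference a reasoner performs, the Python below derives implicit superclasses by following subclass links transitively, and flags classes that inherit from two classes declared disjoint (such classes can have no members). The class names and axioms are hypothetical, and real OWL reasoners such as HermiT, Pellet or FaCT++ handle a far richer logic; this only illustrates the idea.

```python
from itertools import combinations

# Hypothetical asserted axioms: class -> its directly asserted superclasses.
SUBCLASS_OF: dict[str, set[str]] = {
    "GeckoSeta": {"Hair"},
    "Hair": {"FibrillarStructure"},
    "FibrillarStructure": {"BiologicalStructure"},
    "BiologicalStructure": set(),
    "EngineeredStructure": set(),
    "SyntheticSeta": {"FibrillarStructure", "EngineeredStructure"},
}
# Hypothetical disjointness axiom: nothing can be both of these.
DISJOINT = {frozenset({"BiologicalStructure", "EngineeredStructure"})}

def inferred_superclasses(cls: str) -> set[str]:
    """All superclasses entailed by the transitivity of subClassOf (excluding cls)."""
    result: set[str] = set()
    frontier = list(SUBCLASS_OF[cls])
    while frontier:
        parent = frontier.pop()
        if parent not in result:
            result.add(parent)
            frontier.extend(SUBCLASS_OF[parent])
    return result

def unsatisfiable_classes() -> list[str]:
    """Classes whose entailed superclasses include a disjoint pair."""
    bad = []
    for cls in SUBCLASS_OF:
        supers = inferred_superclasses(cls) | {cls}
        if any(frozenset(pair) in DISJOINT for pair in combinations(supers, 2)):
            bad.append(cls)
    return bad

print(sorted(inferred_superclasses("GeckoSeta")))  # ['BiologicalStructure', 'FibrillarStructure', 'Hair']
print(unsatisfiable_classes())                     # ['SyntheticSeta']
```

Nobody stated that a gecko seta is a biological structure; that follows from the chain of subclass links. The same chain exposes a modelling slip: placing `FibrillarStructure` under `BiologicalStructure` makes `SyntheticSeta` unsatisfiable, which is exactly the kind of integrity problem that has to be caught.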
 
It would be very difficult to generate an ontology by writing the raw code, so an editor is used. There are several editors, but probably the easiest to use is Protégé, which is Java-based and free to download from Stanford University (http://protege.stanford.edu). There are two types of Protégé: Frames (Protégé 3.x) and OWL (Protégé 4.x). I prefer Protégé 4.x, which is largely the work of the University of Manchester in the UK. Its embedded syntax always strikes me as pretty strict, but that of course means it's more difficult to make a mistake! OWL, surprisingly, stands for "Web Ontology Language"; it is a standard from the World Wide Web Consortium (W3C). An OWL ontology consists of Individuals, Properties and Classes. Individuals are grouped into Classes, and both can be related through Properties. OWL has a rich set of operators (e.g. intersection, union and negation), which makes it possible for concepts to be defined as well as described. Complex concepts can thus be built up from definitions of simpler ones. Furthermore, the logical model allows the use of a reasoner that can check the mutual consistency of the statements and recognise which concepts fit under which definitions. The reasoner can therefore help to keep the hierarchy correct, which is particularly useful when classes have more than one parent.
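
The idea of defining a class as an intersection and letting a reasoner place it in the hierarchy can be sketched for the simplest case. In the hypothetical Python below, each defined class is equivalent to the intersection of some primitive features, so class B subsumes class A exactly when B's features are a subset of A's. Union, negation and property restrictions, which OWL also supports, are deliberately left out, and all class and feature names are invented for illustration.

```python
# Hypothetical defined classes: each is EquivalentTo the intersection of its features.
DEFINITIONS: dict[str, frozenset[str]] = {
    "FlyingThing": frozenset({"Flies"}),
    "Organism": frozenset({"Living"}),
    "Machine": frozenset({"Artefact"}),
    "FlyingOrganism": frozenset({"Living", "Flies"}),
    "FlyingMachine": frozenset({"Artefact", "Flies"}),
    "Ornithopter": frozenset({"Artefact", "Flies", "FlapsWings"}),
}

def subsumes(general: str, specific: str) -> bool:
    """For pure intersections, every conjunct of the general class must appear in the specific one."""
    return DEFINITIONS[general] <= DEFINITIONS[specific]

def direct_parents(cls: str) -> set[str]:
    """The most specific classes that subsume cls, i.e. the parents shown in the inferred hierarchy."""
    supers = {c for c in DEFINITIONS if c != cls and subsumes(c, cls)}
    # Drop any subsumer that is more general than another subsumer.
    return {s for s in supers if not any(t != s and subsumes(s, t) for t in supers)}

print(sorted(direct_parents("FlyingMachine")))  # ['FlyingThing', 'Machine']
print(sorted(direct_parents("Ornithopter")))    # ['FlyingMachine']
```

No parents were asserted here: `FlyingMachine` ends up under both `FlyingThing` and `Machine` purely from its definition, which is the multiple-parent situation where a reasoner earns its keep.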
 
But why do I want to use an ontology when nearly everyone else in biomimetics is using a database? The answer is crushingly simple: I tried using a database for what I'm trying to do, and it didn't work! Most databases seem to produce a list of biological effects that might be, or can be, used in an engineering environment. Instead, I want to produce a system that can compare biology and engineering at a far more general level and give the kind of answer to a problem in design or engineering that you would get from looking through a biology textbook. This has to be done at a descriptive level, so I need a system that describes engineering, to which I can add biology at the same level of definition. That achieves some sort of parity and equivalence, and ultimately enables an engineering concept to be replaced directly with one from biology wherever that is appropriate. So I don't just need a method for innovation; I need integration as well. My ontology is available for download at https://wiki.bath.ac.uk/display/OOB/, which also explains how the ontology is arranged and how it can be used. Unfortunately it doesn't have a friendly opening screen, so you are dropped straight into it. But I prefer things that way (at least at this stage of the exercise), since I am more interested in developing something that works than in making something pretty. Pretty can come later!
 
My ontology is based on TRIZ, the Russian system for solving problems inventively. You have already met TRIZ in these pages through Nikolay Bogatyrev. It was when he and his wife worked with me that we developed ideas for integrating TRIZ and biology, and invented BioTRIZ. My ontology has been developed in the years since I retired, and so is an offshoot of that period of development. There are several ontologies that use TRIZ, and several studies that use an ontological approach to formalise aspects of TRIZ that are still a bit hazy. I have stolen part of TRIZ, the "Contradiction Matrix" and its associated Inventive Principles, to represent a codification of engineering practice. Of course there is no guarantee that the engineering we do is best practice, so bringing biology into the mix also acts as a test of the efficacy of engineering. It isn't always the best (as we know)! The underlying concept is simple and well known. Just as TRIZ was initiated and developed by examining a large number of technical patents, so biology provides many "patents of nature". All the information is taken from published research papers, which have been examined by experts in their own field before publication and so can be said to have some authority. Even so, very few papers in biology state a problem and go on to say how biology solved it. Many of them report a new physiological or behavioural phenomenon and give no indication of the advantage that phenomenon might give the organism. So I have adopted the technique of searching for key words, the basic one being "optimisation". This commonly yields a study in which two or more variables are somewhat at odds with each other. That fits the classical definition of a problem, first mooted by Heraclitus in Ancient Greece and more recently by the philosopher Hegel. It was the teaching of Hegelian philosophy in Russian schools that led to the methods developed in TRIZ.
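
Screening for the key word is easy to automate. A minimal sketch, assuming the abstracts are already available as plain text keyed by an identifier; the abstracts and the names `OPTIMISATION` and `candidate_papers` are hypothetical. The regular expression catches both British and American spellings (optimise, optimization, optimised, and so on) but not "optimal".

```python
import re

# Matches optimise/optimize, optimised/optimized, optimisation/optimization, etc.
OPTIMISATION = re.compile(r"\boptimi[sz]\w*", re.IGNORECASE)

def candidate_papers(abstracts: dict[str, str]) -> list[str]:
    """Return the IDs of abstracts that mention optimisation in any spelling."""
    return [pid for pid, text in abstracts.items() if OPTIMISATION.search(text)]

# Hypothetical abstracts, invented for illustration only.
abstracts = {
    "paper-1": "Wing shape is optimised to trade lift against drag.",
    "paper-2": "We describe a new courtship display in a species of fly.",
    "paper-3": "Bone mass is Optimized for stiffness at minimum weight.",
}
print(candidate_papers(abstracts))  # ['paper-1', 'paper-3']
```

A hit only marks a paper as worth reading; identifying the two variables in conflict still needs a human.
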
So, following Heraclitus, Hegel and TRIZ, I am going down a well-worn path, albeit with new partners. Once I have identified the two opposing characteristics that require optimisation, I can see what biology offered as the resolution. I have codified some biology in the same way as TRIZ, and so have my comparison with engineering. Added to this mix is a large amount of biology, physiology, taxonomy and morphology. Ultimately I shall include more on the range of biological effects that bring about the changes I am documenting.
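
A contradiction-matrix comparison of this kind might be represented as below. This is a hypothetical Python sketch, not a description of how the ontology itself is built: the two parameter names and the five principle names and numbers are standard TRIZ, but the cell contents are placeholders invented for illustration, not entries from Altshuller's matrix or from BioTRIZ.

```python
# A few standard TRIZ Inventive Principles, by their usual numbers.
PRINCIPLES = {
    1: "Segmentation",
    3: "Local quality",
    31: "Porous materials",
    35: "Parameter changes",
    40: "Composite materials",
}

Contradiction = tuple[str, str]  # (improving parameter, worsening parameter)

# PLACEHOLDER cells: invented for illustration, not taken from the classical
# Contradiction Matrix or from any BioTRIZ data.
ENGINEERING: dict[Contradiction, set[int]] = {
    ("weight of moving object", "strength"): {1, 35, 40},
}
BIOLOGY: dict[Contradiction, set[int]] = {
    ("weight of moving object", "strength"): {3, 31, 40},
}

def compare(contradiction: Contradiction) -> dict[str, set[int]]:
    """Split the principles recorded for one contradiction by where they are used."""
    eng = ENGINEERING.get(contradiction, set())
    bio = BIOLOGY.get(contradiction, set())
    return {
        "shared": eng & bio,
        "biology only": bio - eng,
        "engineering only": eng - bio,
    }

for label, numbers in compare(("weight of moving object", "strength")).items():
    print(label, [PRINCIPLES[n] for n in sorted(numbers)])
```

Keying both tables on the same (improving, worsening) pair is what puts biology and engineering on the same footing; the "biology only" set is where candidate transfers from biology would show up.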
 
Perhaps you can help me? Go to my wiki page and begin!
 
• Consider using an ontology when the schema is large and/or complex and when it's not possible or reasonable to assume complete information.
 
• Consider using a database when the schema is small and/or simple and complete information is available.
Address: C508 Dingxin Building, Jilin University, 2699 Qianjin Street, Changchun 130012, P. R. China
Copyright © 2024 International Society of Bionic Engineering All Rights Reserved