
There is an attractive presentation about representing knowledge from various roles, such as practitioners, software people, and scientists: How to tell stuff to the computer. It describes a triangle (see below) whose corners are practical domain knowledge (lower left corner), software artifacts (top corner) and science (lower right corner). The picture proposes some technologies inside the triangle. The most important things in the triangle are, in the writer's opinion, the steps along the lines connecting the corners.

Triangle of Knowledge Representation (KR, see http://www.lisperati.com).

In his/her conclusion the writer forecasts a coming revolution caused by description logics (see http://www.lisperati.com/tellstuff/conclusion.html). I warmly agree with that conclusion, because logic has a very strong role in the framework of symbolic analysis.

On the other hand, it is difficult to see where the beef of the description logic is here: http://www.lisperati.com/tellstuff/dl.html; the text contains traditional monolithic Lisp. However, the idea of the title, Marriage of Logic and Objects, is a very good vision. I have had the same goal in the architecture of AHO hybrid objects. Furthermore, there is a solid contact surface between the semantic web and symbolic analysis (see more).

A symbolic and holistic approach for estimating the knowledge produced by software

The triangle (above) is useful as a basis for illustrating software development and its knowledge representation, too. In the lower triangle (see below) I have named the corners, respectively: domain knowledge, program source, and the information system (IS) pragmatics caused by the software.

Software Knowledge Representation (SKR). (Laitila 2010)

The last corner is not science, as in the triangle above; instead, it stands for all efforts to understand the software and its value as an empirical product. The last corner is thus an attempt to obtain empirical and practical research information from the implemented software. It is therefore a broad approach. It has two sides:

  1. a problem-specific approach supported by reverse engineering, and
  2. a holistic approach in order to evaluate the whole.

There are some essential roles in the figure. All essential information is assumed to be stored in an imagined megamodel (specification, resource information, sprints, tests, etc.).

The three lines are:

  1. The left line describes software development, ending in code.
  2. The line from the top to the lower right corner is symbolic analysis, containing the technology spaces GrammarWare, ModelWare, SimulationWare and KnowledgeWare. For practical purposes there is a problem reasoning technology (PRT) close to the right corner.
  3. The bottom line is problematic, because there is no direct support for estimating how well a system satisfies all possible user needs. There are, however, some technologies for creating end-user services so that they can be mapped into code and remain visible in the system; SOA, aspects, the Zachman architecture and metrics are some means for that purpose.

Some links:


There is a nice blog post by Robert MacIntosh intended for PhD students at: http://doctoralstudy.blogspot.com/2009/05/being-clear-about-methodology-ontology.html

He describes the light at the end of the research tunnel. There are some steps in the tunnel, forming a scientific framework for researchers to follow:

  • Ontology … to do with our assumptions about how the world is made up and the nature of things
  • Epistemology … to do with our beliefs about how one might discover knowledge about the world
  • Methodology … to do with the tools and techniques of research

The author claims that ontology, epistemology and methodology are the three pillars of a thesis.


An extended framework with applications to symbolic analysis

We define symbolic analysis as a framework (the light in the tunnel) with 10 levels, as follows:

  1. Ontology is a set of symbols, as well as concepts made by the user. Note: concepts are higher-level, non-grounded symbols.
  2. Epistemology is a set of transformation rules for the symbols, used to obtain knowledge. The rules describe the semantics of each symbol in the ontology. (A small sketch of levels 1 and 2 follows this list.)
  3. Paradigm is here symbolic analysis: how to describe the ontology and the epistemology and the related theories and methods. Its “competitors” are static and dynamic analysis.
  4. Methodology is a set of theories describing how the ontology is transformed, using the epistemology, into information capable of expressing knowledge. There are theories for parsing, for building a symbolic model, for simulating the model, etc.
  5. Method is any way to use the methodology in practice. Some methods are control flow analysis, building a call tree, etc.
  6. Tool is a specific means to apply a method in practice. A tool can be any tool that applies (here) symbolic execution or symbolic analysis, for example for simulating code.
  7. Activity is a human interaction intended for understanding code. Some activities are finding a bug, browsing code in order to understand some principle, etc.
  8. Action is a piece of an activity, for example browsing items, selecting a view, or making a hypothesis.
  9. Sub-action is a part of an action. The lowest sub-actions are primitives like reading an item, making a decision, etc.
  10. The lowest level is the practical data for the method, tool, activity, action and sub-action. In symbolic analysis the practical data can be non-symbolic or symbolic. Non-symbolic data in a program can have any type of the type system of the original source code. Symbolic data can have any type in the ontology; it is therefore much richer than the non-symbolic notation.
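
A minimal sketch of levels 1 and 2, written in Python with invented names and a toy code fragment; it only illustrates the idea and is not the data model of any actual tool:

```python
# Level 1: ontology -- a set of symbols captured from code, plus user-made
# concepts. Concepts are higher-level, non-grounded symbols.
ontology = {
    "x":       {"kind": "variable"},
    "x = 1":   {"kind": "assignment"},
    "x + 2":   {"kind": "expression"},
    "Counter": {"kind": "concept"},     # higher-level, non-grounded symbol
}

# Level 2: epistemology -- transformation rules giving each symbol its semantics.
def interpret(symbol, state):
    """Transform a symbol into knowledge about the program state."""
    kind = ontology[symbol]["kind"]
    if kind == "assignment":
        name, _, value = symbol.partition("=")
        state[name.strip()] = int(value)
    elif kind == "expression":
        name, _, addend = symbol.partition("+")
        base = state.get(name.strip())
        if base is None:
            return symbol               # the value stays symbolic when x is unknown
        return base + int(addend)
    return state

state = {}
interpret("x = 1", state)
print(state)                            # {'x': 1}
print(interpret("x + 2", state))        # 3
```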

Using levels 1-10, a complete conceptual framework can be written for any programming language and any operating system. There are, however, some limitations on how different kinds of source-code features can be reverse engineered. In order to alleviate these problems, symbolic analysis has a rather expressive format: each relation is expressed as a Prolog predicate, which can implicitly point to its neighbour symbols even though there is no definition for their semantics.
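
The actual format is Prolog; the following rough Python analogue (with made-up predicate and symbol names) shows only the idea that a relation can point to a neighbour symbol even before that symbol's semantics have been defined:

```python
# Relation facts in the style of predicate(source, target); names are invented.
relations = [
    ("calls",    "main",  "parse"),      # main calls parse
    ("contains", "parse", "loop_1"),     # parse contains a loop
    ("calls",    "parse", "nextToken"),  # nextToken is referenced but undefined
]

# Semantics are known only for some symbols.
semantics = {"main": "entry point", "parse": "builds the symbolic model"}

def neighbours(symbol):
    """Follow relations outward from a symbol, whether or not the
    neighbour's semantics are known yet."""
    for predicate, source, target in relations:
        if source == symbol:
            yield predicate, target, semantics.get(target, "semantics not defined")

for fact in neighbours("parse"):
    print(fact)
# ('contains', 'loop_1', 'semantics not defined')
# ('calls', 'nextToken', 'semantics not defined')
```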

Levels 7-9 tie the framework to action theory, which belongs to empirical research.

Some links

Mathematics is the study of quantity, structure, space, and change. Mathematicians seek out patterns, formulate new conjectures, and establish truth by rigorous deduction from appropriately chosen axioms and definitions.

The evolution of mathematics might be seen as an ever-increasing series of abstractions, or alternatively an expansion of subject matter.

Of these areas, discrete mathematics is the closest to computer languages and source code analysis.

Discrete mathematics

Discrete mathematics is the common name for the fields of mathematics most generally useful in theoretical computer science. This includes, on the computer science side, computability theory, computational complexity theory, and information theory. Computability theory examines the limitations of various theoretical models of the computer, including the most powerful known model – the Turing machine. Complexity theory is the study of tractability by computer; some problems, although theoretically solvable by computer, are so expensive in terms of time or space that solving them is likely to remain practically unfeasible, even with rapid advance of computer hardware. Finally, information theory is concerned with the amount of data that can be stored on a given medium, and hence deals with concepts such as compression and entropy.
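
To make the information-theory corner concrete, here is a small generic example (not from the original text): the Shannon entropy of a message bounds how far it can be compressed, in bits per symbol.

```python
from collections import Counter
from math import log2

def entropy(text):
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * log2(c / n) for c in counts.values())

print(entropy("aaaa"))   # 0.0 -- fully predictable, compresses to almost nothing
print(entropy("abab"))   # 1.0 -- one bit per character suffices
print(entropy("abcd"))   # 2.0 -- two bits per character are needed
```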

Typical kinds of discrete mathematics are shown:

Relations between Discrete Mathematics and Symbolic Analysis

In fact, symbolic analysis is one part of mathematics (in Finnish, see Ruohonen). However, that kind of mathematical symbolic analysis refers to analysis principles typical of mathematics, such as symbolic differentiation and symbolic integration. Such features can be found in modern mathematical tools like Mathematica 7.

From the software point of view, Symbolic Analysis (Laitila) provides an atomistic model for describing source code and its behavior. That model obeys the pure rules of graph theory, so all tools intended for graphs are useful. Furthermore, the theory for simulating these graph elements (see AHO objects, i.e. symbols) is pure theory of computation. Simulating branches with unknown conditional values leads to a problem of combinations, a state explosion (combinatorial explosion) problem. However, there are no connections from the atomistic symbolic model to cryptography.
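
A toy illustration of the combinatorial explosion (my own sketch, not from the book): if a branch condition cannot be evaluated, the simulator must follow both branches, so n unknown conditions in sequence yield 2^n possible paths.

```python
def simulate(conditions, path=()):
    """Enumerate all execution paths when every condition is unknown."""
    if not conditions:
        return [path]
    first, rest = conditions[0], conditions[1:]
    return (simulate(rest, path + ((first, True),)) +
            simulate(rest, path + ((first, False),)))

paths = simulate(["c1", "c2", "c3"])   # three unknown if-conditions
print(len(paths))                      # 8 paths, i.e. 2**3
```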

In summary, the framework of the atomistic model is rather close to the theory of discrete mathematics.

In conclusion, it is reasonable to ask whether there is a gap between a mathematical formulation such as algebra and the formalism adopted from source code (programming languages) to be expressed in the symbolic atomistic model.

There is no Gap

Mathematics is a set of theories, each based on its type system, in which selected symbols are connected with carriers (symbolic clauses) and constants by operations (see the figure below).

In programming languages the operations are expressions in the grammar (see the Java grammar). From the automata-theory side, each operation is executed by a register machine or a similar automaton, which is a subset of the universal machine (see more).
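
As a sketch of this claim, here is a toy register machine (instruction set and names invented for this example) that executes the grammar-level expression x = 2 * (3 + 4) as a sequence of small automaton steps:

```python
def run(program, registers):
    """Execute a list of register-machine instructions."""
    for op, dst, a, b in program:
        if op == "LOAD":                              # dst <- constant a
            registers[dst] = a
        elif op == "ADD":                             # dst <- a + b
            registers[dst] = registers[a] + registers[b]
        elif op == "MUL":                             # dst <- a * b
            registers[dst] = registers[a] * registers[b]
    return registers

# x = 2 * (3 + 4) compiled into register-machine steps:
program = [
    ("LOAD", "r1", 3, None),
    ("LOAD", "r2", 4, None),
    ("ADD",  "r3", "r1", "r2"),
    ("LOAD", "r4", 2, None),
    ("MUL",  "x",  "r4", "r3"),
]
print(run(program, {})["x"])   # 14
```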

Modern computer languages contain the features needed to program any mathematical function (except for some demanding, very specific vector and array operations).

They are Turing complete (see more). The Church–Turing thesis states that any function that is computable by an algorithm is a computable function.

Programming languages are an extension of traditional mathematics, adding definitions, loops, conditionals (i.e. paths) and some other types of clauses. The formalism of Pascal was shown decades ago to be downwards compatible with mathematics and amenable to deduction.

In the book Symbolic Analysis we show that Java can partially be simulated by SAM. It can create an output tape (in the Turing-machine sense) for any algorithm, even though some symbols are unknown. Therefore the execution of the Symbolic Abstract Machine, SAM, can be mapped directly from the mathematical side (e.g. equations) to the symbols on the software side. There is no gap between these two sides, because all the semantics can be expressed in the Symbolic language and all symbols are executed by the Symbol:run invocation.
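
The following sketch is a simplified stand-in for SAM (not its real interface): known values are computed, unknown ones are carried along as symbolic expressions, and the run still produces an output tape.

```python
def run_symbolically(statements, env):
    """Execute (target, left, op, right) statements; unknown operands stay symbolic."""
    tape = []
    for target, left, op, right in statements:
        lv, rv = env.get(left, left), env.get(right, right)
        if isinstance(lv, int) and isinstance(rv, int) and op == "+":
            env[target] = lv + rv                 # fully known: compute the value
        else:
            env[target] = f"({lv} {op} {rv})"     # partly unknown: keep a symbol
        tape.append((target, env[target]))
    return tape

# y depends on an unknown input 'n'; z depends only on known constants.
stmts = [("z", 1, "+", 2), ("y", "n", "+", 1), ("w", "y", "+", "z")]
print(run_symbolically(stmts, {}))
# [('z', 3), ('y', '(n + 1)'), ('w', '((n + 1) + 3)')]
```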

Some links:

I present here four (4) arguments for symbolic processing as the core of core computer science:

  1. Symbols in GrammarWare: From the history of computing and computers we can remember the main effort needed from Alan Turing in order to create a well-established foundation for computers. His breakthrough article was On computable numbers. However, in his paper the commands controlling the tapes of the universal machine were symbols, not mere numbers. From the theory of grammars we can find that grammars are organized by production rules, which define terms, which contain specific symbols: either terminals or non-terminals. Some of them are only syntactic, but most have a specific semantics behind them. (A tiny grammar sketch follows this list.)
  2. Symbols in ModelWare: ModelWare is a popular research area. Models are created from nodes and edges. Both of them are symbols. Edges are like predicates, combining symbols with each other.
  3. Symbols in SimulationWare: From automata theory we can derive the logic for state machines, often described using state diagrams. Each term captured from the code and modeled as nodes and edges is then a symbol, even though as an automaton it looks like a state transition table, or a grid.
  4. Symbols in KnowledgeWare: Cognitive science is based on cognition, human thought using symbols and interpretations. Even though there is some criticism about using the symbolic paradigm for our everyday life, it is clear that understanding program knowledge can be formulated using symbols captured from code, if we can understand and evaluate models created from the code and can evaluate the meaning of the symbols either in our minds or in a tool running on a computer.
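
The tiny grammar sketch promised in argument 1, with an invented grammar fragment (not the actual Java grammar): production rules define terms out of terminal and non-terminal symbols.

```python
# Right-hand sides are sequences of symbols; '+' is purely syntactic,
# while Number, Identifier, Term and Expr carry semantics.
grammar = {
    "Expr": [["Term", "+", "Term"], ["Term"]],
    "Term": [["Number"], ["Identifier"]],
}
terminals = {"+", "Number", "Identifier"}

for lhs, alternatives in grammar.items():
    for rhs in alternatives:
        kinds = ["terminal" if s in terminals else "non-terminal" for s in rhs]
        print(lhs, "->", rhs, kinds)
```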

These four technology spaces are the core of core computer science, because

  • By using GrammarWare we can parse code and compile programs into executable systems.
  • By using ModelWare we can abstract concrete implementations, either code or specific user requirements.
  • By using SimulationWare we can execute code or simulate programs. We have these two approaches in the same space.
  • By using KnowledgeWare we can create concepts and contexts for our programs in advance before ModelWare and GrammarWare.  Furthermore, by using KnowledgeWare we can learn from programs, using SimulationWare to execute or simulate them. It is learning by doing: we can fix bugs etc.

A tentative picture illustrating a symbolic paradigm, where the concept of symbol is the center of computer science:

Figure: the symbol as the center of computer science.

By extending the picture in four directions, anyone can extend his or her comprehension of our discipline rather systematically.

In order to create an ontology for any computer science application, the symbol must be regarded as the main element.

Some links:

From Wiki: A programming language is an artificial language designed to express computations that can be performed by a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine, to express algorithms precisely, or as a mode of human communication.

A programming language is a notation for writing programs, which are specifications of a computation or algorithm.[1] Some, but not all, authors restrict the term “programming language” to those languages that can express all possible algorithms.

All Turing complete languages can implement the same set of algorithms.

Some links:

  1. Language
  2. Programming paradigms (Wiki)

Definition:

A formal language L over an alphabet Σ is just a subset of Σ*, that is, a set of words over that alphabet.
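
As a concrete (and invented) instance of the definition: take the alphabet sigma = {a, b} and let L be the subset of sigma* consisting of the words in which no two adjacent letters are equal.

```python
from itertools import product

sigma = ["a", "b"]

def words_up_to(n):
    """Enumerate sigma* up to length n."""
    for length in range(n + 1):
        for letters in product(sigma, repeat=length):
            yield "".join(letters)

def in_language(word):
    """Membership predicate of this particular language L."""
    return all(x != y for x, y in zip(word, word[1:]))

L = [w for w in words_up_to(3) if in_language(w)]
print(L)   # ['', 'a', 'b', 'ab', 'ba', 'aba', 'bab']
```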

Discussion

Formal languages are entirely syntactic in nature but may be given semantics that give meaning to the elements of the language. Formal languages are useful in logic because they have formulas that can be interpreted as expressing logical truths. An interpretation of a formal language is the assignment of meanings to its symbols and formulas.

Model theory is the theory of interpretations of formal languages. The study of interpretations is called formal semantics. Giving an interpretation is synonymous with constructing a model. A model of a formula of a formal language is an interpretation of the language for which the formula comes out to be true.
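
A small illustration of these notions (my own toy example): the formula "p and not q" is represented as a Python function, an interpretation is an assignment of truth values to its symbols, and a model is an interpretation under which the formula comes out true.

```python
def formula(interpretation):
    """The formula: p and (not q)."""
    return interpretation["p"] and not interpretation["q"]

candidates = [
    {"p": True,  "q": False},   # a model: the formula comes out true
    {"p": True,  "q": True},    # not a model
    {"p": False, "q": False},   # not a model
]
for interpretation in candidates:
    print(interpretation, "is a model:", formula(interpretation))
```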

Some links:

  1. Formal semantics
  2. Interpretation (logic)
  3. Model theory

From Wiki:

A language is a system for encoding and decoding information. In its most common use, the term refers to so-called “natural languages” — the forms of communication considered peculiar to humankind.

Mathematics and computer science use artificial entities called formal languages (including programming languages and markup languages, and some that are more theoretical in nature). These often take the form of character strings, produced by a combination of formal grammar and semantics of arbitrary complexity.

A programming language is a formal language endowed with semantics that can be used to control the behavior of a machine, particularly a computer, to perform specific tasks. Programming languages are defined using syntactic and semantic rules, to determine structure and meaning respectively.

Programming languages are used to facilitate communication about the task of organizing and manipulating information, and to express algorithms precisely. Some authors restrict the term “programming language” to those languages that can express all possible algorithms; sometimes the term “computer language” is used for artificial languages that are more limited.

Links:

  1. Formal languages.
  2. Programming languages.

Computer software, or just software, is a general term used to describe the role that computer programs, procedures and documentation play in a computer system.

Erkki Laitila, PhD (2008) computer engineer (1977)