You are currently browsing the category archive for the ‘Turing Machine and Related Automata’ category.

A computer is a machine that manipulates data according to a set of instructions.

Hardware is a general term for the physical artifacts of a technology. It may also mean the physical components of a computer system, in the form of computer hardware.

Components directly attached to the motherboard include:

  • The central processing unit (CPU) performs most of the calculations which enable a computer to function, and is sometimes referred to as the “brain” of the computer. It is usually cooled by a heat sink and fan.
  • The chip set mediates communication between the CPU and the other components of the system, including main memory.
  • RAM Stores all running processes (applications) and the current running OS. RAM Stands for Random Access Memory
  • The BIOS includes boot firmware and power management. The Basic Input Output System tasks are handled by operating system drivers.

The Central Processing Unit (CPU) or the processor is the portion of a computer system that carries out the instructions of a computer program, and is the primary element carrying out the computer’s functions. This term has been in use in the computer industry at least since the early 1960s. The form, design and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation remains much the same.

See http://www.flickr.com/photos/biwook/153062645/

One typical CPU

Inspite of some different architectures the main principle of executing commands is similar for typical commands independent of the used high-level language. There is a formalism of the Turing machine behind those modern architectures, although they have some excellent features that make execution rather fast.

In CPU the high level commands are split into atomistic machine level instructions.

For example adding the registers 1 and 2 and placing the result in register 6 is encoded:

[  op  |  rs |  rt |  rd |shamt| funct]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary

Because typical high-level instructions can be transformed into atomic formulas, it is possible to mimic different parts of any high level command step-by-step. There are only some exception to that rule (parallelism etc).

Therefore it is possible for a tool to split high level commands into symbolic trees, which atomic parts of it as its leafs. This observation leads to a theory of source code simulation. Although partial execution of any high-level command from any part of those programs seldom can be complete, it is possible by partial evaluation to retrieve useful information from those commands to the program comprehension purpose in order to understand relations and arguments of any relevant command.

Symbolic Abstract Machine

Summary

As a conclusion in our study we were able to create a simulator for source code, which mimics CPU, but in a high abstraction level. The only components and artifacts for this work are symbols, which have been implemented as hybrid objects (see AHO). The user can create mental models by traversing information between symbols.

Mathematics is the study of quantity, structure, space, and change. Mathematicians seek out patterns, formulate new conjectures, and establish truth by rigorous deduction from appropriately chosen axioms and definitions.

The evolution of mathematics might be seen as an ever-increasing series of abstractions, or alternatively an expansion of subject matter.

From these areas discrete mathematics is closest to computer languages and source code analysis.

Discrete mathematics

Discrete mathematics is the common name for the fields of mathematics most generally useful in theoretical computer science. This includes, on the computer science side, computability theory, computational complexity theory, and information theory. Computability theory examines the limitations of various theoretical models of the computer, including the most powerful known model – the Turing machine. Complexity theory is the study of tractability by computer; some problems, although theoretically solvable by computer, are so expensive in terms of time or space that solving them is likely to remain practically unfeasible, even with rapid advance of computer hardware. Finally, information theory is concerned with the amount of data that can be stored on a given medium, and hence deals with concepts such as compression and entropy.

Typical kinds of discrete mathematics are shown:

Relations between Discrete Mathematics and Symbolic Analysis

In fact, symbolic analysis is one part of mathematics (in Finnish Ruohonen). However, that kind of mathematical symbolic analysis is one kind of analysis principle typical for mathematics like symbolic differentiation, symbolic integration etc. That kind of features can be found in modern mathematical tools like Mathematica 7.

From the software point-of-view, in Symbolic Analysis (Laitila) there is an atomistic model to describe source code and its behavior. That models obey the pure rules of graph theory, where all tools intented for graphs are useful.  Furthermore, the theory for simulating these graph elements (see AHO-objects, e.g symbols) is pure theory of computation. Simulating branches with unknown conditional values lead to a problem of combinations, a state explosion (combinatorial explosion) problem. However, there is no connections from the atomistic symbolic model to cryptography.

In summary, the framework of the atomistic model is rather close to the theory of discrete mathematics.

As a conculusion, it is then reasonable to ask whether there is a gap between mathematical formulation like albebra and formalism adapted from source code (programming languages) to be expressed in the symbolic atomistic model.

There is no Gap

Mathematics is a set of theories based on their type systems, where selected symbols are connected with carriers (symbolic clauses) and constants as operations (see figure below).

In programming languages the operations are expressions in the grammar (see Java grammar). From the side of automata theory each operation has been executed by a register machine or a similar automaton, which is a subset of the universal machine (see more).

Modern computer languages contain the features to allow programming any mathematical functions (except some demanding, very specific vector and array operations).

They are Turing strong (see more).  The Church–Turing thesis states that any function that is computable by an algorithm is a computable function.

Progamming languages are an extension fo traditional mathematics where there are definitions, loops, conditionals (e.g. paths) and some other types of clauses. Formalism for Pascal has been proved to be downwards compatible to mathematics for decades ago, and deducible.

In book Symbolic Analysis we show that Java can partially be simulated by SAM. It can create an output tape (Turing machine) for any algorithm even though some symbols are unknown. Therefore execution of the Symbolic Abstract Machine, SAM, can directly be mapped from the mathematical side e.g. equations to the software side symbols. There is no gap between these two sides, because all the semantics can be expressed in Symbolic language and all symbols will be executed by Symbol:run-invocation.

Some links:

A Propositional directed acyclic graph (PDAG) is a data structure that is used to represent a Boolean function. A Boolean function can be represented as a rooted, directed acyclic graph of the following form:

  • Leaves are labeled with \top (true), \bot (false), or a Boolean variable.
  • Non-leaves are \mathcal{4} (logical and), \mathcal{5} (logical or) and \Diamond (logical not).
  • \mathcal{4}– and \mathcal{5}-nodes have at least one child.
  • \Diamond-nodes have exactly one child.

Leaves labeled with \top (\bot) represent the constant Boolean function which always evaluates to 1 (0). A leaf labeled with a Boolean variable x is interpreted as the assignment x = 1, i.e. it represents the Boolean function which evaluates to 1 if and only if x = 1. The Boolean function represented by a \mathcal{4}-node is the one that evaluates to 1, if and only if the Boolean function of all its children evaluate to 1. Similarly, a \mathcal{5}-node represents the Boolean function that evaluates to 1, if and only if the Boolean function of at least one child evaluates to 1. Finally, a \Diamond-node represents the complemenatary Boolean function its child, i.e. the one that evaluates to 1, if and only if the Boolean function of its child evaluates to 0.

Every binary decision diagram (BDD) and every negation normal form (NNF) is also a PDAG with some particular properties. The following pictures represent the Boolean function f(x1,x2,x3) = − x1 * − x2 * − x3 + x1 * x2 + x2 * x3. See picture below for that.


The Correspondence between PDAG and AHO (Atomistic Hybrid Object)

The semantics of PDAG resembles the one of AHO the symbolic element in the atomistic model. PDAG is an automaton of type three in the Chomsky hieararchy.

Grammar Languages Automaton Production rules (constraints)
Type-0 Recursively enumerable Turing machine \alpha \rightarrow \beta (no restrictions)
Type-1 Context-sensitive Linear-bounded non-deterministic Turing machine \alpha A\beta \rightarrow \alpha\gamma\beta
Type-2 Context-free Non-deterministic pushdown automaton A \rightarrow \gamma
Type-3 Regular Finite state automaton A \rightarrow a
and
A \rightarrow aB

However, the command of AHO is more powerful, it can express any clause adapted from source code (loopClause is any loop, pathClause is any branch, setClause is any change of any variable etc) when splitted into sub-commands. There is one characteristic feature when describing mathematical calculations or non-trivial logical equations. Each AHO should be atomistic. The command of each AHO for calculations, logical operations and transformations is opClause. It refers to an operation.

The semantics of the command covers all levels of the Chomsky hieararchy. However, simulating levels 0 and 1 needs some arrangements. See more about the limitations from the book.

Some links:

Java Pathfinder is a system to verify executable Java bytecode programs. In its basic form, it is a Java Virtual Machine (JVM) that is used as an explicit state software model checker developed by NASA.

An Example about Path Finding

Below a small example for detecting a possible bug (assert) in a control flow:

int x, y;

if (x > y) {
x = x + y;
y = x – y;
x = x – y;
if (x > y)
assert false;
}
Below its logic is drawn:

Java Path Finder is able to follow on paths and to evaluate expressions using symbolic evaluation.

JPF is rather flexible

  • Symbolic execution
  • Can start at any point in the program
  • Can use mixed symbolic and concrete inputs
  • No special test driver needed – sufficient to have an executable program that uses the method/code under test

  • Any time symbolic execution
  • Use specialized listener to monitor concrete execution and trigger symbolic execution based on certain conditions
  • Unit level analysis in realistic contexts
  • Use concrete system-level execution to set-up environment for unit-level symbolic analysis
  • Applications:
  • Exercise deep system executions
  • Extend/modify existing tests: e.g. test sequence generation for Java containers

Inside JPF

JPF is an extension to  JVM. JPF Listener is able to listen code execution.

Symbolic Analysis (SAM) vs. JPF

SAM has not initially been planned for evaluating paths, because its main function is deterministic. However, its symbolic model allows traversing multiple paths at the same time. In that nondeterm mode, it should write its intermediate results to a tape tree, which is a modification from a standard trace (see Turing machine: tape).

Starting simulation via SAM is as flexible as JPF.

Other tools

MoonWalker is a model checker developed on the Mono platform. MoonWalker is a program to automatically detect errors in CIL bytecode programs, i.e. applications written for the .NET platform.

The current version of MoonWalker is able to find deadlocks and assertion violations in CIL programs, generated with Mono’s C# compiler.

Some links:

In computability theory the utm theorem (universal turing machine theorem), is a basic result about Gödel numberings of the set of computable functions. It proves the existence of a computable universal function which is capable of calculating any other computable function. The universal function is an abstract version of the universal turing machine, thus the name of the theorem.

According to the Church-Turing thesis, computable functions are exactly the functions that can be calculated using a mechanical calculation device given unlimited amounts of time and storage space. Equivalently, this thesis states that any function which has an algorithm is computable.

UTM theorem

Let \varphi be a Gödel numbering of the set of computable functions, then the partial function

u_\varphi: \mathbb{N}^2 \to \mathbb{N}

defined as

u_\varphi(i,x) := \varphi_i(x) \qquad i,x \in \mathbb{N}

is computable.

u_\varphi is called the universal function for \varphi

 

There are many equivalent ways to define the class of computable functions. For concreteness, the remainder of this article will assume that computable functions have been defined as those finitary partial functions on the natural numbers that can be calculated by a Turing machine. There are many equivalent models of computation that define the same class of computable functions. These models of computation include

and others.

Each computable function f takes a fixed number of natural numbers as arguments. Because the functions are partial, they may not be defined for every possible choice of input. If a computable function is defined then it returns a single natural number as output (this output can be interpreted as a list of numbers using a pairing function). These functions are also called partial recursive functions. In computability theory, the domain of a function is taken to be the set of all inputs for which the function is defined.

A function which is defined for all arguments is called total. If a computable function is total, it is called a total computable function or total recursive function.

The notation  f(x_1,\ldots,x_k) \downarrow indicates that the partial function f is defined on arguments x_1,\ldots,x_k, and the notation f(x_1,\ldots,x_k) \downarrow = y indicates that f is defined on the arguments x_1,\ldots,x_k and the value returned is y.

Applying the UTM Theorem for Symbolic Analysis

Symbolic Analysis is defined using GrammarWare, ModelWare, SimulationWare and KnowledgeWare.

Hence, the command-predicate of the Atomistic Hybrid Object (AHO)which corresponds symbol in the architecture – is the universal function for the symbol. It has been defined as a clause in the semantics of the Symbolic language. Hence, we use as our model of computation Prolog and the inference engine inside it:

AHO_Architecture

For each Java grammar term or C++ term there is a clause type and a subtype in the Symbolic language.  It is then the specification for SimulationWare considering that original grammar term.

Some links:

  • Rogers, H. (1987) [1967]. The Theory of Recursive Functions and Effective Computability. First MIT press paperback edition. ISBN 0-262-68052-1.
  • Soare, R. (1987). Recursively enumerable sets and degrees. Perspectives in Mathematical Logic. Springer-Verlag. ISBN 3-540-15299-7.

Erkki Laitila, PhD (2008) computer engineer (1977)