You are currently browsing the category archive for the ‘SimulationWare’ category.

In computer science, symbolic execution (also symbolic evaluation) refers to the analysis of programs by tracking symbolic rather than actual values, a case of abstract interpretation. The field of symbolic simulation applies the same concept to hardware. Symbolic computation applies the concept to the analysis of mathematical expressions.

Symbolic execution is used to reason about all the inputs that take the same path through a program.

Symbolic execution is useful for software testing because it can analyse if and when errors in the code may occur. It can be used to predict what code statements do to specified inputs and outputs. It is also important for considering path traversal.

Symbolic execution is used to reason about a program path-by-path. This may be superior to reasoning about a program, like Dynamic program analysis does, input-by-input.

Symbolic execution vs Symbolic Analysis (Laitila, 2008)

Symbolic execution emphasizes execution, travering program paths.  Symbolic analysis has the same purpose, but furthermore, it is a formalism to run / execute symbols using their internal state automaton. The automaton can be programmed to do anything, which is characteristics for the symbol. One exellent feature of symbolic analysis is its reduncancy and the  internal semantics of each symbol because of its clause-notation in the Symbolic-language. It is possible to reconstruate parse trees from the symbols so that the side effects caused by any symbol can be matched with the corresponding symbols. This makes it possible to partially verify the code.

Some links:

Executable UML, often abbreviated to xtUML or xUML, is the evolution of the Shlaer-Mellor method to UML. Executable UML graphically specifies a system using a profile of the UML. The models are testable, and can be compiled into a less abstract programming language to target a specific implementation. Executable UML supports MDA through specification of platform-independent models, and the compilation of the platform-independent models into platform-specific models.

Executable UML is used to model the domains in a system. Each domain is defined at the level of abstraction of its subject matter independent of implementation concerns. The resulting system view is composed of a set of models represented by at least the following:

  • The domain chart provides a view of the domains in the system, and the dependencies between the domains.
  • The class diagram defines the classes and class associations for a domain.
  • The statechart diagram defines the states, events, and state transitions for a class or class instance.
  • The action language defines the actions or operations that perform processing on model elements.

Shortcomings of Executable UML

UML doesn’t define semantics of a programming language. Hence, it is not complete to describe behavior of any exact system. In order to correct this shortcoming some new principles are needed. Some of them are: automata theory and symbol-driven engineering.

Automata Theory and Symbol-Driven Engineering (SDE)

Automata theory defines behavior model for corresponding code elements. Therefore, this theory background  eliminates the gap between program behavior and the corresponding model, like UML

In SDE every symbol has an automaton behind it. In order to execute those symbols an invocation Symbol:run is needed plus some specific initailizations for some specific symbols like definitions (method invocations and  object and variable definitions). See figure below.

Some links:

In computer programming, program slicing is the computation of a program slice. The program slice consists of the parts of a program that may affect the values computed at some point of interest, referred to as a slicing criterion. Program slicing can be used in debugging to locate source of errors more easily. Other applications of slicing include software maintenance, optimization, program analysis, and information flow control.

Slicing techniques have been seeing a rapid development since the original definition by Mark Weiser. At first, slicing was only static, i.e., applied on the source code with no other information than the source code. Bogdan Korel and Janusz Laski introduced dynamic slicing, which works on a specific execution of the program (for a given execution trace). Other forms of slicing exist, for instance path slicing.

Slicing in Symbolic Analysis

Symbolic analysis is a method to traverse source code symbols by using a specific traversing algorithm, or by simulating the corresponding code. Actually symbolic analysis is static slicing, but it tries to mimich dynamic behavior, too.  As an output of simulation symbolic analysis creates side effect elements, which are execution results from symbols.

Below a rough principle about a slice generated from symbolic analysis.

Some links:

A computer is a machine that manipulates data according to a set of instructions.

Hardware is a general term for the physical artifacts of a technology. It may also mean the physical components of a computer system, in the form of computer hardware.

Components directly attached to the motherboard include:

  • The central processing unit (CPU) performs most of the calculations which enable a computer to function, and is sometimes referred to as the “brain” of the computer. It is usually cooled by a heat sink and fan.
  • The chip set mediates communication between the CPU and the other components of the system, including main memory.
  • RAM Stores all running processes (applications) and the current running OS. RAM Stands for Random Access Memory
  • The BIOS includes boot firmware and power management. The Basic Input Output System tasks are handled by operating system drivers.

The Central Processing Unit (CPU) or the processor is the portion of a computer system that carries out the instructions of a computer program, and is the primary element carrying out the computer’s functions. This term has been in use in the computer industry at least since the early 1960s. The form, design and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation remains much the same.

See http://www.flickr.com/photos/biwook/153062645/

One typical CPU

Inspite of some different architectures the main principle of executing commands is similar for typical commands independent of the used high-level language. There is a formalism of the Turing machine behind those modern architectures, although they have some excellent features that make execution rather fast.

In CPU the high level commands are split into atomistic machine level instructions.

For example adding the registers 1 and 2 and placing the result in register 6 is encoded:

[  op  |  rs |  rt |  rd |shamt| funct]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary

Because typical high-level instructions can be transformed into atomic formulas, it is possible to mimic different parts of any high level command step-by-step. There are only some exception to that rule (parallelism etc).

Therefore it is possible for a tool to split high level commands into symbolic trees, which atomic parts of it as its leafs. This observation leads to a theory of source code simulation. Although partial execution of any high-level command from any part of those programs seldom can be complete, it is possible by partial evaluation to retrieve useful information from those commands to the program comprehension purpose in order to understand relations and arguments of any relevant command.

Symbolic Abstract Machine

Summary

As a conclusion in our study we were able to create a simulator for source code, which mimics CPU, but in a high abstraction level. The only components and artifacts for this work are symbols, which have been implemented as hybrid objects (see AHO). The user can create mental models by traversing information between symbols.

Gödel’s incompleteness theorems are two theorems of mathematical logic that state inherent limitations of all but the most trivial axiomatic systems for arithmetic.

Gödel’s first incompleteness theorem states that:

Any effectively generated theory capable of expressing elementary arithmetic cannot be both consistent and complete. In particular, for any consistent, effectively generated formal theory that proves certain basic arithmetic truths, there is an arithmetical statement that is true,[1] but not provable in the theory (Kleene 1967).

The liar paradox is the sentence “This sentence is false.

Gödel’s second incompleteness theorem can be stated as follows:

For any formal effectively generated theory T including basic arithmetical truths and also certain truths about formal provability, T includes a statement of its own consistency if and only if T is inconsistent.

Arithmetization of syntax

The main problem in fleshing out the proof described above is that it seems at first that to construct a statement p that is equivalent to “p cannot be proved”, p would have to somehow contain a reference to p, which could easily give rise to an infinite regress. Gödel’s ingenious trick, which was later used by Alan Turing in his work on the Entscheidungsproblem, is to represent statements as numbers, which is often called the arithmetization of syntax. This allows a self-referential formula to be constructed in a way that avoids any infinite regress of definitions.

To begin with, every formula or statement that can be formulated in our system gets a unique number, called its Gödel number. This is done in such a way that it is easy to mechanically convert back and forth between formulas and Gödel numbers. It is similar, for example, to the way English sentences are encoded as sequences (or “strings”) of numbers using ASCII: such a sequence is considered as a single (if potentially very large) number. Because our system is strong enough to reason about numbers, it is now also possible to reason about formulas within the system.

A formula F(x) that contains exactly one free variable x is called a statement form or class-sign. As soon as x is replaced by a specific number, the statement form turns into a bona fide statement, and it is then either provable in the system, or not. For certain formulas one can show that for every natural number n, F(n) is true if and only if it can be proven (the precise requirement in the original proof is weaker, but for the proof sketch this will suffice). In particular, this is true for every specific arithmetic operation between a finite number of natural numbers, such as “2*3=6”.

Statement forms themselves are not statements and therefore cannot be proved or disproved. But every statement form F(x) can be assigned with a Gödel number which we will denote by G(F). The choice of the free variable used in the form F(x) is not relevant to the assignment of the Gödel number G(F).

Now comes the trick: The notion of provability itself can also be encoded by Gödel numbers, in the following way. Since a proof is a list of statements which obey certain rules, we can define the Gödel number of a proof. Now, for every statement p, we may ask whether a number x is the Gödel number of its proof. The relation between the Gödel number of p and x, the Gödel number of its proof, is an arithmetical relation between two numbers. Therefore there is a statement form Bew(x) that uses this arithmetical relation to state that a Gödel number of a proof of x exists:

Bew(y) = ∃ x ( y is the Gödel number of a formula and x is the Gödel number of a proof of the formula encoded by y).

The name Bew means provable ( German word) . An important feature of the formula Bew(y) is that if a statement p is provable in the system then Bew(G(p)) is also provable. This is because any proof of p would have a corresponding Gödel number, the existence of which causes Bew(G(p)) to be satisfied.

Prolog creates Gödel numbers

Prolog’s type system uses Gödel numbers internally. Prolog uses the resolution theorem (by John Alan Robinson) for showing which clauses are satifsfiable.

In mathematical logic and automated theorem proving, resolution is a rule of inference leading to a refutation theorem-proving technique for sentences in propositional logic and first-order logic. In other words, iteratively applying the resolution rule in a suitable way allows for telling whether a propositional formula is satisfiable and for proving that a first-order formula is unsatisfiable. This method may prove the satisfiability of a first-order satisfiable formula, but not always, as it is the case for all methods for first-order logic.

The relation of the symbolic atomistic model for proving software and the incompleteness theorems

The atomistic model simulates code symbols as an automaton each. The output of the simulated clauses (side effcts) is not complete as itself, but it allows the user to find possible contradictions or logical errors in the code by connecting the symbols that are causing side effects the the output states  (the side effect elements).

Some links:

A Kripke structure is a type of nondeterministic finite state machine used in model checking to represent the behaviour of a system. It is basically a graph whose nodes represent the reachable states of the system and whose edges represent state transitions. A labeling function maps each node to a set of properties that hold in the corresponding state. Temporal logics are traditionally interpreted in terms of Kripke structures.

Let AP be a set of atomic propositions, i.e. boolean expressions over variables, constants and predicate symbols.

A Kripke structure is a 4-tuple M = (S,\; I,\; R,\; L) consisting of

  • a finite set of states S\;
  • a set of initial states I \subseteq S
  • a transition relation R \subseteq S \! \times \! S \; where \; \forall s \! \in \! S, \; \exist s^' \!\! \in \! S such that  (s,s^') \in R
  • a labeling (or interpretation) function L: S \rightarrow 2^{AP}

Since R is left-total, it is always possible to construct an infinite path through the Kripke structure. Thus a deadlock state has a single outgoing edge back to itself.

The labeling function L defines for each state sS the set L(s) of all atomic propositions that are valid in s. In the Kripke structure, the transition function should be complete,i.e each and every state should have transition.In other words there should not be any dead states in the graph.

Combining the Semantics of the Atomistic Model and the Kripke Structure with each other

If we use the notation of the Kripke structure for the Atomistic Model we get the definitions:

The atomistic model is a Kripke structure,  a 4-tuple M = (S,\; I,\; R,\; L) consisting of

  • a finite set of  states S\; implemented as symbols in the model illustrating clauses of the original clause and side effects caused by the simulation/evaluation processes.
  • a set of initial states I \subseteq S implemented as symbols of the models: either clauses or side effects.
  • a transition relation R \subseteq S \! \times \! S \; where \; \forall s \! \in \! S, \; \exist s^' \!\! \in \! S such that  (s,s^') \in R implemented as links of the symbols.
  • a labeling (or interpretation) function L: S \rightarrow 2^{AP} implemented as a command-fact in each symbol.

Summary about the correspondece

The atomistic model is downwards compatible with the Kripke structure. It leads to a conculsion (by inspection) that all functions and mathematical processes intended for the Kripke structure, e.g. model checking, can be borrowed for the use of the atomistic model. Only a simple transformation is needed.

Some links:

Understanding side effects is one of the most complex and challenging things of program comprehension.

In computer science, a function or expression or method or class or a package is said to have a side effect if, in addition to producing a value, it also modifies some state or has an observable interaction with calling functions or the outside world. For example, a function might modify a global or a static variable, modify one of its arguments, raise an exception, write data to a display or file, read data, or call other side-effecting functions. In the presence of side effects, a program’s behavior depends on history; that is, the order of evaluation matters. Because understanding an effectful program requires thinking about all possible histories, side effects often make a program harder to understand.

Side effects are essential to enable a program to interact with the outside world (people, filesystems, other computers on networks). But the degree to which side effects are used depends on the programming paradigm. Imperative programming is known for uncontrolled, promiscuous use of side effects. Thus, the languages Java, C++ and other imperative ones are problematic referring to the scope of this title.

How side effects can be captured in the analyzing paradigms

  • Static analysis cannot handle side effects, at all.
  • For dynamic analysis, selected clauses can be stored into a trace or an output.
  • For symbolic analysis (Laitila), every activity for every symbol can be traced and stored, if needed.

The output of symbolic analysis is the most accurate trace

If symbol S activates symbol S1 in order to evaluate whether it is true or not, an side effect symbol SE1 can be created every time when S1 is evaluated. Hence, after execution the output trace is S • S1 • SE1.

That if-condition was only one small example, but by extending the same principle for any statement and for any evaluation, it is possible to get detailed information about complex algorithms and separated threads, almost any typical clause of the code. However, there are some restrictions about whether it is possible to evaluate a statement (clause) or not. See more about  computability conditions.

Summary about side effects

Symbolic analysis is the most accurate paradigm to save evaluation results: all or selectively any category of  computations.

By using side effect elements it is possible for the user, to verify some specific parts of the program, piece by piece,  if only it is possible for him/her to formulate the necessary input preconditons (prerequisities) for the code.  In some cases verifying program parts is still rather challenging manually. In those cases, it is recommended to load the output(s) of the symbolic analysis into any commercial or free graph tool, e.g. pert graph tool,  a visualization tool, Excel or similar. Those generic tools are helpful in finding anomalies or general rules based on the symbols and side effects.

 

 

 

 

 

In computability theory the utm theorem (universal turing machine theorem), is a basic result about Gödel numberings of the set of computable functions. It proves the existence of a computable universal function which is capable of calculating any other computable function. The universal function is an abstract version of the universal turing machine, thus the name of the theorem.

According to the Church-Turing thesis, computable functions are exactly the functions that can be calculated using a mechanical calculation device given unlimited amounts of time and storage space. Equivalently, this thesis states that any function which has an algorithm is computable.

UTM theorem

Let \varphi be a Gödel numbering of the set of computable functions, then the partial function

u_\varphi: \mathbb{N}^2 \to \mathbb{N}

defined as

u_\varphi(i,x) := \varphi_i(x) \qquad i,x \in \mathbb{N}

is computable.

u_\varphi is called the universal function for \varphi

 

There are many equivalent ways to define the class of computable functions. For concreteness, the remainder of this article will assume that computable functions have been defined as those finitary partial functions on the natural numbers that can be calculated by a Turing machine. There are many equivalent models of computation that define the same class of computable functions. These models of computation include

and others.

Each computable function f takes a fixed number of natural numbers as arguments. Because the functions are partial, they may not be defined for every possible choice of input. If a computable function is defined then it returns a single natural number as output (this output can be interpreted as a list of numbers using a pairing function). These functions are also called partial recursive functions. In computability theory, the domain of a function is taken to be the set of all inputs for which the function is defined.

A function which is defined for all arguments is called total. If a computable function is total, it is called a total computable function or total recursive function.

The notation  f(x_1,\ldots,x_k) \downarrow indicates that the partial function f is defined on arguments x_1,\ldots,x_k, and the notation f(x_1,\ldots,x_k) \downarrow = y indicates that f is defined on the arguments x_1,\ldots,x_k and the value returned is y.

Applying the UTM Theorem for Symbolic Analysis

Symbolic Analysis is defined using GrammarWare, ModelWare, SimulationWare and KnowledgeWare.

Hence, the command-predicate of the Atomistic Hybrid Object (AHO)which corresponds symbol in the architecture – is the universal function for the symbol. It has been defined as a clause in the semantics of the Symbolic language. Hence, we use as our model of computation Prolog and the inference engine inside it:

AHO_Architecture

For each Java grammar term or C++ term there is a clause type and a subtype in the Symbolic language.  It is then the specification for SimulationWare considering that original grammar term.

Some links:

  • Rogers, H. (1987) [1967]. The Theory of Recursive Functions and Effective Computability. First MIT press paperback edition. ISBN 0-262-68052-1.
  • Soare, R. (1987). Recursively enumerable sets and degrees. Perspectives in Mathematical Logic. Springer-Verlag. ISBN 3-540-15299-7.

Universal Turing machine (UTM) is a Turing machine that can simulate an arbitrary Turing machine on arbitrary input. The universal machine essentially achieves this by reading both the description of machine to be simulated as well as the input thereof from its own tape.  This model is considered by some to be the origin of the stored program computer—used by John von Neumann for the “Electronic Computing Instrument” that now bears von Neumann’s name: the von Neumann architecture.

Universal machine is also known as universal computing machine, universal machine, machine U, U.

UTM

Alan Turing described his construction in complete detail in his 1936 paper:

“It is possible to invent a single machine which can be used to compute any computable sequence. If this machine U is supplied with a tape on the beginning of which is written the S.D [“standard description” of an action table] of some computing machine M, then U will compute the same sequence as M.

Universal Turing Machine uses numbers in its tape as input and as output. Its commands are Left, Right and  Skip.

Symbolic Abstract Machine (SAM) compared with UTM

SAM differs from UTM, because it can use higher abstraction symbols as its input and output. The logic for each symbol is written into the symbol: the command-fact. By applying these two modifications it is possible to create a higher abstraction UTM, which is capable of simulating typical programming languages.

SymbolicAbstractMachine

We can  describe our novel construction in complete detail:

“It is possible to invent a single machine which can be used to compute any computable sequence using programming language semantics. If this machine SAM is supplied with a tape on the beginning of which is written the S.D [“standard description” of an action table] of some computing machine M, then SAM will compute the same sequence as M.”
If we thing each symbol to work like an atomic formula, only containing one reductionist formula, the construction of a symbol resembles the one of the metaphor of an atom. Hence, our SAM works using atoms, each atom containing an automaton, in the way that UTM has.

Some links:

There are two types of model checking: explicit state and symbolic model checking. We suggest symbolic analysis as an alternative for focused verification, a novel paradigm with a simulation capability.

Explicit State Model Checking

  • states are enumerated on-the-fly
  • Forwards analysis
  • Stores visited states in a hashtable
Characteristics of explicit state model checking:
  • Memory intensive
  • Good for finding concurrency errors
  • Short execution paths are better, but long execution paths can also be handled
  • Can handle dynamic creation of objects/threads
  • Mostly used in software

Symbolic Model Checking

  • Sets of states are manipulated at a time
  • Typically a backwards analysis
  • Transition relation encoded by (some variant of) Binary Decision Diagrams (BDDs) or as a satisfiability problem
Characteristics of Symbolic Model Checking:
  • Can handle very large state spaces
  • Not as good for asynchronous systems
  • Cannot deal well with long execution traces
  • Works best with a static transition relation, hence doesn’t deal well with dynamic creation of objects/threads
  • Mostly used in hardware
An example about a Kripke graph in Fig.  below (see Kripke Structure, also called as a Kripke model or a Kripke graph).
Kripke_Graph

How symbolic analysis differs from Symbolic and Explicit Model Checking?

Atomistic model and symbolic analysis, presented in this blog do not provide any complete model checking strategy. Instead, symbolic analysis provides a simulation methodology (SimulationWare), which estimates one use scenario of the code at a time. By repeating numerous scenarios it is possible for the user to evaluate alternative paths, but the purpose of it is not automatic proving, complete verification, but proving partial correctness in  co-operation with the tool and the user. The characteristics of symbolic analysis are:

  • Focused approach, not holistic
  • Symbolic execution traces. It is possible to use these outputs for verification purposes.
  • Mostly used for troubleshooting purposes
  • States are modeled using side effect elements (internal state variables)
  • Formwards analysis, but after analysis also backwards analysis is possible
  • Transition relations are code inside symbolic elements (command-field and links between elements).

Some links:

Erkki Laitila, PhD (2008) computer engineer (1977)