You are currently browsing the category archive for the ‘Model Checking’ category.

Automated theorem proving (ATP) or automated deduction, currently the most well-developed subfield of automated reasoning (AR), is the proving of mathematical theorems by a computer program.

Depending on the underlying logic, the problem of deciding the validity of a formula varies from trivial to impossible. For the frequent case of propositional logic, the problem is decidable but NP-complete, and hence only exponential-time algorithms are believed to exist for general proof tasks. For a first order predicate calculus, with no (“proper”) axioms, Gödel’s completeness theorem states that the theorems (provable statements) are exactly the logically valid well-formed formulas, so identifying valid formulas is recursively enumerable: given unbounded resources, any valid formula can eventually be proven.

However, invalid formulas (those that are not entailed by a given theory), cannot always be recognized.

A simpler, but related, problem is proof verification, where an existing proof for a theorem is certified valid. For this, it is generally required that each individual proof step can be verified by a primitive recursive function or program, and hence the problem is always decidable.

Some techniques for theorem proving

————————————

Further reading

How to make it as easily as possible?

A lean theorem prover is an automated theorem prover implemented in a minimum amount of code. Lean provers are generally[citation needed] implemented in Prolog, and make proficient use of the backtracking engine and logic variables of that language. Lean provers can be as small as a few hundred bytes of source code.

The following is a small Prolog implementation:

propexp=
not_(propexp);
 or_(propexp,propexp);
 and_(propexp,propexp);
 implies_(propexp,propexp);
 s_(symbol);
 a ; b ; c.

The beef is that this procedural definition for a theorem can be converted into an atomistic model so that each domain will be transformed into a symbolic element.

Therefore,  proving  the theorem means running the top element of the theorem. The principle is exactly the same as in source code simulation (see SimulationWare).

When those variable a, b and c will be captured from the original code, this work will enable proving source code sequences written in Java or similar languages.

Some links:

Advertisements

A Propositional directed acyclic graph (PDAG) is a data structure that is used to represent a Boolean function. A Boolean function can be represented as a rooted, directed acyclic graph of the following form:

  • Leaves are labeled with \top (true), \bot (false), or a Boolean variable.
  • Non-leaves are \mathcal{4} (logical and), \mathcal{5} (logical or) and \Diamond (logical not).
  • \mathcal{4}– and \mathcal{5}-nodes have at least one child.
  • \Diamond-nodes have exactly one child.

Leaves labeled with \top (\bot) represent the constant Boolean function which always evaluates to 1 (0). A leaf labeled with a Boolean variable x is interpreted as the assignment x = 1, i.e. it represents the Boolean function which evaluates to 1 if and only if x = 1. The Boolean function represented by a \mathcal{4}-node is the one that evaluates to 1, if and only if the Boolean function of all its children evaluate to 1. Similarly, a \mathcal{5}-node represents the Boolean function that evaluates to 1, if and only if the Boolean function of at least one child evaluates to 1. Finally, a \Diamond-node represents the complemenatary Boolean function its child, i.e. the one that evaluates to 1, if and only if the Boolean function of its child evaluates to 0.

Every binary decision diagram (BDD) and every negation normal form (NNF) is also a PDAG with some particular properties. The following pictures represent the Boolean function f(x1,x2,x3) = − x1 * − x2 * − x3 + x1 * x2 + x2 * x3. See picture below for that.


The Correspondence between PDAG and AHO (Atomistic Hybrid Object)

The semantics of PDAG resembles the one of AHO the symbolic element in the atomistic model. PDAG is an automaton of type three in the Chomsky hieararchy.

Grammar Languages Automaton Production rules (constraints)
Type-0 Recursively enumerable Turing machine \alpha \rightarrow \beta (no restrictions)
Type-1 Context-sensitive Linear-bounded non-deterministic Turing machine \alpha A\beta \rightarrow \alpha\gamma\beta
Type-2 Context-free Non-deterministic pushdown automaton A \rightarrow \gamma
Type-3 Regular Finite state automaton A \rightarrow a
and
A \rightarrow aB

However, the command of AHO is more powerful, it can express any clause adapted from source code (loopClause is any loop, pathClause is any branch, setClause is any change of any variable etc) when splitted into sub-commands. There is one characteristic feature when describing mathematical calculations or non-trivial logical equations. Each AHO should be atomistic. The command of each AHO for calculations, logical operations and transformations is opClause. It refers to an operation.

The semantics of the command covers all levels of the Chomsky hieararchy. However, simulating levels 0 and 1 needs some arrangements. See more about the limitations from the book.

Some links:

A binary decision diagram (BDD) or branching program, like a negation normal form (NNF) or a propositional directed acyclic graph (PDAG), is a data structure that is used to represent a Boolean function. On a more abstract level, BDDs can be considered as a compressed representation of sets or relations. Unlike other compressed representations, operations are performed directly on the compressed representation, i.e. without decompression.

A Boolean function can be represented as a rooted, directed, acyclic graph, which consists of decision nodes and two terminal nodes called 0-terminal and 1-terminal. Each decision node is labeled by a Boolean variable and has two child nodes called low child and high child. The edge from a node to a low (high) child represents an assignment of the variable to 0 (1). Such a BDD is called ‘ordered’ if different variables appear in the same order on all paths from the root. A BDD is said to be ‘reduced’ if the following two rules have been applied to its graph:

In popular usage, the term BDD almost always refers to Reduced Ordered Binary Decision Diagram (ROBDD in the literature, used when the ordering and reduction aspects need to be emphasized). The advantage of an ROBDD is that it is canonical (unique) for a particular functionality.[citation needed] This property makes it useful in functional equivalence checking and other operations like functional technology mapping.

Below an example for a function f with three variables:

BDDs are useful for modeling software logic. By using them it is possible to evaluate test coverage and to generate tests for the program, bottom-up.

Symbolic Analysis and BDDs

For the reductionist implementation for Symbolic Analysis, there is only one global method for AHOs. It is the run-method intended for simulating each AHO atom.

It is rather easy to write another method to generate all possible paths for each AHO, because all branches happen in a path Clause (see book Symbolic Analysis). Lets our new method be bdd.  We have a program P with the contents X1 • X2  • X3. After making an atomistic model, we have a model M, where the start element M0 is a model element with the contents meta(set([X1, X2, X3])). When we call M:bdd, then it call successively X1:bdd, X2:bdd and X3:bdd. Each of them returns a subtree as the clause formalism. In the main level the whole tree will be collected.

Summary: BDD can be seen as a small easy subset of simulation. BDD can be either static or partially dynamic, using method invokations as far as possible.

Some links:

The SLAM project, which was started by Microsoft Research, aimed at verifying some software safety properties using model checking techniques. It is implemented in Ocaml, and has been used to find many bugs in Windows Device Drivers. It is distributed as part of the Microsoft Windows Driver Foundation development kit as the Static Driver Verifier (SDV).

SLAM uses a technique called counterexample-guided abstraction refinement, which uses progressively better models of the program under test.

“SLAM originally was an acronym but we found it too cumbersome to explain. We now prefer to think of ‘slamming’ the bugs in a program.” It probably stood for “Software, Languages, Analysis, and Modeling.” Note that Microsoft has since re-used SLAM to stand for “Social Location Annotation Mobile”

About Abstraction Methods:

There are three ways to abstract programs.

  1. Over-abstract like abstract interpretation
  2. Under-abstraction like SLAM, also called meat-axe – abstraction, which cuts of some information
  3. Precise abstraction which is the best description for Symbolic Analysis (Laitila), which uses the Symbolic language for that.

Because SLAM removes some information, it is not complete to describe program comprehension features of the program, which include: control flow, data flow, state flow, operations and full functionality (Pennington, Wiedenbeck, von Mayrhauser et al.).

Paper X describes counterexample-guided abstraction refinement with some methods:

There is a traffic light example in it to model states.

Some algorithms in the paper are SPLITPath, PolyRefine and SPLITLoop.

Precice abstraction in Symbolic Analysis

We define precise abstraction as a formula M <– –> M’, where M is the original model (or code) containing Java, C++ or other language and M’ is the abstracted model with AHO objects, where the command from each AHO has been transformed according to the original grammar term of the corresponding program element. The abstracted model can be transformed back into the original language. Thus, the model is reversible.

Some links:

Java Pathfinder is a system to verify executable Java bytecode programs. In its basic form, it is a Java Virtual Machine (JVM) that is used as an explicit state software model checker developed by NASA.

An Example about Path Finding

Below a small example for detecting a possible bug (assert) in a control flow:

int x, y;

if (x > y) {
x = x + y;
y = x – y;
x = x – y;
if (x > y)
assert false;
}
Below its logic is drawn:

Java Path Finder is able to follow on paths and to evaluate expressions using symbolic evaluation.

JPF is rather flexible

  • Symbolic execution
  • Can start at any point in the program
  • Can use mixed symbolic and concrete inputs
  • No special test driver needed – sufficient to have an executable program that uses the method/code under test

  • Any time symbolic execution
  • Use specialized listener to monitor concrete execution and trigger symbolic execution based on certain conditions
  • Unit level analysis in realistic contexts
  • Use concrete system-level execution to set-up environment for unit-level symbolic analysis
  • Applications:
  • Exercise deep system executions
  • Extend/modify existing tests: e.g. test sequence generation for Java containers

Inside JPF

JPF is an extension to  JVM. JPF Listener is able to listen code execution.

Symbolic Analysis (SAM) vs. JPF

SAM has not initially been planned for evaluating paths, because its main function is deterministic. However, its symbolic model allows traversing multiple paths at the same time. In that nondeterm mode, it should write its intermediate results to a tape tree, which is a modification from a standard trace (see Turing machine: tape).

Starting simulation via SAM is as flexible as JPF.

Other tools

MoonWalker is a model checker developed on the Mono platform. MoonWalker is a program to automatically detect errors in CIL bytecode programs, i.e. applications written for the .NET platform.

The current version of MoonWalker is able to find deadlocks and assertion violations in CIL programs, generated with Mono’s C# compiler.

Some links:

Symbolic Analysis Laboratory (SAL)

SAL is a project established by Berkeley and Standford Universities used for automated formal methods.

Motivation for SAL is the following (the link):

To become practical for assurance, automated formal methods must be made more scalable, automatic, and cost-effective. Such an increase in scope, scale, automation, and utility can be derived from an emphasis on a systematic separation of concerns during verification. SAL (Symbolic Analysis Laboratory) attempts to address these issues. It is a framework for combining different tools to calculate properties of concurrent systems.

SAL defines symbolis analysis as follows:

Symbolic analysis is simply the computation of fixed point properties of programs through a combination of deductive and explorative techniques.
SAL sees the key elements of symbolic analysis as

  1. Automated deduction, in computing property preserving abstractio9ns and propagating the consequences of known properties.
  2. Model checking, as a means of computing global properties of by means od systematic symbolic exploration. For this purpose, model checking is used for actually computing fixed points such as the reachable state set, in addition to verifying given temporal properties.
  3. Invariant generation, as a technique for computing useful properties and propagating known properties.
  4. SAL 3.0  is the current the tool suite for SAL SAL tutorial can be downloaded from here: http://sal.csl.sri.com/doc/salenv_tutorial.pdf

Principles of SAL 3.0

The “motor” of SAL is its language.

The SAL language is not that different from the input languages used by various other verification tools such as SMV, Murphi, Mocha, and SPIN. Like these languages, SAL describes transition systems in terms of initialization and transition commands. These can be given by variable-wise definitions in the style of SMV or as guarded commands in the style of Murphi.

A complete description of the SAL language is available online at http://sal.csl.sri.com/.

Simple example using SAL:
short: CONTEXT = 1
BEGIN
  State: TYPE = {ready, busy};
  main: MODULE =
  BEGIN
    INPUT request : BOOLEAN
    OUTPUT state : State
    INITIALIZATION
    state = ready
    TRANSITION
    state’ IN IF (state = ready) AND request THEN
      {busy}
     ELSE
      {ready, busy}
     ENDIF
  END;
END
Starting the program happens as follows:   sal > (start-simulation! “main”)

Some observations considering reverse engineering and program comprehension (Laitila’s approach):
  1. SALenv contains a symbolic model checker called sal-smc.  It allows users to specify properties in linear temporal logic (LTL), and computation tree logic (CTL). One typical command for sal-smc is:  THEOREM main |- AG(request => AF(state = busy));
  2. Some other featrures are Peterson protocol, Bakery protocol and Syncronous bus arbiter etc. Some examples.

Some comments:

  1. That kind of language specific approach leads to a procedural paradigm via a program which indirectly describes the model.  In that approach the model to be verified is data, which tries to model the code.
  2. The implementation language SAL 3.0 is very far from the programming languages. Therefore, SAL is not useful for reverse engineering or for program comprehension.

Some links:

A Kripke structure is a type of nondeterministic finite state machine used in model checking to represent the behaviour of a system. It is basically a graph whose nodes represent the reachable states of the system and whose edges represent state transitions. A labeling function maps each node to a set of properties that hold in the corresponding state. Temporal logics are traditionally interpreted in terms of Kripke structures.

Let AP be a set of atomic propositions, i.e. boolean expressions over variables, constants and predicate symbols.

A Kripke structure is a 4-tuple M = (S,\; I,\; R,\; L) consisting of

  • a finite set of states S\;
  • a set of initial states I \subseteq S
  • a transition relation R \subseteq S \! \times \! S \; where \; \forall s \! \in \! S, \; \exist s^' \!\! \in \! S such that  (s,s^') \in R
  • a labeling (or interpretation) function L: S \rightarrow 2^{AP}

Since R is left-total, it is always possible to construct an infinite path through the Kripke structure. Thus a deadlock state has a single outgoing edge back to itself.

The labeling function L defines for each state sS the set L(s) of all atomic propositions that are valid in s. In the Kripke structure, the transition function should be complete,i.e each and every state should have transition.In other words there should not be any dead states in the graph.

Combining the Semantics of the Atomistic Model and the Kripke Structure with each other

If we use the notation of the Kripke structure for the Atomistic Model we get the definitions:

The atomistic model is a Kripke structure,  a 4-tuple M = (S,\; I,\; R,\; L) consisting of

  • a finite set of  states S\; implemented as symbols in the model illustrating clauses of the original clause and side effects caused by the simulation/evaluation processes.
  • a set of initial states I \subseteq S implemented as symbols of the models: either clauses or side effects.
  • a transition relation R \subseteq S \! \times \! S \; where \; \forall s \! \in \! S, \; \exist s^' \!\! \in \! S such that  (s,s^') \in R implemented as links of the symbols.
  • a labeling (or interpretation) function L: S \rightarrow 2^{AP} implemented as a command-fact in each symbol.

Summary about the correspondece

The atomistic model is downwards compatible with the Kripke structure. It leads to a conculsion (by inspection) that all functions and mathematical processes intended for the Kripke structure, e.g. model checking, can be borrowed for the use of the atomistic model. Only a simple transformation is needed.

Some links:

Erkki Laitila, PhD (2008) computer engineer (1977)