# A Dual-Engine for Early Analysis of Critical Systems

Aboubakr Achraf El Ghazi, Ulrich Geilmann, Mattias Ulbrich, Mana Taghdiri

Karlsruhe Institute of Technology, Germany {elghazi, geilmann, mulbrich, taghdiri}@ira.uka.de

**Abstract:** This paper presents a framework for modeling, simulating, and checking properties of critical systems based on the Alloy language – a declarative, first-order, relational logic with a built-in transitive closure operator. The paper introduces a new dual-analysis engine that is capable of providing both *counterexamples* and *proofs*. Counterexamples are found fully automatically using an SMT solver, which provides a better support for numerical expressions than the existing Alloy Analyzer. Proofs, however, cannot always be found automatically since the Alloy language is undecidable. Our engine offers an economical approach by first trying to prove properties using a fully-automatic, SMT-based analysis, and switches to an interactive theorem prover only if the first attempt fails. This paper also reports on applying our framework to Microsoft's COM standard and the mark-and-sweep garbage collection algorithm.

## 1 Introduction

Critical infrastructures such as E-Traffic, E-Energy, and Cloud employ various protocols to ensure self-organization, self-reconfiguration, load distribution, and failure recovery. Due to the size, heterogeneousness, and the highly-dynamic nature of those infrastructures, their protocols are often complex, and thus it is crucial to check their security and functionality requirements not only after they are implemented and deployed, but also at their early stages of algorithm design and refinement. This ensures that certain mistakes are caught early, and thus can be fixed at a lower cost.

Lightweight formal methods [JW96] provide a promising framework for checking critical software systems continuously in earlier stages. Alloy [Jac06], for example, provides an expressive, declarative language that can be analyzed fully automatically. The language is a combination of first-order logic and relational algebra, augmented with a built-in transitive closure operator which makes it particularly suitable for modeling structure-rich systems such as network protocols.

Alloy has been used for checking security and functionality aspects of several resource management, network communication, transportation, and security protocols, supporting the contention that lightweight formal methods are feasible and economical for critical systems. Case studies include a role-based access control security schema for protecting the access to sensitive information and resources [ZWCJ03], the intentional naming system for resource discovery in dynamic networked environments [KJ00], a pull-based asynchronous rekeying framework for scalable management of group keys in secure mul-

ticast [TJ03], the NASA's Direct-To system for helping air traffic controllers find flight plans that safely shorten the flying time [GKMV01], the New York City subway signaling system [SB01], the flash file system that caused the famous 18-day system breakdown of the NASA's mars rover *Spirit* [KJ08], a constraint analysis on Java Bytecodes to detect security vulnerabilities [Rey10], the security domain model analysis to identify illicit information flows and covert channel vulnerabilities [Sha08], and the Mondex electronic purse system for decentralized electronic money transactions [Ram07].

While all the above case studies use Alloy to check protocols at the design level, several tools have been developed that use Alloy for code-level software checking. Jalloy [Vaz04], JForge [DCJ06], and Karun [TJ07], for example, check functional properties of Java programs via translation to Alloy. TestEra [MK01], on the other hand, uses Alloy for systematic test case generation.

There are three main reasons for Alloy's popularity: (1) expressiveness of the language, (2) its fully automatic analysis engine, and (3) support for various abstraction levels. Unlike typical model checkers that only check temporal safety properties specified as finite state machines, Alloy is particularly suitable for modeling rich properties of structure-intensive systems. Such systems can be expressed in the Z specification language [Spi92] as well, but there is little tool support for automatic analysis of Z specifications. On the other hand, domain-specific tools such as AVISPA [AVI] and Scyther [Cre08] are fully automatic, but they are specially designed to check security protocols and are not suitable for checking general functionality requirements. Furthermore, Alloy's support of various abstraction levels (chosen by the user), from the algorithm design to the actual code specification, makes it possible to check the abstraction refinement properties in a uniform framework.

Despite all the successful applications of Alloy to critical systems, the Alloy engine lacks certain capabilities essential for checking critical infrastructures. The Alloy Analyzer (AA) analyzes Alloy specifications fully automatically. This analysis, however, is performed with respect to a finite *scope* – a user-provided bound on the size of the analyzed system – and thus is called *bounded verification*. For critical infrastructures, however, it is essential to have a complete proof of correctness. AA's lack of proof capability results from the fact that it translates Alloy specifications to (satisfiability-equivalent) propositional formulas, and uses a SAT solver to solve those formulas. Consequently, AA provides a poor support for integer arithmetic (handles them with respect to only a small bitwidth), which is essential in modeling smart meters, E-Energy, and E-traffic infrastructures.

Furthermore, AA's translation of Alloy to propositional logic is exponential in the scope size, causing AA to run out of memory while translating complex systems in even small scopes. Therefore, the user cannot check the system in a desirable scope by even letting AA run longer (for example overnight).

In a previous paper [GT11], we described how proof capability can be added to AA without sacrificing its full automation. However, since the Alloy logic is undecidable, it is not always possible to automatically prove properties of the systems expressed in Alloy. In this paper, we present a dual framework, capable of providing both *proofs* and *counterexamples* based on a 3-step strategy: (1) a fully automatic bounded verification based on SMT (Satisfiability Modulo Theories) that potentially improves on AA's scalability and integer support; (2) a fully automatic proof engine based on SMT and unbounded integers that can be incomplete; and (3) a complete<sup>1</sup> but interactive proof engine based on the KeY interactive theorem prover [BHS07]. The framework promises an economical approach for the use of formal methods in the context of critical infrastructures, by requiring user interaction as a last resort – only if it is really needed.

There are other approaches that implement a similar tool chain – from fully automatic to interactive proving – for other languages. The HOL-Boogie approach [BLW08], for instance, introduces a multi-phase proof engine for proof obligations emerging from the VCC compiler. The majority of obligations can automatically be discharged by an SMT solver. Only the most sophisticated problems are presented to the user for interactive proving. Another example is the Why system [BFMP11] that is used for software verification. Proof obligations can be discharged either using automatic provers or by opening them in interactive proof assistants. Our approach provides a similiar tool chain for the Alloy language, but adds another step to the chain to quickly inspect models for potential counterexamples during early design stages.

This paper first gives an overview of our analysis framework, then describes various phases of the framework using an example, and finally reports on our experimental results.

# 2 Overall Framework

To our knowledge, all previous attempts to provide proof capability for the Alloy language were based on interactive theorem provers (ITP). Dynamite [FPM07], for example, proves properties of Alloy specifications using the PVS theorem prover [OSR92] via a translation to fork algebra. Prioni [AKMR03] integrates the Alloy Analyzer with the Athena theorem prover. In these approaches, proof capability comes at the price of user interaction, regardless of the complexity of the problem. Furthermore, to our knowledge, neither approach handles integer arithmetic expressions allowed in the Alloy language.

Compared to ITP, SMT solvers can efficiently handle a rich combination of decidable theories without sacrificing completeness or full automation. Although adding first-order quantifiers to these theories makes them undecidable, recent SMT solving approaches [GdM09, BLdM09, GBT09] have shown significant advances in handling quantifiers. Our framework exploits this. In the full-verification mode, it always tries fully automatic SMT solving first, and switches to ITP only if SMT solving fails.

Since trying to prove an invalid property is particularly costly (an SMT solver may output *unknown* or time out, and an ITP may never terminate), our framework starts in the bounded-verification mode, trying to find a counterexample in a finite scope first. This allows the user to increase the scope arbitrarily in order to gain more confidence about the correctness of the property before switching to the full-verification mode. It should be noted that under certain circumstances, a minimum scope can be computed so that correctness for that scope implies already correctness for any scope [Mom05].

<sup>&</sup>lt;sup>1</sup>Modulo integer arithmetic



Figure 1: Stages of our analysis – CE: counterexample, BV: bounded valid, FV: fully valid, UK: unknown

Figure 1 gives an overview of our framework. It uses the Alloy IDE to take advantage of Alloy's facilities such as type checking and instance visualization. Technical details of the strategies are discussed in the next section.

#### **3** Approach

Our framework provides three strategies for checking a property of an Alloy specification: (1) Bounded verification checks Alloy specifications with respect to a bounded scope, aiming at finding counterexamples. Any counterexample reported by this phase is guaranteed to be valid; no false alarms are generated. Lack of a counterexample, however, does not constitute proof; it only implies that no counterexample exists within the analyzed scope. (2) SMT-based full verification aims at proving the correctness of the property fully automatically using the Z3 SMT solver [dMB08]. If Z3 outputs "unsat", the property has been proven correct, and if it outputs a counterexample preceded by the keyword "sat", a valid counterexample has been found. However, since Alloy is undecidable, Z3 does not guarantee a complete analysis: it may output a counterexample preceded by the keyword "unknown", implying that the property may or may not be valid, or time out. (3) ITPbased full verification provides a complete proof engine based on the KeY theorem prover [BHS07]. Due to our extensive set of axioms and lemmas, in some cases, the property can be proved automatically. In general, however, this analysis requires user interactions to guide the theorem prover and thus, its performance depends on the user's level of expertise.

This section describes the basics of the Alloy language and our three analysis strategies using a running example. It focuses on the main ideas involved in each analysis in order to clarify their differences. Technical details and further evaluations of the SMT-based full verification can be found elsewhere [GT11]. Details of our bounded verification and ITP-based full verification will follow in our future publications.

```
open util/ordering[Book] as ord
 1
 2
    abstract sig Target {}
 3
    sig Name extends Target {}
    sig Address extends Target {}
 4
 5
    sig Book {
 6
       names: set Name,
 7
       addr: names \rightarrow some Target
 8
    fact acyclicity { all b: Book, n: Name | not (n in n. (b.addr)) }
 9
    fun lookup [b: Book, n: Name]: set Address { n. (b.addr) & Address }
10
     pred add [b, b': Book, n: Name, t: Target] {
11
12
       (t in Address) or (t in Name and some lookup [b, t])
13
       b'.addr = b.addr + n \rightarrow t
14
     }
    pred del [b, b': Book, n: Name, t: Target] {
15
       (no b.addr.n) or (some n.(b.addr) - t)
16
17
       b'.addr = b.addr - n \rightarrow t
18
    }
19
    pred traces [] {
20
       no ord/first.addr
       all b': Book - ord/first | let b = ord/prev[b'] |
21
        some n: Name, t: Target | add[b, b', n, t] or del[b, b', n, t]
22 }
```

# Figure 2: Example – an address book specified in Alloy

## 3.1 Example

Figure 2 gives a sample model in Alloy. The model specifies the address book of an email client where names are mapped to email addresses [Jac06]. Line 1 imports the ordering module from the Alloy library to order the elements of type Book. The ordering functions will be used later by the traces predicate (Lines 19 - 22). In order to allow the use of aliases and group names for email addresses, the model declares a hierarchical type system (Lines 2 - 4) where Name and Address are subtypes of the basic type Target. Types are declared using the **sig** keyword and represent sets of elements. The **extends** keyword specifies that the subtypes are disjoint, and the **abstract** keyword denotes that any element of the supertype must belong to one of the extending subtypes.

Lines 5 - 8 declare the Book type. The field names represents all the names in the address book, and declares a binary relation of type Book  $\rightarrow$  Name. The multiplicity keyword set allows each Book to be mapped to an arbitrary number of Names. The addr field declares a ternary relation of type Book  $\rightarrow$  Name  $\rightarrow$  Target where only those elements of Name that are included in the names relation are allowed. The multiplicity keyword some denotes that for every b: Book, and every n: Name included in b.names, the pair (b, n) must be mapped to at least one Target.

A fact represents a constraint that is assumed to hold. The acyclicity fact (Line 9) constrains that no name can appear in its own set of targets directly or indirectly. The operators . and ^ represent relational join and transitive closure, respectively.

To describe the dynamic behavior of the system, the model defines additional predicates and functions. The lookup function (Line 10) returns all the addresses that correspond to a name in a particular book. The & operator denotes set intersection. Predicate add (Lines 11 - 14) specifies the addition operation: a pair (n, t) : Name × Target can be added to an address book b if t is an Address or if it is a Name that is already mapped to some Address in b (directly or indirectly). Predicate del (Lines 15 - 18) specifies the deletion operation: a pair (n, t) : Name × Target can be deleted from an address book b if no name is mapped to n or if n is also mapped to targets other than t. The operators +, -, and  $\rightarrow$  denote set union, set difference, and Cartesian product, respectively.

Predicate traces (Lines 19 - 22) specifies that the Book elements represent a sequence of add and del operations: the first book in the ordering is empty and thus contains no addresses (Line 20), and any consecutive pair of books are related by either the add or del operation (Line 21). The rest of this section describes how various properties of this address book model can be analyzed by our different analysis strategies.

#### 3.2 SMT-based Bounded Verification

Given an Alloy specification and a scope – a bound on the size of each type, we translate the Alloy specification to a satisfiability-equivalent, bounded SMT problem. Compared to Alloy Analyzer that bit-blasts Alloy problems to SAT, our translation preserves the original structure by axiomatizing Alloy constructs as first-order SMT axioms over bounded sorts – fixed-sized bitvectors. The resulting SMT problem lies within QBVF (quantified bitvector formula) logic, and thus is decidable [WHdM10].

Top-level types of the Alloy problem are translated to SMT bitvectors, according to the scope information. Since SMT-Lib – the standard SMT language – does not support sub-types, we use membership functions to enforce type hierarchies. For an Alloy subtype S, we declare a membership function  $isS : T[S] \rightarrow Bool$  where T[S] denotes the top-level supertype of  $S^2$ . This function denotes which elements of T[S] belong to S. To enforce multi-level type hierarchies, additional axioms in the form of membership implications are used.

Relations, the central concept in Alloy, are translated to boolean-valued, uninterpreted, membership functions. An Alloy relation  $R: A_1 \to \ldots \to A_n$  is represented by an SMT function  $R^{smt}: (T[A_1] \times \ldots \times T[A_n]) \to Bool$ , and constrained to hold only elements of its admissible type using the axiom

 $\forall a_1: T[A_1], \dots, a_n: T[A_n]; \ R^{smt}(a_1, \dots, a_n) \Rightarrow isA_1(a_1) \land \dots \land isA_n(a_n)$ 

Furthermore, multiplicity keywords used in relation declarations are translated using auxiliary SMT functions. For example, to represent a relation declaration  $R: A_1 \rightarrow \text{some } A_2$ , we declare an additional, uninterpreted SMT function  $one R: T[A_1] \rightarrow T[A_2]$  that non-

<sup>&</sup>lt;sup>2</sup>In certain cases, it is necessary to declare membership functions for top-level types as well.

- 1 (declare-fun isAddr (BitVec[5]) Bool)
- 2 (declare-fun isName (BitVec[5]) Bool) ;; Target is abstract
- 3 (assert (forall (t BitVec[5]) (or (isAddr t)(isName t)))) ;; Addr and Name are disjoint
- 4 (assert (forall (t BitVec[5]) (not (and (isAddr t) (isName t))))) ;; names: Book → set Name
- 5 (declare-fun names (BitVec[5] BitVec[5]) Bool)
- 6 (assert (forall (b BitVec[5]) (t BitVec[5]) (⇒ (names b t) (isName t)))) ;; addr: Book → names → some Target
- 7 (declare-fun addr (BitVec[5] BitVec[5] BitVec[5]) Bool)
- 8 (assert (forall (b BitVec[5]) (t1 BitVec[5]) (t2 BitVec[5]) (⇒ (addr b t1 t2) (Book.names b t1)))) ;; Multiplicity keyword "some"
- 9 (declare-fun oneTarget (BitVec[5] BitVec[5]) BitVec[5])
- 10 (assert (forall (b BitVec[5]) (t1 BitVec[5]) (t2 BitVec[5])
  - $(\Rightarrow (names b t1) (addr b t1 (oneTarget b t1)))))$

Figure 3: Example - bounded SMT translation of the address book declaration part

deterministically maps each element of  $T[A_1]$  to exactly one element of  $T[A_2]$ . The following axiom is then used to ensure that  $R^{smt}$  maps each element of  $A_1$  to at least one element of  $A_2$  (the semantics of **some**):

 $\forall a: T[A_1]; isA_1(a) \Rightarrow R^{smt}(a, oneR(a))$ 

Figure 3 gives the translation of the declaration part of the address book example. The translation assumes a scope of 32, and thus the top-level types, namely Book and Target, are represented by bitvectors of width 5, i.e. BitVec[5].

In addition to signatures and relations, Alloy formulas involve set and relational operators. Our translation specifies set-based semantics of these operators using first-order axioms over membership functions. Details of this axiomatization can be found elsewhere [GT11].

Using this translation, we can check the following property of the address book model.

```
assert delUndoesAddBuggy {
    all b, b', b": Book, n: Name, t: Target |
    (add[b, b', n, t] and del[b', b", n, t]) ⇒ b.addr = b".addr
}
check delUndoesAddBuggy for 32
```

The assertion states that if a pair (n, t) is first added to an address book b and then deleted afterwards, the addr relation of the final book b'' is equal to the that of the original book b. As the name suggests, this property is invalid. The Z3 SMT solver finds the following counterexample which represents the case where the initial book b already contains the pair to be added. The book b' after adding (n, t) then also contains the pair, and so deleting it results in the empty book b''. The property delUndoesAddBuggy is therefore violated since b and b'' differ.

| ( <b>define</b> n bv1[5])                                                     | Alloy types are translated to bitvectors.  |  |  |  |  |  |
|-------------------------------------------------------------------------------|--------------------------------------------|--|--|--|--|--|
| (define t bv0[5])                                                             | Hence, the counterexample defines          |  |  |  |  |  |
| ( <b>define</b> b bv1[5])                                                     | bitvector constants for all symbols in the |  |  |  |  |  |
| ( <b>define</b> b' bv16[5])                                                   | model. For instance: $n$ refers to name 1, |  |  |  |  |  |
| ( <b>define</b> b" bv0[5])                                                    | b to book 1, $b'$ to book 16, etc.         |  |  |  |  |  |
| ( <b>define</b> (addr (x1 (bv 5)) (x2 (bv 5)) (x3 (bv 5)))                    |                                            |  |  |  |  |  |
| ( <b>if</b> ( <b>and</b> (= x1 bv1[5]) (= x2 bv1[5]) (= x3 bv0[5])) <b>tr</b> | ue $\implies b.addr = \{(n,t)\}$           |  |  |  |  |  |
| ( <b>if</b> ( <b>and</b> (= x1 bv16[5]) (= x2 bv1[5]) (= x3 bv0[5])) <b>t</b> | <b>true</b> $\implies b'.addr = \{(n,t)\}$ |  |  |  |  |  |
| false)))                                                                      | $\implies b''. \texttt{addr} = \emptyset$  |  |  |  |  |  |

Z3 produces this counterexample in 1.41 seconds whereas the Alloy Analyzer requires 58.01 seconds to find a counterexample in this scope.

#### 3.3 SMT-based Full Verification

The second strategy of our framework is to prove correctness of Alloy assertions fully automatically using the Z3 SMT solver again. In contrast to the bounded analysis that produces an SMT problem in the decidable logic of quantified bitvectors, the unbounded analysis uses the AUFLIA logic <sup>3</sup> that allows quantifiers over free sorts, and thus is undecidable. Consequently, three outcomes are possible: (1) *unsat*, implying that the SMT solver has successfully proven the property correct, (2) a counterexample preceded by the keyword *sat*, implying that the SMT solver has successfully found a valid counterexample to the property being checked, and (3) a counterexample preceded by the keyword *unknown*, implying that the counterexample may or may not be valid, and must be double-checked. An invalid counterexample denotes an inconclusive analysis.

The translation rules used in this unbounded, SMT-based analysis are very similar to the ones used in the previous section, except for type declarations. While our bounded translation represents top-level types as fixed-sized bitvectors, our unbounded translation represents them as uninterpreted, free sorts in SMT. Furthermore, the unbounded translation exploits the theory of linear integer arithmetic provided by SMT solvers to translate Alloy's integers. Figure 3.3 shows the unbounded translation of the address book type hierarchy. All other axioms must be rewritten over top-level sorts. Further details can be found elsewhere [GT11].

To continue with the address book example, one can fix the delUndoesAddBuggy property based on the feedback from the previous counterexample. The corrected property delUndoesAdd, shown below, constrains the initial address book to be empty.

<sup>&</sup>lt;sup>3</sup>Closed first-order formulas over the theory of linear integer arithmetic and arrays, extended with free sort and uninterpreted function symbols.

```
1 (declare-sort Target)
2 (declare-sort Book)
3 (declare-fun isAddr (Target) Bool)
4 (declare-fun isName (Target) Bool)
Figure 4: Example – unbounded SMT translation of the address book type hierarchy
assert delUndoesAdd {
    all b, b', b'': Book, n: Name, t: Target |
        (no n.(b.addr) and add[b, b', n, t] and del[b', b'', n, t]) ⇒ b.addr = b''.addr
}
```

Our bounded analysis reports that this assertion has no counterexamples in the scope of 32. Therefore, with confidence in the correctness of the property, we use the unbounded analysis to prove the property correct. Z3 proves this property in 0.01 seconds.

#### 3.4 ITP-based Full Verification

In the interactive verification stage, the Alloy model is proven using the KeY system. For this purpose, the model is translated to KeY's typed first-order logic (FOL), which also provides support for subtyping.

Since Alloy centers around relations, we need a relational first-order theory. We therefore introduce the top-level types Relation and Tuple for relations and their elements, respectively. The uninterpreted predicate  $in : Tuple \times Relation$  connects the two types and denotes membership of a tuple in a relation.

To lower the burden of interactively proving a model correct, the translation should be as transparent as possible; the correspondence between the original model and its translation should be obvious. To achieve this, we define a FOL counterpart for each of the Alloy operators. The semantics of the usual set operations, like union and intersection, can be axiomatized using the membership predicate *in*. Defining the semantics of relational operators requires the possibility to access the components of a tuple. For this purpose, we introduce subtypes of Tuple and Relation to capture arity information:

 $Atom, Tuple2, Tuple3, \ldots <: Tuple$  $Rel1, Rel2, Rel3, \ldots <: Relation$ 

We also introduce constructor functions for tuples of any arity greater than one, for instance,

 $\begin{array}{l} binary: Atom \times Atom \rightarrow Tuple 2\\ ternary: Atom \times Atom \times Atom \rightarrow Tuple 3 \end{array}$ 

We use additional axioms to specify that (1) the image of a constructor contains all tuples of its particular arity, and (2) constructor invocations are equal iff all of their parameters are equal.

Having these notions defined, it is straightforward to define relational operators. A drawback of this translation approach is that relations of different arities have to be treated separately. For every arity, a distinct set of operators has to be defined (which we denote by subscripting the operator names with the arities they are defined for). For example, the cartesian product of two unary relations r and s is defined by

$$\begin{split} prod_{1\times 1}: Rel1 \times Rel1 \to Rel2 \\ \forall a,b: Atom; (in(binary(a,b), prod_{1\times 1}(r,s)) \Leftrightarrow in(a,r) \wedge in(b,s)) \end{split}$$

Signatures and fields of an Alloy model are represented by constant function symbols of the appropriate type. The address book example of Figure 2 declares four signatures and two relations that give rise to the following declarations:

Target, Name, Address, Book : Rel1 names : Rel2 addr : Rel3

In order for these to only capture admissible instances of the model, we restrict the constants to meet additional model constraints: (1) The signatures Name and Address are disjoint subsets of Target, (2) Target is abstract, (3) The fields names and addr are bound by their appropriate types and respect the multiplicity constraints.

Since every Alloy entity has a counterpart in FOL, translating an Alloy formula is straightforward and preserves its structure. To prove a property correct, we construct a proof obligation stating that the desired assertion follows from the model constraints.

Alloy formulas are translated using the operators from the relational first-order theory. By solely applying their definitions, we can rewrite the formulas to equivalent ones, in which only the uninterpreted symbols (i.e. the membership predicate *in* and the constructor functions) appear, but none of the operators. Although this approach might be appropriate for a purely automatic verification engine, we consider this not suitable for the task of interactive proving since it breaks any correspondence between the model and the proof. Moreover, this approach is inefficient in many cases, because formulas grow significantly in size and will contain a lot of quantifiers. We therefore define numerous inference rules that allow efficient reasoning on a higher abstraction level. These rules can be seen as lemmas and have been proven to follow from the axioms of the relational theory. Consider these representatives:

unionSubset 
$$\frac{r \subseteq s}{union_1(r,s) \rightsquigarrow s}$$
 useSubset  $\frac{in(a,r) \quad r \subseteq s}{in(a,s)}$ 

The first one is a conditional simplification rule. When the premise  $r \subseteq s$  holds, it rewrites  $union_1(r, s)$  to s. The useSubset rule has two conditions, namely in(a, r) and  $r \subseteq s$ , and infers a new formula in(a, s).

The KeY system's proof search strategy has been adjusted to automatically apply most of the lemmas that were defined. Although the strategy is usually not capable of proving functionally complex properties, our tests show that the necessary user interaction is often narrowed down to the most central steps of the proof, and most subgoals can be closed automatically.

We illustrate our interactive reasoning approach by proving the lookupYields assertion in the address book model:

```
assert lookupYields {
    traces[] ⇒ (all b: Book, n: b.names | some lookup[b, n])
}
```

The assertion states that every name known in an address book is actually mapped to some address. This assertion is valid only if all books are constructed by proper insertions and deletions. We therefore assume the traces predicate to hold. Since Z3 can not prove this assertion (it times out), we hand it to the KeY system.

The address book model linearly orders the elements of type Book by importing the ordering module. To reflect the linear ordering in the translation, we define a function b to enumerate all elements of *Book*, using KeY's built-in integer type *int*. The following axioms make b a bijection from the non-negative integers to  $Book^4$ :

$$b: int \to Atom \qquad \forall a: Atom; (in(a, Book) \Rightarrow \exists i: int; (i \ge 0 \land a \doteq b(i))) \\ \forall i: int; (i \ge 0 \Rightarrow in(b(i), Book)) \qquad \forall i, j: int; (i, j \ge 0 \Rightarrow (b(i) \doteq b(j) \Leftrightarrow i \doteq j))$$

While proving the assertion, we face two main challenges: (1) The traces predicate defines Book inductively. So we use induction on non-negative integers to prove the objective for all elements of *Book*. (2) The lookup function uses the transitive closure of a mutable relation, namely addr, that changes due to insertions and deletions. We therefore use an induction principle for transitive closure, which is defined by the following rule for an arbitrary parameterized formula  $\phi$ :

$$\begin{split} \forall a, b: Atom; (in(binary(a, b), r) \Rightarrow \phi(a, b)) \\ \forall a, b, c: Atom; (in(binary(a, b), r) \land \\ \texttt{tc\_induct} & \frac{in(binary(b, c), transClos(r)) \land \phi(b, c) \Rightarrow \phi(a, c)}{\forall a, b: Atom; (in(binary(a, b), transClos(r)) \Rightarrow \phi(a, b))} \end{split}$$

The instantiation of  $\phi$  for this rule and the induction hypothesis have to be provided manually and require a solid insight on the model. The remaining interactive steps provide technical guidance of the prover. Overall, out of a total of 6522 rule applications necessary to prove lookupYields, 122 were interactive, which included the above tc\_induct rule (2 times), induction over the elements of *Book* (2 times), quantifier instantiations (52), case distinctions (17), hiding of unnecessary formulas (25), and miscellaneous minor steps (24).

<sup>&</sup>lt;sup>4</sup>Note that this makes *Book* an infinite set. However, for this particular model, the proof remained correct when we changed the bijection from a function over all non-negative integers to one over a finite interval, thus finitizing *Book*.

|              |       | AA   |     | BOUNDED Z3 |     | UNBOUNDED Z3 |     | KEY  |     |
|--------------|-------|------|-----|------------|-----|--------------|-----|------|-----|
| PROPERTY     | SCOPE | TIME | RES | TIME       | Res | TIME         | RES | Step | RES |
| delUndoesAdd | 16    | 0.8  | CE  | 0.4        | CE  |              |     |      |     |
| -Buggy       | 32    | 58   | CE  | 1.0        | CE  | —            | NA  | -    | NA  |
| delUndoesAdd | 32    | 150  | BV  | 0.0        | BV  |              |     |      |     |
|              | 64    | ТО   | UK  | 0.0        | BV  | 0.0          | FV  | _    | NA  |
| lookupYields | 8     | 101  | BV  | 147        | BV  |              |     |      |     |
|              | 16    | ТО   | UK  | ТО         | UK  | ТО           | UK  | 122  | FV  |

Table 1: Evaluation results for the address book example – **CE:** counterexample, **BV:** bounded valid, **FV:** fully valid, **UK:** unknown, **NA:** not applicable, **TO:** time out (> 10 min.)

|             |       | AA   |     | BOUNDED Z3 |     | UNBOUNDED Z3 |     | KEY  |     |
|-------------|-------|------|-----|------------|-----|--------------|-----|------|-----|
| PROPERTY    | SCOPE | TIME | RES | TIME       | Res | TIME         | Res | TIME | RES |
| BuggyCOM    | 16    | 427  | CE  | 3.6        | CE  |              |     |      |     |
| Theorem 1   | 17    | ТО   | CE  | 1.9        | CE  | _            | NA  | -    | NA  |
| COM         | 16    | 451  | BV  | 0.0        | BV  |              |     |      |     |
| Theorem 1   | 17    | ТО   | UK  | 0.3        | BV  | 0.0          | FV  | _    | NA  |
| mark sweep  | 9     | 140  | BV  | 17         | BV  |              |     |      |     |
| Soundness 1 | 10    | ТО   | UK  | 107        | BV  | ТО           | UK  | 10   | FV  |

Table 2: Evaluation results for other case studies – **CE**: counterexample, **BV**: bounded valid, **FV**: fully valid, **UK**: unknown, **NA**: not applicable, **TO**: time out (> 10 min.)

# 4 Experiments

In this section, we first summarize our analysis of the address book example, and then report on applying our framework to Microsoft COM standard and the mark-and-sweep garbage collection algorithm.

Table 1 shows the performance of our framework on the address book example, and compares the results with the Alloy Analyzer (AA). The time (in seconds) is measured on an Intel Core2Quad, 2.8GHz, 8GB memory. The Alloy analysis time is the total of the time spent on generating CNF by AA 4.1.10 and solving it using the SAT4J solver. We used Z3 version 2.19 as the underlying SMT solver. Table 1 shows the typical progression of an integrated formal design process: First the model is analyzed using decidable, yet bounded technologies to identify errors in the specification (row 1). Having corrected the errors (row 2), the full (unbounded) verification is launched to automatically prove the properties. Since the unbounded problem is significantly harder and undecidable in general, we may have to resort to interactive verification (row 3). The steps in this row indicate the number of interactive rule applications required. To further evaluate our framework, we have checked Microsoft's Component Object Model (COM) standard [Box98], a component integration architecture that is adopted by numerous software component vendors, and provides the basis for higher-level standards such as OLE/ActiveX and COM+. We have also checked the mark-and-sweep garbage collection algorithm, which is widely used for memory management. The original Alloy specifications of these two systems are distributed with the Alloy Analyzer. However, since neither model contains an invalid assertion, we seeded a bug in the COM model (in the *Identity* axiom) for a case with a counterexample (denoted by BuggyCOM). The results of our experiments on these two systems are given in Table 2.

As shown in these tables, in most cases, the runtime of our bounded verification is significantly better than AA. This is because unlike AA that flattens all formulas for all possible values in the propositional form, our translation preserves the structure of the formulas, and thus exploits high-level simplifications offered by Z3. However, for lookupYields, our bounded verification does not perform as well as Alloy. This is because traces used in this assertion pose a challenge for our current translation. We are investigating further optimizations to simplify our axiomatization of traces in order to improve our performance.

Bounded verification of COM-Theorem1 and delUndoesAdd shows another advantage of using QBVF. These properties have no counterexamples, and Z3 can deduce that in almost zero seconds, independent of the analyzed scope. This is because the decision procedure for QBVF can potentially deduce a contradiction from the quantified formulas independently from the sizes of the bitvectors. That is, the same quantifier instantiations that produce a contradiction for smaller bitvectors can produce contradictions for larger bitvectors without any significant overhead.

So far in our experiments, our SMT-based, unbounded verification has either proved a property correct or timed out with an inconclusive (unknown) result. A third outcome is also possible where this phase finds a counterexample that was missed by the previous bounded verification phase (because the model was not analyzed in a big-enough scope). Our current experiments, however, do not expose this case. In cases where Z3 cannot verify a property (e.g. mark-and-sweep and lookupYields), the KeY interactive prover is invoked. As explained in Section 3.4, in order to prove lookupYields, the user needs to guide KeY by manually selecting 122 rule applications out of a total of 6522 rules necessary (the rest are selected automatically by KeY). In the mark-and-sweep case, however, a complete proof is found fully automatically by KeY in 10 seconds.

# 5 Conclusions

The framework presented in this paper supports modeling, checking, and proving properties of critical infrastructures expressed in the Alloy language. It offers an economical approach by introducing a new dual-analysis engine that is capable of finding counterexamples in faulty systems and proving properties of sound systems. The analysis starts with our SMT-based, bounded, fully automatic, and decidable verification technique that aims at finding counterexamples, and potentially outperforms AA. In this phase, the user can increase the scope in order to gain more confidence about the correctness of the property before switching to the full-verification mode. Full verification starts with our SMT-based, fully automatic proof engine, and switches to our interactive, ITP-based, complete verification only if the automatic proof engine fails.

## References

- [AKMR03] K. Arkoudas, S. Khurshid, D. Marinov, and M. Rinard. Integrating model checking and theorem proving for relational reasoning. *RELMICS*, pages 21–33, 2003.
- [AVI] AVISPA: Automated Validation of Internet Security Protocols and Applications. http://www.avispa-project.org/.
- [BFMP11] F. Bobot, J.-C. Filliâtre, C. Marché, and A. Paskevich. Why3: Shepherd Your Herd of Provers. In *BOOGIE*, 2011.
- [BHS07] B. Beckert, R. Hähnle, and P. H. Schmitt, editors. Verification of Object-Oriented Software: The KeY Approach. LNCS 4334. Springer-Verlag, 2007.
- [BLdM09] M. P. Bonacina, C. L., and L. de Moura. On Deciding Satisfiability by DPLL and Unsound Theorem Proving. In *CADE*, pages 35–50, 2009.
- [BLW08] S. Böhme, K. R. M. Leino, and B. Wolff. HOL-Boogie—An Interactive Prover for the Boogie Program-Verifier. In *TPHOLs*, 2008.
- [Box98] D. Box. Essential Com: The Component Object Model. Addison-Wesley Longman, Amsterdam, 1998.
- [Cre08] C. J. Cremers. The Scyther Tool: Verification, Falsification, and Analysis of Security Protocols. In CAV, pages 414–418, 2008.
- [DCJ06] G. Dennis, F. S. Chang, and D. Jackson. Modular Verification of Code with SAT. In ISSTA, pages 109–120, 2006.
- [dMB08] L. de Moura and N. Bjorner. Z3 efficient SMT solver. In TACAS, pages 337–340, 2008.
- [FPM07] M. F. Frias, C. G. L. Pombo, and M. M. Moscato. Alloy Analyzer+PVS in the analysis and verification of alloy specifications. In *TACAS*, pages 587–601, 2007.
- [GBT09] Y. Ge, C. Barrett, and C. Tinelli. Solving quantified verification conditions using satisfiability modulo theories. AMAI, 55(1):101–122, 2009.
- [GdM09] Y. Ge and L. de Moura. Complete Instantiation for Quantified Formulas in Satisfiabiliby Modulo Theories. In *CAV*, pages 306–320, 2009.
- [GKMV01] M. D. Ghassemi, V. Kuncak, D. Marinov, and M. Vaziri. Modelling NASA's Direct-To System. http://sdg.csail.mit.edu/D2/ModellingD2.htm, 2001.
- [GT11] A. A. El Ghazi and M. Taghdiri. Relational Reasoning via SMT Solving. In *FM*, pages 133–148, 2011.
- [Jac06] D. Jackson. Software Abstractions: Logic, Language and Analysis. MIT Press, 2006.

- [JW96] D. Jackson and J. Wing. Lightweight formal methods. In *School of Computer Science Carnegie Mellon University*, 1996.
- [KJ00] S. Khurshid and D. Jackson. Exploring the Design of an Intentional Naming Scheme with an Automatic Constraint Analyzer. In *ASE*, 2000.
- [KJ08] E. Kang and D. Jackson. Formal modeling and analysis of a flash filesystem in Alloy. In ABZ, pages 294–308, 2008.
- [MK01] D. Marinov and S. Khurshid. TestEra: A Novel Framework for Automated Testing of Java Programs. In ASE, 2001.
- [Mom05] L. Momtahan. Towards a Small Model Theorem for Data Independent Systems in Alloy. *Electronic Notes in Theoretical Computer Science*, 128(6):37–52, 2005.
- [OSR92] S. Owre, N. Shankar, and J. Rushby. PVS: A Prototype Verification System. In *CADE*, 1992.
- [Ram07] T. Ramananandro. Mondex, an electronic purse: specification and refinement checks with the Alloy model-finding method. *Formal Aspects of Computing*, 20(1):21–39, 2007.
- [Rey10] M. Reynolds. Lightweight Modeling of Java Virtual Machine Security Constraints. In ABZ, pages 146–159, 2010.
- [SB01] A. Sarma and A. Bhor. The NYC subway signaling system. preprint (2001) available at http://alloy.mit.edu/community/node/234, 2001.
- [Sha08] A. B. Shaffer. A Security Domain Model for Static Analysis and Verification of Software Programs. In SEKE, pages 673–678, 2008.
- [Spi92] J. M. Spivey. The Z Notation: A Reference Manual. Prentice Hall, 2 sub edition, 1992.
- [TJ03] M. Taghdiri and D. Jackson. A Lightweight Formal Analysis of a Multicast Key Management Scheme. In FORTE, pages 240–256, 2003.
- [TJ07] M. Taghdiri and D. Jackson. Inferring Specifications to Detect Errors in Code. *JASE*, 14(1):87–121, 2007.
- [Vaz04] M. Vaziri. Finding Bugs in Software with Constraint Solver. PhD thesis, MIT, 2004.
- [WHdM10] C. M. Wintersteiger, Y. Hamadi, and L. de Moura. Efficiently Solving Quantified Bit-Vector Formulas. In FMCAD, 2010.
- [ZWCJ03] J. Zao, H. Wee, J. Chu, and D. Jackson. RBAC Schema Verification Using Lightweight Formal Model and Constraint Analysis. *SACMAT*, 2003.