Verification of a Cryptographic Primitive: SHA-256

Verification of a Cryptographic Primitive: SHA-256 ANDREW W. APPEL, Princeton University

A full formal machine-checked verification of a C program: the OpenSSL implementation of SHA-256. This is an interactive proof of functional correctness in the Coq proof assistant, using the Verifiable C program logic. Verifiable C is a separation logic for the C language, proved sound w.r.t. the operational semantics for C, connected to the CompCert verified optimizing C compiler.

Categories and Subject Descriptors: D.2.4 [Software/Program Verification]: Correctness proofs; E.3 [Data Encryption]: Standards; F.3.1 [Specifying and Verifying and Reasoning about Programs]

General Terms: Verification

1. INTRODUCTION

“[C]ryptography is hard to do right, and the only way to know if something was done right is to be able to examine it. . . . This argues very strongly for open source cryptographic algorithms. . . . [But] simply publishing the code does not automatically mean that people will examine it for security flaws.”
Bruce Schneier [1999]

“Be suspicious of commercial encryption software . . . [because of] back doors. . . . Try to use public-domain encryption that has to be compatible with other implementations. . . .”
Bruce Schneier [2013]

That is, use widely used, well examined open-source implementations of published, nonproprietary, widely used, well examined, standard algorithms, because “many eyes make all bugs shallow” works only if there are many eyes paying attention.

To this I add: use implementations that are formally verified with machine-checked proofs of functional correctness, of side-channel resistance, of information-flow properties. “Many eyes” are a fine thing, but sometimes it takes them a couple of years to notice the bugs [Bever 2014]. Verification can guarantee program properties in advance of widespread release.

In this paper I present a first step: a formal verification of the functional correctness of the SHA-256 implementation from the OpenSSL open-source distribution.

Formal verification is not necessarily a substitute for many-eyes assurance. For example, in this case, I present only the assurance of functional correctness (and its corollary, safety, including absence of buffer overruns). With respect to other properties such as timing side channels, I prove nothing; so it is comforting that this same C program has over a decade of widespread use and examination.

SHA-256, the Secure Hash Algorithm with 256-bit digests, is not an encryption algorithm, but it is used in encryption protocols. The methods I discuss in this paper can be applied to the same issues that appear in ciphers such as AES: interpretation of standards documents, big-endian protocols implemented on little-endian machines, odd corners of the C semantics, storing bytes and loading words, signed and unsigned arithmetic, extended precision arithmetic, trustworthiness of C compilers, use of machine-dependent special instructions to make things faster, correspondence of models to programs, and assessing the trusted base of the verification tools.

© Copyright Andrew W. Appel. This material is based on research sponsored by the DARPA under agreement number FA8750-12-2-0293. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

ACM Transactions on Programming Languages and Systems, to appear 2015


This paper presents the following result: I have proved functional correctness of the OpenSSL implementation of SHA-256, with respect to a functional specification: a formalization of the FIPS 180-4 Secure Hash Standard [FIPS 2012]. The machine-checked proof is done using the Verifiable C program logic, in the Coq proof assistant. Verifiable C is proved sound with respect to the operational semantics of C, with a machine-checked proof in Coq. The C program can be compiled to x86 assembly language with the CompCert verified optimizing C compiler; that compiler is proved correct (in Coq) with respect to the same operational semantics of C and the semantics of x86 assembly language. Thus, by composition of machine-checked proofs with no gaps, the assembly-language program correctly implements the functional specification.

In addition, I implemented SHA-256 as a functional program in Coq and proved it equivalent to the functional specification. Coq can execute the functional program on real strings (only a million times slower than the C program), and gets the same answer as standard reference implementations.¹ This gives some extra confidence that no silly things are wrong with the functional spec.

Limitations. The implementation is from OpenSSL, with some macro expansion to instantiate it from generic SHA-2 to SHA-256. I factor assignment statements so that there is at most one memory operand per command, e.g., ctx->h[0] += a; becomes t=ctx->h[0]; ctx->h[0]=t+a; see §10.

CompCert generates assembly language, not machine language; there is no correctness proof of the assembler or of the x86 processor hardware on which one might run the compiled program.

The Coq proof assistant is widely used, and its kernel is believed to be a correct implementation of the Predicative Calculus of Inductive Constructions (CiC), which in turn is believed to be consistent.

A different kind of limitation is in the time and cost of doing the verification.
SHA256 was the “shakedown cruise” for the Verifiable C system. This “cruise” revealed many inefficiencies of Verifiable C’s proof automation system: it is slow, it is a memory hog, it is difficult to use in places, and it is incomplete: some corners of the C language have inadequate automation support. But this incompleteness, or shakiness of proof automation, cannot compromise the end-to-end guarantee of machine-checked logical correctness: every proof step is checked by the Coq kernel.

Nonlimitations. The other way that my implementation differs from OpenSSL is that I used the x86’s byte-swap instruction in connection with big-endian 4-byte load/store (since it is a little-endian machine). This illustrates a common practice when implementing cryptographic primitives on general-purpose microprocessors: use machine-dependent special instructions to gain performance. It is good that the program logic can reason about such instructions.

What about using gcc or LLVM to compile SHA-256? Fortunately, these compilers (gcc, LLVM, CompCert) agree quite well on the C semantics, so a verification of SHA256 can still add assurance for users of other C compilers. In most of the rare places they disagree, CompCert is correct and the others are exhibiting a bug [Yang et al. 2012]; no bugs have ever been found in the phases of CompCert behind Verifiable C.²

¹ That’s 0.25 seconds per block, versus 0.25 microseconds; fast enough for testing the specification. The Coq functional program is a million times slower because it simulates the logical theory of binary integers used in the specification! The functional spec is even slower than that, because its W function takes a factor of 416 more time.

² That is, CompCert has a front-end phase from C to C light; Verifiable C plugs in after this phase, at C light. Yang et al. [2012] found a bug or two in that front-end phase at a time when that phase was not formally verified, but they could not find any bugs in any of the verified phases, the ones between C light and assembly language. Since then, Leroy has formally verified the C-to-Clight phase, but that doesn’t matter for Verifiable C, because in effect we verify functional correctness of the C light program. Also, Yang et al. [2012] found a specification bug in CompCert, regarding how it treated bit-fields. Although Leroy has since fixed that specification bug, this also does not matter: Verifiable C is immune to specification bugs in C, for reasons discussed in §8.

2. VERIFIED SOFTWARE TOOLCHAIN

The Verified Software Toolchain (VST) [Appel et al. 2014] contains the Verifiable C program logic for the C language, proved sound with respect to the operational semantics of CompCert C. The VST has proof automation tools for applying this program logic to C programs.

One style of formal verification of software proceeds by applying a program logic to a program. An example of a program logic is Hoare logic, which relates the program c to its specification (precondition P, postcondition Q) via the judgment {P} c {Q}. This Hoare triple can be proved by using the inference rules of the Hoare logic, such as the sequential composition rule:

$$\frac{\{P\}\ c_1\ \{Q\} \qquad \{Q\}\ c_2\ \{R\}}{\{P\}\ c_1; c_2\ \{R\}}$$

We prefer sound program logics or analysis algorithms, i.e., where there is a proof that whatever the program logic claims about your program is actually true when the program executes. The VST is proved sound by giving a semantic model of the Hoare judgment with respect to the full operational semantics of CompCert C, so that we can really say, what you prove in Verifiable C is what you get when the source program executes. CompCert is itself proved correct, so we can say, what you get when the source program executes is the same when the compiled program executes. Composing these three proofs together (the proof of a program, the soundness of Verifiable C, and the correctness of CompCert), we get: the compiled program satisfies its specification.

C programs are tricky to verify because one needs to keep track of many side conditions and restrictions: this variable is initialized here, that addition does not overflow, this p < q compares pointers into the same object, that pointer is not dangling. The Verifiable C logic keeps track of every one of these; or rather, assists the user in keeping track of every one. We know that there are no missed assumptions, because of the soundness proof w.r.t. the C semantics; and we know the C semantics does not miss any, because of the CompCert correctness proof w.r.t. safe executability in assembly language.

Of course, there are easier ways to prove programs correct. One can write functional programs in languages (such as Gallina, ML, Haskell) with much cleaner proof theories than C, and then the proof effort is smaller by an order of magnitude. Whenever the performance of a high-level garbage-collected language is tolerable, this is the way to go. The vast amount of software that is today written in Perl, Python, JavaScript might profitably be rewritten in functional languages with clean proof theories for effective verification. But cryptographic primitives are not written in these languages; if we want to verify a well established widely used open-source cryptographic implementation, we need tooling for C.

Synthesis instead of verification? C was not designed with a simple proof theory in mind, so perhaps a simpler route to verified crypto would be to use program synthesis from a domain-specific specification language. One example is Cryptol [Erkok et al. 2009] which can generate either C or VHDL directly from a functional specification. In principle one could hope to prove the Cryptol synthesizer correct (though this has not been done) or validate the output (which might be easier than proving general-purpose C programs).


Unfortunately, synthesis languages sometimes have limited expressiveness. Cryptol has been used to synthesize the block-shuffle part of SHA from a functional spec, but only the block-shuffle (the function that OpenSSL calls sha256_block_data_order).³ Using Verifiable C I have verified the entire implementation, including the padding, length computation, multi-block handling, incremental update of unaligned strings, and so on. The Cryptol synthesizers (which translate Cryptol to C or VHDL) can handle only fixed-size blocks, so cannot handle these parts (SHA256_Init, SHA256_Update, SHA256_Final, SHA256).

3. SPECIFICATION OF SHA-256

A program without a specification cannot be incorrect, it can only be surprising.⁴

Typically one might prove a C program correct with respect to a relational specification. For example, a C implementation implementing lookup tables must satisfy this relation between program states and inputs/outputs: that if the most recent binding for x is x ↦ y, then looking up x yields y.

Sometimes one does this in two stages: prove that the C program correctly implements a functional specification (an abstraction of the implementation), then prove that functional specification satisfies the relational specification. For example, a C implementation implementing lookup tables by balanced binary search trees might be proved correct with respect to a functional spec of red-black trees. Then the functional red-black trees can (more easily) be proved to have the lookup-table property.

For cryptographic hashing, we⁵ built a functional spec from the FIPS 180-4 standard [FIPS 2012]. The relational spec is, “implements a random function.” Unfortunately, nobody in the world knows how to prove that SHA-256 implements a random function (even on paper), so I did not attempt a machine-checked proof of that (see §11).

The FIPS 180-4 SHS (Secure Hash Standard) mentions (in §3.2) 32-bit unsigned binary arithmetic with operators such as addition modulo 2^32, exclusive-or, and shifting. We must give a model of 32-bit arithmetic in pure logic. Fortunately, Leroy has defined such an Integers module and proved many of its properties as part of the semantics of CompCert C [Leroy 2009]; we use this directly in the functional spec, which is otherwise entirely independent of the C language. We have: the type int; operations such as Int.add, Int.xor; injection (Int.repr : Z → int) from the mathematical integers to 32-bit integers, and projection (Int.unsigned : int → Z).
We have “axioms” such as,

$$\frac{0 \le i < 2^{32}}{\mathsf{Int.unsigned}\ (\mathsf{Int.repr}\ i) = i}$$

but this is not an axiom of the underlying logic (CiC), it is a theorem proved⁶ from the axioms of Coq using the constructive definitions of Int.unsigned and Int.repr.

SHS defines SHA-256 on a bitstring of any length, and explains how to pack these into big-endian 32-bit integers. OpenSSL’s implementation permits any sequence of bytes, that is, multiples of 8 bits. We represent a sequence of byte values by a sequence of mathematical integers, and we can define the big-endian packing function as,

    Definition Z_to_Int (a b c d : Z) : Int.int :=
      Int.or (Int.or (Int.or (Int.shl (Int.repr a) (Int.repr 24))
                             (Int.shl (Int.repr b) (Int.repr 16)))
                     (Int.shl (Int.repr c) (Int.repr 8)))
             (Int.repr d).

Given a list nl of byte values (represented as mathematical integers, type Z), if the length of nl is a multiple of 4, it’s simple to define the corresponding list of big-endian 32-bit integers:

    Fixpoint Zlist_to_intlist (nl: list Z) : list int :=
      match nl with
      | h1::h2::h3::h4::t ⇒ Z_to_Int h1 h2 h3 h4 :: Zlist_to_intlist t
      | _ ⇒ nil
      end.

Coq uses Definition for a nonrecursive function (or value, or type), and Fixpoint for structurally recursive functions. The operator :: is list cons. The operations such as Int.shl (shift left) and Int.repr (the 32-bit representation of a mathematical integer) are given foundational meaning by Leroy’s Int package, as explained above.

SHS defines several functions Ch, Maj, ROTR, SHR, $\Sigma_0^{\{256\}}$, $\Sigma_1^{\{256\}}$, $\sigma_0^{\{256\}}$, $\sigma_1^{\{256\}}$; for example,

    Ch(x, y, z) = (x ∧ y) ⊕ (¬x ∧ z)
    Maj(x, y, z) = (x ∧ y) ⊕ (x ∧ z) ⊕ (y ∧ z)

Translating these into Coq is quite straightforward:

    Definition Ch (x y z : int) : int := Int.xor (Int.and x y) (Int.and (Int.not x) z).
    Definition Maj (x y z : int) : int := Int.xor (Int.xor (Int.and x z) (Int.and y z)) (Int.and x y).
    Definition Rotr b x : int := Int.ror x (Int.repr b).
    Definition Shr b x : int := Int.shru x (Int.repr b).
    Definition Sigma_0 (x : int) : int := Int.xor (Int.xor (Rotr 2 x) (Rotr 13 x)) (Rotr 22 x).
    Definition Sigma_1 (x : int) : int := ...
    Definition sigma_0 (x : int) : int := ...
    Definition sigma_1 (x : int) : int := ...

³ Aaron Tomb, Galois.com, personal communication, 13 January 2014.
⁴ Paraphrase of J. J. Horning, 1982.
⁵ Stephen Yi-Hsien Lin wrote a functional spec of SHA-256 in Coq, which I subsequently adapted and rewrote.
⁶ Henceforth, “proved” can be understood to mean, “proved with a machine-checked proof in Coq.”
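As an informal companion to the Coq definitions above, the same packing and logical functions can be sketched in C with uint32_t, whose arithmetic wraps modulo 2^32. This is only an illustration: the names pack_be32, rotr32, ch, maj, maj_coq_order and big_sigma0 are hypothetical and are not taken from OpenSSL or from the verified development. The function maj_coq_order lists the three terms in the order used by the Coq definition of Maj, which yields the same value because exclusive-or is commutative and associative.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian packing of four byte values into one 32-bit word,
   following the shape of Z_to_Int. */
uint32_t pack_be32(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* Rotate right by n bits; n must satisfy 0 < n < 32 so that
   neither shift amount reaches the word width. */
uint32_t rotr32(uint32_t x, unsigned n) {
    assert(n > 0 && n < 32);
    return (x >> n) | (x << (32u - n));
}

/* Ch: each bit of x chooses between the corresponding bits of y and z. */
uint32_t ch(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (~x & z);
}

/* Maj, with terms in the order printed in the standard. */
uint32_t maj(uint32_t x, uint32_t y, uint32_t z) {
    return (x & y) ^ (x & z) ^ (y & z);
}

/* Maj, with terms in the order of the Coq definition. */
uint32_t maj_coq_order(uint32_t x, uint32_t y, uint32_t z) {
    return ((x & z) ^ (y & z)) ^ (x & y);
}

/* Sigma_0 as in the Coq definition: rotations by 2, 13 and 22. */
uint32_t big_sigma0(uint32_t x) {
    return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22);
}
```

For example, packing the bytes 0x61, 0x62, 0x63, 0x80 (the string "abc" followed by the terminator byte) gives the word 0x61626380.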

The vector $K_0^{\{256\}} \ldots K_{63}^{\{256\}}$ is given as a series of 32-bit hexadecimal constants, as is the vector $H_0^{(0)} \ldots H_7^{(0)}$. In Coq we write them in decimal, and inject with Int.repr:

    Definition K := map Int.repr [1116352408, 1899447441, 3049323471, ..., 3329325298].
    Definition initial_registers := map Int.repr [1779033703, 3144134277, ..., 1541459225].

Given a message M of length ℓ bits, the SHS explains: append a 1 bit, then enough zero bits so the length-appended message will be a multiple of the block size, then a 64-bit representation of the length. Since we have a message M of length n bytes, we append a 128 byte (which already has 7 trailing zeros), then the appropriate number of zero bytes. We big-endian convert this to 32-bit integers, then add two more 32-bit integers representing the high-order and low-order parts of the length-in-bits.

    Definition generate_and_pad M :=
      let n := Zlength M in
      Zlist_to_intlist (M ++ [128%Z] ++ list_repeat (Z.to_nat (-(n + 9) mod 64)) 0)
        ++ [Int.repr (n * 8 / Int.modulus), Int.repr (n * 8)].

Note that 0 ≤ a mod 64 < 64 even if a is negative. The magic number 9 comes from 1+8: 1 terminator byte (value 128) plus 8 bytes for the 64-bit length field. Taking −(n + 9) mod 64 gives the number of bytes of padding necessary to round up to the next multiple of 64 bytes, which is the block size.
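The padding arithmetic can be checked on concrete lengths with a small C sketch. The function names zero_pad_bytes and padded_length are hypothetical. One detail differs from Coq: in C the % operator on a negative signed operand yields a result that is zero or negative, so the sketch avoids negation and computes the same nonnegative remainder as (64 − (n + 9) mod 64) mod 64 on unsigned values. It assumes n is small enough that n + 9 does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Number of zero bytes placed between the terminator byte (128)
   and the 8-byte length field, for a message of n bytes.
   Equal to -(n + 9) mod 64 with a nonnegative remainder. */
uint64_t zero_pad_bytes(uint64_t n) {
    assert(n <= UINT64_MAX - 9);          /* assumption: no overflow in n + 9 */
    return (64u - (n + 9u) % 64u) % 64u;
}

/* Total length in bytes after padding: message, terminator,
   zero bytes, 64-bit length. Always a multiple of 64. */
uint64_t padded_length(uint64_t n) {
    return n + 1u + zero_pad_bytes(n) + 8u;
}
```

A 55-byte message needs no zero bytes and fills exactly one block, while a 56-byte message needs 63 zero bytes and spills into a second block.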


SHS defines the message schedule $W_t$ as follows:

$$W_t = \begin{cases} M_t^{(i)} & 0 \le t \le 15 \\ \sigma_1^{\{256\}}(W_{t-2}) + W_{t-7} + \sigma_0^{\{256\}}(W_{t-15}) + W_{t-16} & 16 \le t \le 63 \end{cases}$$

where the superscript (i) indicates the value in the i-th message block. We translate this into Coq as,

    Function W (M: Z → int) (t: Z) {measure Z.to_nat t} : int :=
      if zlt t 16
      then M t
      else (Int.add (Int.add (sigma_1 (W M (t-2))) (W M (t-7)))
                    (Int.add (sigma_0 (W M (t-15))) (W M (t-16)))).
    Proof.
      intros; apply Z2Nat.inj_lt; omega. (* t-2 < t *)
      intros; apply Z2Nat.inj_lt; omega. (* t-7 < t *)
      intros; apply Z2Nat.inj_lt; omega. (* t-15 < t *)
      intros; apply Z2Nat.inj_lt; omega. (* t-16 < t *)
    Qed.

Coq is a language of total functions. The measure and Proof/Qed demonstrate that the W function always terminates. There is one proof line for each of the 4 recursive calls; each proof is, “calling on a smaller value of t.”

One could run this W as a functional program; but it takes time exponential in t, since there are 4 recursive calls. It serves well as a functional spec but it is not practically executable.

The block cipher computes 256-bit (8-word) hashes of 512-bit (16-word) blocks. The accumulated hash of the first i blocks is the vector $H_0^{(i)} \ldots H_7^{(i)}$. To hash the next block, the eight “working variables” a through h are initialized from the $H^{(i)}$ vector. Then 64 iterations of this Round function are executed:

    Definition registers := list int.

    Function Round (regs: registers) (M: Z → int) (t: Z)
             {measure (fun t ⇒ Z.to_nat(t+1)) t} : registers :=
      if zlt t 0 then regs
      else match Round regs M (t-1) with
           | [a,b,c,d,e,f,g,h] ⇒
               let T1 := Int.add (Int.add (Int.add (Int.add h (Sigma_1 e)) (Ch e f g)) (nthi K t)) (W M t) in
               let T2 := Int.add (Sigma_0 a) (Maj a b c) in
               [Int.add T1 T2, a, b, c, Int.add d T1, e, f, g]
           | _ ⇒ nil
           end.
    Proof. intros; apply Z2Nat.inj_lt; omega. Qed.

That is, one calls (Round r (nthi block) 63). (The function nthi b i returns the i-th element of the list b, or the arbitrary element Int.zero if i is negative or beyond the length of the list.) If the length of the regs list is not 8, an arbitrary result (the empty list) is returned; but it will be 8.

I represent registers as a list, rather than a dependently typed vector (i.e., a list whose type inherently enforces the length restriction) to keep the specification as first-order as possible. This simplifies reasoning about the specification, especially its portability to other logics.
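The contrast between the recursive specification of W and a practically executable version can be illustrated in C. The sketch below is not part of the verified development; w_spec, w_table and schedules_agree are hypothetical names. The bodies of small_sigma0 and small_sigma1 are elided in the text above, so the rotation and shift amounts used here are taken from FIPS 180-4. The function w_spec transcribes the recurrence directly, with four recursive calls, and w_table fills a 64-entry table in increasing order of t so that each value is computed once.

```c
#include <assert.h>
#include <stdint.h>

uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));   /* requires 0 < n < 32 */
}

/* sigma_0 and sigma_1 with the constants given in FIPS 180-4. */
uint32_t small_sigma0(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}
uint32_t small_sigma1(uint32_t x) {
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* Direct transcription of the recurrence: four recursive calls,
   so the running time grows exponentially in t. */
uint32_t w_spec(const uint32_t m[16], int t) {
    assert(t >= 0 && t < 64);
    if (t < 16) return m[t];
    return small_sigma1(w_spec(m, t - 2)) + w_spec(m, t - 7)
         + small_sigma0(w_spec(m, t - 15)) + w_spec(m, t - 16);
}

/* Same values, computed once each by filling a table. */
void w_table(const uint32_t m[16], uint32_t w[64]) {
    for (int t = 0; t < 16; t++) w[t] = m[t];
    for (int t = 16; t < 64; t++)
        w[t] = small_sigma1(w[t - 2]) + w[t - 7]
             + small_sigma0(w[t - 15]) + w[t - 16];
}

/* Returns 1 if both computations agree for every t in 0..63. */
int schedules_agree(const uint32_t m[16]) {
    uint32_t w[64];
    w_table(m, w);
    for (int t = 0; t < 64; t++)
        if (w_spec(m, t) != w[t]) return 0;
    return 1;
}
```

Agreement on sample blocks is of course only a test, whereas the paper's approach is to prove the corresponding equivalence in Coq.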


The round function returns registers a′, b′, ..., h′ which are then added to the $H^{(i)}$ to yield $H^{(i+1)}$:

    Definition hash_block (r: registers) (block: list int) : registers :=
      map2 Int.add r (Round r (nthi block) 63).

Given a message of length 16k, the following function computes the $H^{(k)}$ by applying hash_block to each successive 16-word block:

    Function hash_blocks (r: registers) (msg: list int) {measure length msg} : registers :=
      match msg with
      | nil ⇒ r
      | _ ⇒ hash_blocks (hash_block r (firstn 16 msg)) (skipn 16 msg)
      end.
    Proof. ... Qed.

Finally, SHA-256 produces the message digest as a 32-byte string by the big-endian conversion of $H^{(k)}$.

    Definition SHA_256 (str : list Z) : list Z :=
      intlist_to_Zlist (hash_blocks init_registers (generate_and_pad str)).

4. FUNCTIONAL PROGRAM

One can prove that a program satisfies a specification, but how does one know that the specification is properly written down? One way to gain confidence in the specification is to calculate its results on a series of examples, i.e., to run it.

The SHA_256 function given above is actually an executable specification. Coq permits relational (nonconstructive, propositional) specifications that do not run, but also permits fully constructive specifications such as this one.

However, since the W function is exponential, it’s impractical to run this program. Therefore I wrote an alternative functional program called SHA_256'. One can run this program directly inside the Coq proof assistant on actual inputs; it takes about 0.25 seconds per block.

The key to this “efficiency” is that the W function should remember its previous results.⁷ Here msg is the reversed list $W_{t-1}, W_{t-2}, W_{t-3}, \ldots, W_0$:

    Definition Wnext (msg : list int) : int :=
      match msg with
      | x1::x2::x3::x4::x5::x6::x7::x8::x9::x10::x11::x12::x13::x14::x15::x16::_ ⇒
          (Int.add (Int.add (sigma_1 x2) x7) (Int.add (sigma_0 x15) x16))
      | _ ⇒ Int.zero (* impossible *)
      end.

Should we be worried about the “impossible” case? Coq is a language of total functions, so we must return something here. One reason we need not worry is that I proved that this case cannot cause the SHA_256' program to be wrong. That is, I proved the equivalence (using the extensionality axiom):

    Lemma SHA_256'_eq: SHA_256' = SHA_256.

Also, the fact that SHA_256' gives the right answer (on all the inputs that I tried) allows us to know that SHA_256 is also right on those inputs.

⁷ Also in this efficient program the generate_and_pad function is done quite differently.
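The idea behind Wnext, that computing the next schedule word needs only the 16 most recent ones, can also be sketched in C with a 16-entry ring buffer indexed modulo 16. This is a hypothetical illustration, not OpenSSL's code and not the Coq program; the names w_next and ring_matches_table are invented here, and the sigma constants are those of FIPS 180-4. Slot t mod 16 holds $W_{t-16}$ on entry to w_next and is overwritten with $W_t$.

```c
#include <assert.h>
#include <stdint.h>

uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));   /* requires 0 < n < 32 */
}
uint32_t small_sigma0(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}
uint32_t small_sigma1(uint32_t x) {
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* On entry x[] holds W_{t-16} .. W_{t-1}, each W_s stored at index
   s & 15. On return, slot t & 15 holds W_t, which is also returned. */
uint32_t w_next(uint32_t x[16], int t) {
    assert(t >= 16 && t < 64);
    uint32_t s0 = small_sigma0(x[(t + 1) & 15]);    /* W_{t-15} */
    uint32_t s1 = small_sigma1(x[(t + 14) & 15]);   /* W_{t-2}  */
    x[t & 15] += s0 + s1 + x[(t + 9) & 15];         /* W_{t-16} + ... + W_{t-7} */
    return x[t & 15];
}

/* Returns 1 if the ring buffer reproduces a full 64-entry table. */
int ring_matches_table(const uint32_t m[16]) {
    uint32_t w[64], x[16];
    for (int t = 0; t < 16; t++) { w[t] = m[t]; x[t] = m[t]; }
    for (int t = 16; t < 64; t++) {
        w[t] = small_sigma1(w[t - 2]) + w[t - 7]
             + small_sigma0(w[t - 15]) + w[t - 16];
        if (w_next(x, t) != w[t]) return 0;
    }
    return 1;
}
```

The index arithmetic relies on (t − k) mod 16 being equal to (t + 16 − k) & 15 for nonnegative t, which is why the offsets 1, 14 and 9 appear in place of −15, −2 and −7.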


This equivalence proof took about a day to build; this is much faster than building the proof that the C implementation correctly implements the functional spec. But sometimes we must program in C, to get SHA that runs in microseconds rather than seconds.

Instead of calculating the result inside Coq, one could instead extract the program as an ML program, and compile with the OCaml compiler. This would lead to faster results than Coq, but slower than C. For the purpose of testing the SHA specification, it is unnecessary.

5. INTRODUCTION TO VERIFIABLE C

The Verifiable C language and program logic is a subset of CompCert’s C light language. Every Verifiable C program is a legal C program, and every C program can be expressed in Verifiable C with only local program transformations, such as pulling side effects out of expressions: a=(b+=2)+3; becomes b+=2; a=b+3; (sometimes an extra local variable is required to hold the intermediate result). The CompCert compiler accepts (essentially) the full C language; we use a subset not because of any inadequacy of CompCert but to accommodate reasoning about the program. It is easier to reason about one assignment at a time.

Separation logic. We use a variant of Hoare logic known as separation logic, which is more expressive regarding anti-aliasing of pointers and separation of data structures. We write {P} c {Q} to mean (more or less), if P holds before c executes, then Q will hold after. In our separation logic, the assertion P has a spatial part dealing with the contents of a particular footprint of memory, and a local part dealing with local program variables, and a propositional part dealing with mathematical variables.

An example of a spatial assertion is,

$$\mathsf{array}^{f}_{[0,n)}(p) \qquad (n : \mathbb{Z},\ f : \mathbb{Z} \to V,\ p : V)$$

which represents an array of n elements starting at address p, whose i-th element is f(i). Here f is a total function from (mathematical) integers to values; we ignore f’s domain outside [0, n). Values V may be 32-bit integers (Vint i), 32-bit representations of mathematical integers (Vint (Int.repr z)), floating point (Vfloat f), pointers (Vptr b i) with base b and in-the-block offset i, or undefined/uninitialized values (Vundef).

Verifiable C’s array constructor actually takes two more arguments: a permission share π indicating read-only, read-write, etc.; and the C-language type of the elements, such as the type of unsigned characters,

    Definition tuchar := Tint I8 Unsigned noattr.

Suppose we have two different arrays p, q and we execute the assignment p[i]=q[j]; one possible specification is this:

$$\{0 \le i < j < n \wedge (\mathsf{array}^{f}_{[0,n)}(p) * \mathsf{array}^{g}_{[0,n)}(q))\}$$
$$\texttt{t = q[j]; p[i] = t;}$$
$$\{0 \le i < j < n \wedge (\mathsf{array}^{f[i:=g(j)]}_{[0,n)}(p) * \mathsf{array}^{g}_{[0,n)}(q))\}$$

Separation logic’s inference rules prefer reasoning about one load or store at a time, so I have made a local program transformation. Here I assume there are two disjoint arrays p and q whose contents are f and g respectively. You can tell they are disjoint because the ∗ operator enforces this. Because they are disjoint, we know (in the postcondition) that q is unchanged, i.e., its contents are still g. (If the programmer had intended p and q to possibly overlap, one would write a different specification.)
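The specification of p[i]=q[j] can be exercised on concrete arrays with a small C sketch. It is only a runtime test on particular inputs, not a proof, and the names copy_elem and check_frame are hypothetical. The restrict qualifiers are an addition made here to state in C that the two arrays do not overlap, which is the role the ∗ operator plays in the assertion. The helper check_frame builds arrays with contents f and g, runs the two-step assignment, and compares the result against f[i := g(j)] for p and against g for q.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The assignment p[i] = q[j], split into one load and one store. */
void copy_elem(unsigned char *restrict p, const unsigned char *restrict q,
               size_t i, size_t j) {
    unsigned char t = q[j];   /* load from q */
    p[i] = t;                 /* store into p */
}

/* Runs copy_elem on two disjoint arrays of length n and returns 1
   if p ends up equal to f[i := g(j)] and q still equals g.
   Precondition, as in the triple: 0 <= i < j < n (here also n <= 64). */
int check_frame(size_t n, size_t i, size_t j) {
    unsigned char p[64], q[64], f[64], g[64];
    assert(i < j && j < n && n <= 64);
    for (size_t k = 0; k < n; k++) {
        f[k] = p[k] = (unsigned char)k;          /* initial contents of p */
        g[k] = q[k] = (unsigned char)(200 - k);  /* initial contents of q */
    }
    copy_elem(p, q, i, j);
    for (size_t k = 0; k < n; k++) {
        unsigned char expected = (k == i) ? g[j] : f[k];
        if (p[k] != expected) return 0;          /* p changed in one spot only */
    }
    return memcmp(q, g, n) == 0;                 /* q is unchanged */
}
```

If p and q were allowed to alias, the check on q could fail when the store lands inside q, which is the situation a different specification would have to describe.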


Program variables, symbolic values. This is a bit of a simplification: i, j, p, q are program variables, not logical variables. Verifiable C distinguishes these; one might write the precondition “for real” as,

    PROP (0 ≤ i < j < n; writable_share π1)
    LOCAL (`(eq i) (eval_id _i); `(eq j) (eval_id _j);
           `(eq p) (eval_id _p); `(eq q) (eval_id _q))
    SEP (`(array_at tuchar π1 f 0 n p); `(array_at tuchar π2 g 0 n q))

where the PROP part has pure logical propositions (that do not refer to program state); LOCAL gives assertions about local variables of the program state (but not memory); and SEP is the separating conjunction of spatial assertions, i.e., about various disjoint parts of memory.

The notation `(eq i) (eval_id _i) means, “C program variable _i contains the symbolic value i.” This is effectively a statement about the current program state’s local-variable environment ρ. The notation `f lifts f over local-variable environments [Appel et al. 2014, Chapter 21], that is,

    `(eq i) (eval_id _i) = (fun ρ ⇒ (eq i) (eval_id _i ρ))
                         = (fun ρ ⇒ (fun x ⇒ i = x) (eval_id _i ρ))
                         = (fun ρ ⇒ i = eval_id _i ρ)

or in other words, looking up _i in ρ yields i.

Permissions. The p array needs to be writable, while the q array needs to be at least read-only. This is expressed with permission-shares: p’s permission-share π1 needs to satisfy the writable_share predicate. We don’t need to say readable_share π2 because that is implied by the array_at predicate.

We call these permission shares rather than just permissions because in the shared-memory concurrent setting, a proof could split π1 = π1a + π1b into smaller shares that are given to concurrent threads. These shares π1a and π1b would not be strong enough for write permission, but they could be both strong enough for read permission. That permits exclusive-write-concurrent-read protocols.

Now, suppose SHA-256 were called in one thread of a concurrent program. Its parameter (the string to be hashed) could be a read-only shared array, but its result (the array to hold the message digest) must be writable. All this is concisely expressed in the permission-share annotation of my SHA-256 specification.

Control flow. The C language has control flow: a command c might fall-through normally, might continue a loop, or break a loop, or return from a function. Thus the postcondition Q must have up to four different assertions for these cases. For the case where all but fall-through are prohibited (i.e., three of these four postcondition assertions are False), use the construction normal_ret_assert.

    normal_ret_assert (
      PROP ()
      LOCAL (`(eq i) (eval_id _i); `(eq j) (eval_id _j);
             `(eq p) (eval_id _p); `(eq q) (eval_id _q))
      SEP (`(array_at tuchar π1 (upd f i (g j)) 0 n p); `(array_at tuchar π2 g 0 n q)))

This postcondition shows that the p array has changed in one spot and q has not changed. We can omit (0 ≤ i < j < n) from the postcondition, since it’s a logical fact independent of state, and (if true in the precondition) is eternally true.

Higher-order reasoning. Ordinary separation logic is inexpressive regarding function-pointers, data abstraction, and concurrency; so Verifiable C is a higher-order impredicative concurrent separation logic. Higher-order means that one can quantify over predicates. This is useful for specifying abstract data types. It is also useful for function pointers: if function f takes a parameter p that’s a function-pointer, then the precondition of f will characterize the specification of p, i.e., p’s precondition and postcondition.

When function-pointer specifications are used to describe object-oriented programs, then impredicative quantification over these specifications is necessary. One might think that C is not an object-oriented language, but in fact C programmers often use design patterns that they express with void *. C’s type system is too weak to “prove” that all these void * casts turn out all right, but we can specify and prove this with the program logic.

The SHA verification does not use these higher-order features, though one could use data abstraction for the context structure, SHA256state_st. However, OpenSSL uses an object-oriented “engine” construction to compose HMAC with SHA.

Further reading. Appel et al. [2014] give a full explanation of the program logic.

6. THE C PROGRAM

The OpenSSL implementation of SHA-256 is clever in several ways (many of which were intentional in the SHA-256 design):

(1) It works in one pass, waiting until the end before adding the padding and length.
(2) It allows incremental hashing. Suppose the message to be hashed is available, sequentially, in segments s1, s2, ..., sj. One calls SHA256_Init to initialize a context, SHA256_Update with each si in turn, then SHA256_Final to add the padding and length and hash the last block. If the si are not block-aligned, then SHA256_Update remembers partial blocks in a buffer. However, a block internal to one of the si is not cycled through the buffer; the sha256_block_data_order function operates on it directly from the memory where it was passed to SHA256_Update.
(3) Within the 64-round computation, it stores only the most recent 16 elements of Wt, in a buffer accessed modulo 16 using bitwise-and in the array subscript.
(4) In adding the length of si to the accumulated 64-bit count of bits, there is an overflow test: is the result of (a + b) mod 2^32 < a? If so, add a carry to the high-order word. Such tests are easy to get wrong [Wang et al. 2013]; here it works because a, b are declared unsigned, but still a proof is worthwhile.
(5) In the SHA256_Final function, there is one last block containing the 1-bit, padding, and length. But there could be two “last” blocks, if the message body ends within 8 bytes of the end of a block (so there’s no room for the 1-bit plus 64-bit length).
(6) The accumulated state between calls to SHA256_Update is kept in a record “owned” by the caller and initialized by SHA256_Init. But the W vector is purely local to the “round” function (sha256_block_data_order), so is kept as a local-variable (stack-allocated) array. Although that’s not particularly clever, it’s too clever for some C-language verification systems, which (unlike Verifiable C) cannot handle addressable local variables [Greenaway et al. 2012; Carbonneaux et al. 2014].
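Points (4) and (5) of this list lend themselves to a short C sketch. The code below is a hypothetical illustration in the spirit of the description, not a copy of OpenSSL's source; add_bit_length, carry_agrees and final_block_count are invented names. The carry test works because unsigned arithmetic in C wraps modulo 2^32, so the wrapped sum is smaller than the original low word exactly when a carry occurred. The helper carry_agrees compares the two-word result against plain 64-bit arithmetic, and final_block_count assumes the terminator byte and the 8-byte length must both fit after the bytes still buffered.

```c
#include <assert.h>
#include <stdint.h>

/* Add len bytes, counted in bits, to a 64-bit total kept as two
   32-bit halves: *nl is the low word and *nh the high word. */
void add_bit_length(uint32_t *nl, uint32_t *nh, uint32_t len) {
    uint32_t lo = *nl + (len << 3);   /* wraps modulo 2^32 */
    if (lo < *nl)                     /* overflow test: (a + b) mod 2^32 < a */
        (*nh)++;                      /* carry into the high-order word */
    *nh += len >> 29;                 /* top 3 bits of len * 8 */
    *nl = lo;
}

/* Returns 1 if add_bit_length matches 64-bit arithmetic when the
   running total is bits and len more bytes arrive. */
int carry_agrees(uint64_t bits, uint32_t len) {
    uint32_t nl = (uint32_t)bits, nh = (uint32_t)(bits >> 32);
    uint64_t expect = bits + ((uint64_t)len << 3);
    add_bit_length(&nl, &nh, len);
    return nl == (uint32_t)expect && nh == (uint32_t)(expect >> 32);
}

/* Number of "last" blocks, given how many message bytes (0..63)
   are still buffered when the final step begins. */
int final_block_count(unsigned buffered) {
    assert(buffered < 64);
    return (buffered + 1 + 8 <= 64) ? 1 : 2;
}
```

With 55 buffered bytes the terminator and length just fit in one block, and with 56 a second block is needed, matching the padding arithmetic of the functional specification.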
The client of SHA-256 calls upon it as follows:

    typedef struct SHA256state_st {
      unsigned int h[8];       // The H vector
      unsigned int Nl,Nh;      // Length, a 64-bit number in two parts
      unsigned char data[64];  // Partial block not yet hashed
      unsigned int num;        // Length of the message fragment
    } SHA256_CTX;

    SHA256_CTX c;
    char digest[32];
    char *m1, *m2, ..., *mk;
    unsigned int n1, n2, ..., nk;

ACM Transactions on Programming Languages and Systems, to appear 2015


    // How the caller hashes a message:
    SHA256_Init(&c);
    SHA256_Update(&c, m1, n1);
    SHA256_Update(&c, m2, n2);
    ...
    SHA256_Update(&c, mk, nk);
    SHA256_Final(digest, &c);

The strings $m_i$ of lengths $n_i$ respectively make up the message. The idea is that Init sets up the context c with the initial register state c.h[ ], and then each Update hashes some more blocks into that register state. If $m_i$ is not a full block, or rather if $\sum_{j=1}^{i} n_j$ is not a multiple of the block size, then a partial block is saved in (copied into) the context c. Then the (i+1)-th call to Update will use that fragment as the beginning of the next full block. The $m_i$ need not be disjoint; the caller can build the successive parts of the message in the same m buffer.

After the i-th call, the registers c.h[ ] contain the hash of all the full blocks seen so far, and the length Nl,Nh contains the length (in bits) of all the message fragments, i.e., $8 \cdot \sum_{j=1}^{i} n_j$. At the end, the Final call adds the padding and length, and hashes the last block(s). The final c.h[ ] values are then returned as a byte-string, the message digest.

Most of these “clever” implementation choices are not directly visible in the functional specification, and are not representable in a domain-specific language such as Cryptol. They are just general-purpose C programming, and our specification language must be able to reason about them.

7. SPECIFYING THE C PROGRAM

The Separation Logic specification of a C program relates the program (and its in-memory data structures) to functional or relational correctness properties. Appendix B gives the full separation-logic specification of the OpenSSL SHA-256 program; here I present a part of it.

The SHA256_CTX data structure has a concrete meaning and an abstract meaning. The concrete meaning is given by this 5-tuple of values, corresponding to the 5 fields of the struct:

    Definition s256state := (list val * (val * (val * (list val * val)))).
    (* comment:              h          Nl     Nh     data       num *)

There’s a specific reason for using a tuple here, instead of a Coq record: this tuple type is calculated automatically from the C-language struct definition, inside Coq’s calculational logic.

The abstract meaning is that all the full blocks of $m_1 + m_2 + \ldots + m_i$ have been parsed [footnote 8] into a sequence of 32-bit words that we call hashed; and the remaining less-than-a-block fragment is a sequence of bytes that we call data.

[Footnote 8] SHS uses the word “parsed” to indicate: grouping bytes/bits into big-endian 32-bit words, and grouping 32-bit words into 16-word blocks.
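To make the notion of “parsed” concrete, here is a small C sketch under stated assumptions. The names `be32` and `parse_full_blocks` are hypothetical and do not appear in OpenSSL or in the Coq development; the sketch only mirrors the informal description: the full 64-byte blocks of the bytes seen so far become big-endian 32-bit words (the hashed part), and the leftover bytes form the fragment (the data part).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { WORD_BYTES = 4, BLOCK_BYTES = 64 };

/* Group four bytes into one big-endian 32-bit word. */
static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Split the n bytes at msg into the words of its full blocks and a
   trailing fragment. The caller must supply room for 16*(n/64) words
   in hashed. Returns the number of words written; *frag and *frag_len
   describe the leftover bytes. */
static size_t parse_full_blocks(const unsigned char *msg, size_t n,
                                uint32_t *hashed,
                                const unsigned char **frag,
                                size_t *frag_len) {
    size_t full = (n / BLOCK_BYTES) * BLOCK_BYTES;  /* bytes in full blocks */
    size_t nwords = full / WORD_BYTES;
    for (size_t i = 0; i < nwords; i++)
        hashed[i] = be32(msg + WORD_BYTES * i);
    *frag = msg + full;
    *frag_len = n - full;
    /* The two size invariants of the abstract state. */
    assert(nwords % 16 == 0 && *frag_len < BLOCK_BYTES);
    return nwords;
}
```

The closing assertion states the two size facts that the abstract state relies on: the word count is a multiple of 16 and the fragment is shorter than one block.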


    Inductive s256abs := (* SHA-256 abstract state *)
      S256abs: ∀ (hashed: list int)  (* words hashed, so far *)
                 (data: list Z),     (* bytes in partial block *)
               s256abs.

This fancy notation is really just a 2-tuple (hashed, data); I define it this way to influence the names Coq chooses for introduced variables.

The abstract state is an abstraction of the concrete state. I make this relation formal in Coq as follows. First, we calculate what the H vector would be at the end of hashed:

    Definition s256a_regs (a: s256abs) : list int :=
      match a with S256abs hashed data ⇒
        hash_blocks init_registers hashed
      end.

Notice that this calls upon hash_blocks from the functional spec described in Section 3. Next, we can calculate the bit-length of the hashed words plus the data bytes:

    Definition s256a_len (a: s256abs) : Z :=
      match a with S256abs hashed data ⇒
        (Zlength hashed * 4 + Zlength data) * 8
      end.

We can define the 64-bit concatenation of two 32-bit numbers, and what it means for a (mathematical) integer to be representable in an unsigned char:

    Definition hilo (hi: int) (lo: int) : Z :=
      (Int.unsigned hi * Int.modulus + Int.unsigned lo).
    Definition isbyteZ (i: Z) := (0 ≤ i < 256).

Finally, here is the abstraction relation:

    Definition s256_relate (a: s256abs) (r: s256state) : Prop :=
      match a with S256abs hashed data ⇒
        s256_h r = map Vint (hash_blocks init_registers hashed)
        ∧ (∃ hi, ∃ lo, s256_Nh r = Vint hi ∧ s256_Nl r = Vint lo
             ∧ (Zlength hashed * 4 + Zlength data) * 8 = hilo hi lo)
        ∧ s256_data r = map Vint (map Int.repr data)
        ∧ (length data < 64 ∧ Forall isbyteZ data)
        ∧ (16 | Zlength hashed)
        ∧ s256_num r = Vint (Int.repr (Zlength data))
      end.
That is, a concrete state $(r_h, r_{Nl}, r_{Nh}, r_{data}, r_{num})$ represents an abstract state a = (hashed, data) whenever:
- $r_h$ is the result of hashing all of hashed;
- the bit-length of (hashed, data) equals $r_{Nh} \cdot 2^{32} + r_{Nl}$;
- the sequence of char values $r_{data}$ corresponds exactly to the sequence of (mathematical) integers data;
- the length of data is less than the block size, and every element d of data satisfies 0 ≤ d < 256;
- the length of hashed is a multiple of 16 words;
- the length of data is $r_{num}$ bytes.

Verifiable C’s logic has an operator (data_at π τ r p) saying that memory-address p, interpreted according to the C-language type τ, contains (struct/array/integer) data value r with access permission π. For example: τ = t_struct_SHA256state_st, p is a pointer to struct SHA256state_st, r is a concrete-state value $(r_h, r_{Nl}, r_{Nh}, r_{data}, r_{num})$, and π is the full-access permission Tsh.

To relate the in-memory SHA256_CTX to an abstract state, we simply compose the relations data_at and s256_relate:


    Definition sha256state_ (a: s256abs) (c: val) : mpred :=
      EX r : s256state,
        PROP (s256_relate a r)
        LOCAL ()
        SEP (data_at Tsh t_struct_SHA256state_st r c).

This relates a to c by saying there exists a concrete state r such that abstract-to-concrete composes with concrete-in-memory.

Incremental update. The SHA256_Update function updates a context c with the bytes data_ of length len:

    void SHA256_Update (SHA256_CTX *c, const void *data_, size_t len);

Suppose data_ contains the sequence of integers msg. Appending msg to an abstract state a = (hashed, oldfrag) yields the updated abstract state a′ = (hashed ++ blocks, newfrag) when,

    Inductive update_abs: list Z → s256abs → s256abs → Prop :=
      Update_abs:
        (∀ msg hashed blocks oldfrag newfrag,
           Zlength oldfrag < 64 →
           Zlength newfrag < 64 →
           (16 | Zlength hashed) →
           (16 | Zlength blocks) →
           oldfrag ++ msg = intlist_to_Zlist blocks ++ newfrag →
           update_abs msg (S256abs hashed oldfrag)
                          (S256abs (hashed ++ blocks) newfrag)).

where intlist_to_Zlist unpacks big-endian 32-bit words into a sequence of byte values. With these preliminaries defined, I can now present the separation-logic specification of the Update function.

    Definition SHA256_Update_spec :=
      DECLARE _SHA256_Update
      WITH a: s256abs, data: list Z, c : val, d : val, sh: share, len : nat
      PRE [ _c OF tptr t_struct_SHA256state_st,
            _data_ OF tptr tvoid,
            _len OF tuint ]
        PROP (len

    >3))
    #define sigma1(x) (ROTATE((x),15) ^ ROTATE((x),13) ^ ((x)>>10))
    #define Ch(x,y,z)  (((x) & (y)) ^ ((~(x)) & (z)))
    #define Maj(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

    void sha256_block_data_order (SHA256_CTX *ctx, const void *in) {
      unsigned MD32_REG_T a,b,c,d,e,f,g,h, s0,s1,T1,T2,t;
      SHA_LONG X[16],l,Ki;


      int i;
      const unsigned char *data=in;

      a = ctx->h[0];  b = ctx->h[1];  c = ctx->h[2];  d = ctx->h[3];
      e = ctx->h[4];  f = ctx->h[5];  g = ctx->h[6];  h = ctx->h[7];

      for (i=0;i

      t=ctx->h[0]; ctx->h[0]=t+a;
      t=ctx->h[1]; ctx->h[1]=t+b;
      t=ctx->h[2]; ctx->h[2]=t+c;
      t=ctx->h[3]; ctx->h[3]=t+d;
      t=ctx->h[4]; ctx->h[4]=t+e;
      t=ctx->h[5]; ctx->h[5]=t+f;
      t=ctx->h[6]; ctx->h[6]=t+g;
      t=ctx->h[7]; ctx->h[7]=t+h;
      return;

    }

    void SHA256_Init (SHA256_CTX *c) {
      c->h[0]=0x6a09e667UL; c->h[1]=0xbb67ae85UL;
      c->h[2]=0x3c6ef372UL; c->h[3]=0xa54ff53aUL;
      c->h[4]=0x510e527fUL; c->h[5]=0x9b05688cUL;
      c->h[6]=0x1f83d9abUL; c->h[7]=0x5be0cd19UL;
      c->Nl=0; c->Nh=0;
      c->num=0;
      return;
    }

    void SHA256_addlength(SHA256_CTX *c, size_t len) {
      SHA_LONG l, cNl,cNh;
      cNl=c->Nl; cNh=c->Nh;
      l=(cNl+(((SHA_LONG)len)29);


      c->Nl=l; c->Nh=cNh;
      return;
    }

    void SHA256_Update (SHA256_CTX *c, const void *data_, size_t len) {
      const unsigned char *data=data_;
      unsigned char *p;
      size_t n, fragment;

      SHA256_addlength(c, len);
      n = c->num;
      p=c->data;
      if (n != 0) {
        fragment = SHA_CBLOCK-n;
        if (len >= fragment) {
          memcpy (p+n,data,fragment);
          sha256_block_data_order (c,p);
          data += fragment;
          len -= fragment;
          memset (p,0,SHA_CBLOCK); /* keep it zeroed */
        } else {
          memcpy (p+n,data,len);
          c->num = n+(unsigned int)len;
          return;
        }
      }
      while (len >= SHA_CBLOCK) {
        sha256_block_data_order (c,data);
        data += SHA_CBLOCK;
        len -= SHA_CBLOCK;
      }
      c->num=len;
      if (len != 0) {
        memcpy (p,data,len);
      }
      return;
    }

    void SHA256_Final (unsigned char *md, SHA256_CTX *c) {
      unsigned char *p = c->data;
      size_t n = c->num;
      SHA_LONG cNl,cNh;

      p[n] = 0x80; /* there is always room for one */
      n++;
      if (n > (SHA_CBLOCK-8)) {
        memset (p+n,0,SHA_CBLOCK-n);
        n=0;
        sha256_block_data_order (c,p);
      }
      memset (p+n,0,SHA_CBLOCK-8-n);
      p += SHA_CBLOCK-8;
      cNh=c->Nh; (void)HOST_l2c(cNh,p);
      cNl=c->Nl; (void)HOST_l2c(cNl,p);
      p -= SHA_CBLOCK;
      sha256_block_data_order (c,p);
      c->num=0;
      memset (p,0,SHA_CBLOCK);
      {unsigned long ll;
       unsigned int xn;
       for (xn=0;xnh[xn]; HOST_l2c(ll,md); }
      }
      return;
    }

    void SHA256(const unsigned char *d, size_t n, unsigned char *md) {
      SHA256_CTX c;
      SHA256_Init(&c);
      SHA256_Update(&c,d,n);
      SHA256_Final(md,&c);
      return;
    }
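The one-or-two final blocks decision made by SHA256_Final can be isolated in a short sketch. The helper `pad_tail` is a hypothetical name, not part of OpenSSL, and it deliberately differs from the listing above: it builds the padded tail in a separate 128-byte buffer instead of padding in place inside the context. It assumes the fragment is shorter than one block and that the caller supplies the total message length in bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 64 };

/* Write the padded tail of a message into out and return how many
   64-byte blocks (1 or 2) it occupies. frag holds the n < 64 bytes
   not yet hashed; bitlen is the length of the whole message in bits. */
static int pad_tail(const unsigned char *frag, size_t n, uint64_t bitlen,
                    unsigned char out[2 * BLOCK]) {
    assert(n < BLOCK);
    /* The 0x80 byte plus the 8-byte length need 9 bytes of room. */
    int nblocks = (n + 1 > (size_t)(BLOCK - 8)) ? 2 : 1;
    size_t total = (size_t)nblocks * BLOCK;
    memset(out, 0, 2 * BLOCK);
    memcpy(out, frag, n);
    out[n] = 0x80;                       /* the single 1-bit, then zeros */
    for (int i = 0; i < 8; i++)          /* big-endian 64-bit bit length */
        out[total - 1 - i] = (unsigned char)(bitlen >> (8 * i));
    return nblocks;
}
```

With a 55-byte fragment everything fits in one block; with 56 bytes the length field no longer fits after the 0x80 byte, so a second block is needed. This is the same boundary as the test `n > (SHA_CBLOCK-8)` applied after `n++` in the listing.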

B. THE SPECIFICATION

    Definition big_endian_integer (contents: Z → int) : int :=
      Int.or (Int.shl (contents 0) (Int.repr 24))
      (Int.or (Int.shl (contents 1) (Int.repr 16))
       (Int.or (Int.shl (contents 2) (Int.repr 8))
          (contents 3))).

    Definition LBLOCKz : Z := 16. (* length of a block, in 32-bit ints *)
    Definition CBLOCKz : Z := 64. (* length of a block, in characters *)

    Definition s256state := (list val * (val * (val * (list val * val))))%type.
    Definition s256_h (s: s256state) := fst s.
    Definition s256_Nl (s: s256state) := fst (snd s).
    Definition s256_Nh (s: s256state) := fst (snd (snd s)).
    Definition s256_data (s: s256state) := fst (snd (snd (snd s))).
    Definition s256_num (s: s256state) := snd (snd (snd (snd s))).

    Inductive s256abs := (* SHA-256 abstract state *)
      S256abs: ∀ (hashed: list int)  (* words hashed, so far *)
                 (data: list Z),     (* bytes in the partial block not yet hashed *)
               s256abs.

    Definition s256a_regs (a: s256abs) : list int :=
      match a with S256abs hashed data ⇒
        hash_blocks init_registers hashed
      end.

    Definition s256a_len (a: s256abs) : Z :=
      match a with S256abs hashed data ⇒
        (Zlength hashed * 4 + Zlength data) * 8
      end%Z.


    Definition hilo hi lo := (Int.unsigned hi * Int.modulus + Int.unsigned lo)%Z.
    Definition isbyteZ (i: Z) := (0