VIEWS: 197 PAGES: 88 CATEGORY: Software POSTED ON: 3/9/2011
Formal Methods and Computer Security
Formal Methods and Computer Security
John Mitchell, Stanford University

Invitation
• I'd like to invite you to speak about the role of formal methods in computer security.
• This audience is … on the systems end …
• If you're interested, let me know and we can work out the details.

Outline
• What's a "formal method"?
• Java bytecode verification
• Protocol analysis
  – model checking
  – protocol logic
• Trust management
  – access control policy language

Big Picture
• Biggest problem in CS: produce good software efficiently
• Best tool: the computer
• Therefore: future improvements in computer science and industry depend on our ability to automate software design, development, and quality control processes

Formal method
• Analyze a system from its description
  – executable code
  – specification (possibly not executable)
• Analysis is based on a correspondence between the system description and the properties of interest
  – semantics of code
  – semantics of specification language

Example: TCAS [Leveson, Dill, …]
• Specification: many pages of logical formulas specifying how TCAS responds to sensor inputs
• Analysis: if the module satisfies the specification, and aircraft proceed as directed, then no collisions will occur
• Method: logical deduction, based on formal rules

Formal methods: good and bad
• Strengths
  – formal rules capture years of experience
  – precise, can be automated
• Weaknesses
  – some subtleties are hard to formalize
  – methods cumbersome, time-consuming

Formal methods sweet spot
(chart: "Users × Importance" against "System complexity × Property complexity"; regions "Not feasible", "Worthwhile", and "Not worth the effort", with multiplier parity and OS verification as boundary examples)

Target areas
• Hardware verification
• Program verification
  – prove properties of programs
  – requirements capture and analysis
  – type checking and "semantic analysis"
• Computer security
  – mobile code security
  – protocol analysis
  – access control policy languages, analysis

Computer Security
• Goal: protect computer systems and digital information
• Security areas: access control, network security, OS security, Web
browser/server, crypto, database/application, …
• Current formal methods use an abstract view of cryptography

Mobile code: Java
(diagram: applet running in a local window)
• Download: seat map, airline data
• Local data: user profile, credit card
• Transmission: select seat, encrypted msg

Java Virtual Machine Architecture
• A.java → Java compiler → A.class
• Class files (A.class, B.class) pass through Loader → Verifier → Linker → Bytecode Interpreter inside the Java Virtual Machine

Java Sandbox
Four complementary mechanisms:
• Class loader
  – separate namespaces for separate class loaders
  – associates protection domain with each class
• Verifier and JVM run-time tests
  – NO unchecked casts or other type errors, NO array overflow
  – preserves private, protected visibility levels
• Security Manager
  – called by library functions to decide if a request is allowed
  – uses protection domain associated with code, user policy
  – enforcement uses stack inspection

Verifier
• Bytecode may not come from a standard compiler: an evil hacker may write dangerous bytecode
• Verifier checks correctness of bytecode:
  – every instruction must have a valid operation code
  – every branch instruction must branch to the start of some other instruction, not the middle of an instruction
  – every method must have a structurally correct signature
  – every instruction obeys the Java type discipline
• Last condition is fairly complicated. How do we know the verifier is correct?
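The first two structural checks in the list above (valid opcodes, branches land on instruction starts) can be sketched in a few lines. This is a toy encoding invented for illustration, not the real JVM class-file format:

```python
# Sketch of the verifier's first two structural checks, on a made-up
# bytecode: one-byte opcodes, some with a one-byte operand. The "goto"
# operand is an absolute code offset.

OPCODES = {0x00: ("nop", 0), 0x10: ("push", 1), 0x60: ("add", 0),
           0xA7: ("goto", 1), 0xB1: ("return", 0)}

def structural_check(code):
    """True iff every opcode is defined and every branch lands on the
    start of an instruction (never in the middle of one)."""
    starts, pc = set(), 0
    while pc < len(code):
        op = code[pc]
        if op not in OPCODES:
            return False                    # unknown operation code
        starts.add(pc)                      # this offset starts an instruction
        pc += 1 + OPCODES[op][1]            # skip operand bytes
    pc, ok = 0, True
    while pc < len(code):
        name, nargs = OPCODES[code[pc]]
        if name == "goto" and code[pc + 1] not in starts:
            ok = False                      # branch into mid-instruction
        pc += 1 + nargs
    return ok

good = bytes([0x10, 0x05, 0xA7, 0x00, 0xB1])  # push 5; goto 0; return
bad  = bytes([0xA7, 0x01, 0xB1])              # goto 1 = middle of the goto
assert structural_check(good)
assert not structural_check(bad)
```

The last check in the list, type discipline, is the hard one; it needs the dataflow analysis discussed next.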
Many attacks are based on verifier errors.
Formal studies prove correctness:
• Abadi and Stata
• Freund and Mitchell
• Nipkow and others …

A Type System for Object Initialization in the Java Bytecode Language
Stephen Freund, John Mitchell, Stanford University
(Raymie Stata and Martín Abadi, DEC SRC)

Bytecode/Verifier Specification
• Specifications from Sun/JavaSoft:
  – 30-page text description [Lindholm, Yellin]
  – reference implementation (~3500 lines of C code)
• These are vague and inconsistent
• Difficult to reason about:
  – safety and security properties
  – correctness of implementation
• Type system provides a formal spec

JVM uses stack machine
Java source:
  class A extends Object {
    int i;
    void f(int val) { i = val + 1; }
  }
Bytecode:
  Method void f(int)
    aload 0                     ; object ref this
    iload 1                     ; int val
    iconst 1
    iadd                        ; add val + 1
    putfield #4 <Field int i>
JVM activation record: local variables, operand stack, return address and exception info, return area; references resolve through the constant pool.

Java Object Initialization
  Point p = new Point(3);
  p.print();
  1: new Point
  2: dup
  3: iconst 3
  4: invokespecial <method Point(int)>
  5: invokevirtual <method print()>
• No easy pattern to match
• Multiple refs to same uninitialized object

JVMLi Instructions
Abstract instructions:
• new – allocate memory for object
• init – initialize object
• use – use initialized object
Goal: prove that no object can be used before it has been initialized.

Typing Rules
For program P, compute for each i ∈ Dom(P):
  Fi : Var → Type      type of each variable
  Si : stack of types   type of each stack location
Example: static semantics of inc:

  P[i] = inc    Fi+1 = Fi    Si+1 = Si = INT·α    i+1 ∈ Dom(P)
  ------------------------------------------------------------
                        F, S, i ⊢ P

Each rule constrains the successors of an instruction.
Well-typed = accepted by verifier.

Alias Analysis
Other situations:
  1: new P            1: new P
  2: new P      or
  3: init P           2: init P
• Equivalence classes based on the line where the object was created.

The new Instruction
Uninitialized object type placed on the stack of types: σi is the type of an uninitialized object of type σ allocated on line i.

  P[i] = new σ   Fi+1 = Fi   Si+1 = σi·Si   σi ∉ Si   σi ∉ Range(Fi)   i+1 ∈ Dom(P)
  ----------------------------------------------------------------------------------
                              F, S, i ⊢ P

The init Instruction
Substitution of initialized object type for uninitialized object type:

  P[i] = init σ   Si = σj·α, j ∈ Dom(P)   Si+1 = [σ/σj]α   Fi+1 = [σ/σj]Fi   i+1 ∈ Dom(P)
  ----------------------------------------------------------------------------------------
                              F, S, i ⊢ P

Soundness
Theorem: a well-typed program will not generate a run-time error when executed.
Invariant:
• During program execution, there is never more than one value of type σj present.
• If this were violated, we could initialize one object and mistakenly believe that a different object was also initialized.

Extensions
• Constructors
  – constructor must call superclass constructor
• Primitive types and basic operations
• Subroutines [Stata, Abadi]
  – jsr L: jump to L and push return address on stack
  – ret x: jump to address stored in x
  – polymorphic over untouched variables: Dom(FL) restricted to variables used by the subroutine

Bug in Sun JDK 1.1.4
   1: jsr 10
   2: store 1
   3: jsr 10
   4: store 2
   5: load 2
   6: init P
   7: load 1
   8: use P        ← verifier allows use of uninitialized object
   9: halt
  10: store 0
  11: new P
  12: ret 0
• Variables 1 and 2 contain references to two different objects, both with type P11.

Related Work
• Java type systems
  – Java language [DE 97], [Syme 97], …
  – JVML [SA 98], [Qian 98], [HT 98], …
• Other approaches
  – concurrent constraint programs [Saraswat 97]
  – defensive-JVM [Cohen 97]
  – data flow analysis frameworks [Goldberg 97]
  – experimental tests [SMB 97]
• TIL / TAL [Harper, Morrisett, et al.]
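The core of the object-initialization check above can be sketched as a small abstract interpreter. Instruction names follow the JVMLi abstraction; this sketch handles only straight-line code, so branches, local variables, and the subroutines where the JDK 1.1.4 bug lives are omitted:

```python
# Sketch of the Freund-Mitchell idea: track uninitialized objects by the
# line that allocated them ("uninit" tagged with the allocation line),
# and reject any use before init. Straight-line JVMLi-style code only.

def check(program):
    stack = []                               # abstract stack of types
    for i, instr in enumerate(program):
        op = instr[0]
        if op == "new":
            stack.append(("uninit", i))      # type remembers alloc line
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "init":
            tag = stack.pop()
            assert tag[0] == "uninit", f"line {i}: init of initialized value"
            # initializing one reference initializes all of its aliases,
            # i.e. every stack slot carrying the same allocation-line tag:
            stack = [("obj",) if t == tag else t for t in stack]
        elif op == "use":
            assert stack.pop() == ("obj",), f"line {i}: use before init"

check([("new",), ("dup",), ("init",), ("use",)])   # ok: init reaches the alias
try:
    check([("new",), ("use",)])                     # use before init
    caught = False
except AssertionError:
    caught = True
assert caught
```

The substitution step in the `init` case mirrors the [σ/σj] substitution in the typing rule: every occurrence of the tagged uninitialized type becomes the initialized type at once.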
Protocol Security
• Cryptographic protocol: program distributed over a network; uses cryptography to achieve a goal
• Attacker: can read, intercept, replace messages, and remember their contents
• Correctness: attacker cannot learn a protected secret or cause incorrect protocol completion

Example Protocols
• Authentication protocols
  – Clark-Jacob report: >35 examples (1997)
  – ISO/IEC 9798, Needham-Schroeder, Denning-Sacco, Otway-Rees, Woo-Lam, Kerberos
• Handshake and data transfer: SSL, SSH, SFTP, FTPS, …
• Contract signing, funds transfer, …
• Many others

Characteristics
• Relatively simple distributed programs: 5-7 steps, 3-10 fields per message, …
• Mission critical: security of data, credit card numbers, …
• Subtle: an attack may combine data from many sessions
• Good target for formal methods; however, crypto is hard to model

Run of protocol
(diagram: A initiates, B responds; attacker controls C, D and the network)
Correct if no security violation occurs in any run.

Protocol Analysis Methods
• Non-formal approaches (useful, but no tools…)
  – some crypto-based proofs [Bellare, Rogaway]
  – communicating Turing machines [Canetti]
• BAN and related logics
  – axiomatic semantics of protocol steps
• Methods based on operational semantics
  – intruder model derived from Dolev-Yao
  – protocol gives rise to a set of traces: denotation of protocol = set of runs involving an arbitrary number of principals plus the intruder

Example projects and tools
• Prove protocol correct
  – Paulson's "inductive method", others in HOL, PVS
  – MITRE: strand spaces
  – process calculus approach: Abadi-Gordon spi-calculus
• Search using symbolic representation of states
  – Meadows: NRL Analyzer; Millen: CAPSL
• Exhaustive finite-state analysis
  – FDR, based on CSP [Lowe, Roscoe, Schneider, …]
  – Clarke et al.
    (search with axiomatic intruder model)

Protocol analysis spectrum
(chart: sophistication of attacks, low to high, against protocol complexity, low to high; hand proofs and the poly-time calculus at high sophistication; multiset rewriting with ∃, spi-calculus, Athena, Paulson, NRL, Bolignano, and BAN logic in between; model checking with FDR and Murφ, and protocol logic, at higher protocol complexity)

Important Modeling Decisions
• How powerful is the adversary?
  – simple replay of previous messages
  – block messages; decompose, reassemble, resend
  – statistical analysis, traffic analysis
  – timing attacks
• How much detail in underlying data types?
  – plaintext, ciphertext and keys: atomic data or bit sequences
  – encryption and hash functions: "perfect" cryptography, or algebraic properties, e.g. encr(x*y) = encr(x) * encr(y) for RSA, since encrypt(k, msg) = msg^k mod N

Four efforts (w/ various collaborators)
• Finite-state analysis
  – case studies: find errors, debug specifications
• Logic-based model: multiset rewriting
  – identify basic assumptions
  – study optimizations, prove correctness
  – complexity results
• Framework with probability and complexity
  – more realistic intruder model
  – interaction between protocol and cryptography
  – significant mathematical issues, similar to hybrid systems (Panangaden, Jagadeesan, Alur, Henzinger, de Alfaro, …)
• Protocol logic

Rest of talk
• Model checking: contract signing
• MSR: overview, complexity results
• Prob. poly-time calculus: key definitions, concepts
• Protocol logic: short overview
• Likely to run out of time …

Contract-signing protocols
John Mitchell, Vitaly Shmatikov, Stanford University
Subsequent work by Chadha, Kanovich, Scedrov; other analysis by Kremer, Raskin.

Example
• Immunity deal: both parties want to sign the contract; neither wants to commit first.

General protocol outline
  A: "I am going to sign the contract"
  B: "I am going to sign the contract"
  A: "Here is my signature"
  B: "Here is my signature"
• Trusted third party can force the contract: the third party can declare the contract binding if presented with the first two messages.
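To show the flavor of the exhaustive finite-state analysis applied below, here is a toy search over a naive signature exchange with no third party. The protocol, message names, and drop-message attacker are simplified assumptions for illustration, not the ASW protocol itself:

```python
# Toy exhaustive search: A sends sig_A first, then B replies sig_B.
# The attacker may drop either message, and B may simply stop without
# replying. Enumerate every run and collect the unfair final states,
# where one side holds the other's signature but not vice versa.

from itertools import product

def run(drop_sigA, drop_sigB, b_replies):
    a_has, b_has = set(), set()
    if not drop_sigA:
        b_has.add("sig_A")                  # A commits first
    if "sig_A" in b_has and b_replies and not drop_sigB:
        a_has.add("sig_B")
    return a_has, b_has

unfair = [
    (a, b)
    for dA, dB, reply in product([False, True], repeat=3)
    for a, b in [run(dA, dB, reply)]
    if ("sig_B" in a) != ("sig_A" in b)     # one-sided outcome
]
assert unfair                                # the search finds unfair runs
```

Even this tiny state space shows why committing first is dangerous: B can receive sig_A and stop. The real analyses below explore the same kind of space for the full multi-subprotocol exchanges, with a third party and a far richer attacker.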
Assumptions
• Cannot trust the communication channel
  – messages may be lost
  – attacker may insert additional messages
• Cannot trust the other party in the protocol
• Third party is generally reliable
  – use only if something goes wrong
  – want TTP accountability

Desirable properties
• Fair: if one party can get the contract, so can the other
• Accountability: if someone cheats, the message trace shows who cheated
• Abuse free: no party can show that they can determine the outcome of the protocol

Asokan-Shoup-Waidner protocol
(diagrams: three subprotocols over a network)
• Agree: m1 = sign(A, c, hash(r_A)); B replies sign(B, m1, hash(r_B)); then A reveals r_A and B reveals r_B
• Abort: A sends abort token a1 to T; if not already resolved, T issues sigT(a1, abort)
• Resolve: present m1, m2 to T; T issues sigT(m1, m2). Attack?

Results
• Exhaustive finite-state analysis: two signing parties plus third party; attacker tries to subvert the protocol
• Two attacks
  – replay attack: restart A's conversation to fool B
  – inconsistent signatures: both get contracts, but with different IDs
• Repair: add data to m3, m4; prevents both attacks

Related protocol [Garay, Jakobsson, MacKenzie]
• Designed to be "abuse free"
  – B cannot take a msg from A and show it to C
  – uses a special cryptographic primitive
  – T converts signatures, does not use its own
• Finite-state analysis
  – attack gives A both contract and abort
  – T colludes weakly, not shown accountable
  – simple repair using the same crypto primitive

Garay, Jakobsson, MacKenzie
(diagrams)
• Agree: m1 = PCSA(text, B, T); B replies PCSB(text, A, T); then sigA(text), sigB(text)
• Abort: T issues sigT(abort)
• Resolve: present PCSA(text, B, T), PCSB(text, A, T) to T
• Attack: abort token leaked by T; one party ends up with both abort AND sigB(text)

Modeling Abuse-Freeness
• Abuse = ability to determine the outcome + ability to prove it
• Not a trace property! Depends on the set of traces through a state
• Approximation for finite-state analysis
  – nondeterministically challenge A to resolve or abort
  – if there exists a trace s.t.
the outcome differs from the challenge, then A cannot determine the outcome

Conclusions
• Online contract signing is subtle
  – fairness
  – abuse-freeness
  – accountability
• Several interdependent subprotocols: many cases and interleavings
• Finite-state tool great for case analysis! Finds bugs in protocols proved correct

Multiset Rewriting and Security Protocol Analysis
John Mitchell, Stanford University
I. Cervesato, N. Durgin, P. Lincoln, A. Scedrov

A notation for infinite-state systems
(diagram: linear logic, restricted via proof search, yields (Horn-clause) multiset rewriting; process calculus and finite automata relate to the same model)
• Many previous models are buried in tools
• Define a common model in a tool-independent formalism

Modeling Requirements
• Express properties of protocols
  – initialization: principals and their private/shared data
  – nonces: generate fresh random data
• Model attacker
  – characterize possible messages by attacker
  – cryptography
• Set of runs of the protocol under attack

Notation commonly found in literature
  A → B : { A, Noncea } Kb
  B → A : { Noncea, Nonceb } Ka
  A → B : { Nonceb } Kb
• The notation describes protocol traces
• It does not
  – specify initial conditions
  – define response to arbitrary messages
  – characterize possible behaviors of attacker

Rewriting Notation
Non-deterministic infinite-state systems.
• Facts: multi-sorted first-order atomic formulas
    F ::= P(t1, …, tn)
    t ::= x | c | f(t1, …, tn)
• States: { F1, ..., Fn }
  – multiset of facts
  – includes network messages, private state
  – intruder will see messages, not private state
• Transition rules:
    F1, …, Fk  →  ∃ x1 … xm.
    G1, … , Gn

What this means
• If F1, …, Fk are in state σ, then a next state σ′ has
  – facts F1, …, Fk removed
  – G1, …, Gn added, with x1 … xm replaced by new symbols
  – other facts in state σ carried over to σ′
• Free variables in a rule are universally quantified
Note
• Pattern matching in F1, …, Fk can invert functions
• Linear logic: F1 ⊗ … ⊗ Fk ⊸ ∃x1 … xm (G1 ⊗ … ⊗ Gn)

Finite-State Example
(diagram: automaton with states q0, q1, q2, q3 and input symbols a, b)
• Predicates: State, Input
• Function: · (sequencing)
• Constants: q0, q1, q2, q3, a, b, nil
• Transitions:
    State(q0), Input(a·x) → State(q1), Input(x)
    State(q0), Input(b·x) → State(q2), Input(x)
    ...
Set of rewrite transition sequences = set of runs of the automaton.

Simplified Needham-Schroeder
  A → B: {na, A}Kb
  B → A: {na, nb}Ka
  A → B: {nb}Kb
Predicates: Ai, Bi, Ni – Alice, Bob, Network in state i
Transitions:
  ∃x. A1(x)
  A1(x) → N1(x), A2(x)
  N1(x) → ∃y. B1(x,y)
  B1(x,y) → N2(x,y), B2(x,y)
  A2(x), N2(x,y) → A3(x,y)
  A3(x,y) → N3(y), A4(x,y)
  B2(x,y), N3(y) → B3(x,y)
Authentication: A4(x,y) ∧ B3(x,y′) ⟹ y = y′
(picture next slide)

Sample Trace
  ∃x. A1(x)                        A1(na)
  A1(x) → A2(x), N1(x)             A2(na), N1(na)
  N1(x) → ∃y. B1(x,y)              A2(na), B1(na, nb)
  B1(x,y) → N2(x,y), B2(x,y)       A2(na), N2(na, nb), B2(na, nb)
  A2(x), N2(x,y) → A3(x,y)         A3(na, nb), B2(na, nb)
  A3(x,y) → N3(y), A4(x,y)         N3(nb), A4(na, nb), B2(na, nb)
  B2(x,y), N3(y) → B3(x,y)         A4(na, nb), B3(na, nb)

Common Intruder Model
Derived from the Dolev-Yao model:
• Adversary is a nondeterministic process
• Adversary can
  – block network traffic
  – read any message, decompose it into parts
  – decrypt if the key is known to the adversary
  – insert new messages from data it has observed
• Adversary cannot
  – gain partial knowledge
  – guess part of a key
  – perform statistical tests, …

Formalize Intruder Model
• Intercept, decompose and remember messages:
    N1(x) → M(x)
    N2(x,y) → M(x), M(y)
    N3(x) → M(x)
• Decrypt if the key is known:
    M(enc(k,x)), M(k) → M(x)
• Compose and send messages from "known" data:
    M(x) → N1(x), M(x)
    M(x), M(y) → N2(x,y), M(x), M(y)
    M(x) → N3(x), M(x)
• Generate new data as needed:
    → ∃x.
      M(x)
Highly nondeterministic, and the same for any protocol.

Attack on Simplified Protocol
  ∃x. A1(x)                        A1(na)
  A1(x) → A2(x), N1(x)             A2(na), N1(na)
  N1(x) → M(x)                     A2(na), M(na)
  ∃x. M(x)                         A2(na), M(na), M(na′)
  M(x) → N1(x), M(x)               A2(na), N1(na′), M(na), M(na′)
  N1(x) → ∃y. B1(x,y)              A2(na), M(na), M(na′), B1(na′, nb)
Continue the "man-in-the-middle" to violate the specification.

Protocols vs Rewrite Rules
• Can axiomatize any computational system
• But protocols are not arbitrary programs
  – choose principals
  – select roles (client, client, TGS, server)

Thesis: the MSR model is accurate
• Captures the "Dolev-Yao-Needham-Millen-Meadows-…" model
  – MSR defines the set of traces of protocol and attacker
  – connections with approaches in other formalisms
• Useful for protocol analysis
  – errors shown by the model are errors in the protocol
  – if no error appears, then no attack can be carried out using only the actions allowed by the model

Complexity results using MSR [Durgin, Lincoln, Mitchell, Scedrov]
(All cases: finite number of different roles, each role of finite length, bounded message size.)

                      Bounded # of roles | Bounded use of ∃ | Unbounded use of ∃
  Intruder with ∃           ??           |       ??         |    Undecidable
  Intruder w/o ∃        NP-complete      |    DExp-time     |    Undecidable

Key insight: existential quantification (∃) captures cryptographic nonces, the main source of complexity.

Additional decidable cases
• Bounded role instances, unbounded msg size
  – Huima 99: decidable
  – Amadio, Lugiez: NP w/ atomic keys
  – Rusinowitch, Turuani: NP-complete, composite keys
  – other studies, e.g., Küsters: unbounded # of data fields
• Constraint systems
  – Cortier, Comon: limited equality test
  – Millen, Shmatikov: finite-length runs
• All: bound the number of role instances

Probabilistic Polynomial-Time Process Calculus for Security Protocol Analysis
J. Mitchell, A. Ramanathan, A. Scedrov, V. Teague, P. Lincoln, M.
Mitchell

Limitations of Standard Model
• Can find some attacks
  – successful analysis of industrial protocols
• Other attacks are outside the model
  – interaction between protocol and encryption
• Some protocols cannot be modeled
  – probabilistic protocols
  – steps that require a specific property of encryption
• Possible to "OK" an erroneous protocol

Non-formal state of the art
• Turing-machine-based analysis
  – Canetti
  – Bellare, Rogaway
  – Bellare, Canetti, Krawczyk
  – others …
• Prove correctness of protocol transformations
  – example: secure channel → insecure channel

Language Approach [Abadi, Gordon]
• Write the protocol in a process calculus
• Express security using observational equivalence
  – standard relation from programming language theory: P ≅ Q iff for all contexts C[ ], the same observations hold of C[P] and C[Q]
  – the context (environment) represents the adversary
• Use proof rules for ≅ to prove security
  – a protocol is secure if no adversary can distinguish it from some idealized version of the protocol

Probabilistic Poly-time Analysis [Lincoln, Mitchell, Mitchell, Scedrov]
• Adopt the spi-calculus approach, add probability
• Probabilistic polynomial-time process calculus
  – protocols use probabilistic primitives: key generation, nonces, probabilistic encryption, ...
  – adversary may be probabilistic
  – modal type system guarantees complexity bounds
• Express protocol and specification in the calculus
• Study security using observational equivalence
  – use a probabilistic form of process equivalence

Needham-Schroeder Private Key
Analyze part of the protocol P:
  A → B : { i } K
  B → A : { f(i) } K
"Obviously" secret protocol Q (zero knowledge):
  A → B : { random_number } K
  B → A : { random_number } K
Analysis: P ≅ Q reduces to a crypto condition related to non-malleability [Dolev, Dwork, Naor]
  – fails for RSA encryption, f(i) = 2i

Technical Challenges
• Language for prob. poly-time functions
  – extend Hofmann's language with rand
• Replace nondeterminism with probability
  – otherwise the adversary is too strong ...
• Define probabilistic equivalence
  – related to poly-time statistical tests ...
• Develop specification by equivalence
  – several examples carried out
• Proof systems for probabilistic equivalence
  – work in progress

Basic example
• Sequence generated from a random seed:
    Pn: let b = n^k-bit sequence generated from n random bits in PUBLIC b end
• Truly random sequence:
    Qn: let b = sequence of n^k random bits in PUBLIC b end
• If P is a crypto-strong pseudo-random generator, P ≅ Q
• Equivalence is asymptotic in the security parameter n

Compositionality
Property of observational equivalence:
  A ≅ B,  C ≅ D  ⟹  A|C ≅ B|D
and similarly for other process forms.

Current State of Project
• New framework for protocol analysis
  – determine crypto requirements of protocols!
  – precise definition of crypto primitives
• Probabilistic ptime language
• Pi-calculus-like process framework
  – replaced nondeterminism with rand
  – equivalence based on ptime statistical tests
• Proof methods for establishing equivalence
• Future: tool development

Protocol logic
(diagram: protocol, honest principals, attacker, private data)
Alice's information:
• Protocol
• Private data
• Sends and receives

Intuition
• Reason about local information
  – I chose a new number
  – I sent it out encrypted
  – I received it decrypted
  – therefore: someone decrypted it
• Incorporate knowledge about the protocol
  – protocol: server only sends m if it got m′
  – if server is not corrupt and I receive m signed by server, then server received m′

Bidding conventions (motivation)
• Blackwood response to 4NT
  – 5♣: 0 or 4 aces
  – 5♦: 1 ace
  – 5♥: 2 aces
  – 5♠: 3 aces
• Reasoning: if my partner is following Blackwood, then if she bid 5♥, she must have 2 aces

Logical assertions
• Modal operator: [ actions ]P φ – after actions, P can reason that φ
• Predicates in φ:
  – Sent(X,m): principal X sent message m
  – Created(X,m): X assembled m from parts
  – Decrypts(X,m): X has m and the key to decrypt m
  – Knows(X,m): X created m, or received a msg containing m and has keys to extract m from the msg
  – Source(m, X, S): any Y ≠ X can only learn m from the set S
  – Honest(X): X
follows the rules of the protocol

Correctness of NSL
Bob knows he's talking to Alice:
  [ recv encrypt( Key(B), A, m );            (msg1)
    new n;
    send encrypt( Key(A), m, B, n );
    recv encrypt( Key(B), n ) ]B             (msg3)
      Honest(A) ⊃ Csent(A, msg1) ∧ Csent(A, msg3)
where Csent(A, …) ≡ Created(A, …) ∨ Sent(A, …)

Honesty rule (rule scheme)
  ∀ roles R of Q.  ∀ initial segments A ⊆ R.  Q ⊢ [ A ]X φ
  ---------------------------------------------------------
                   Q ⊢ Honest(X) ⊃ φ
• This is a finitary rule:
  – a typical protocol has 2-3 roles
  – a typical role has 1-3 receives
  – only need to consider A waiting to receive

Conclusions
• Security protocols: subtle, mission critical, prone to error
• Analysis methods
  – model checking: practically useful; brute force is a good thing; limitation: finds errors only in small configurations
  – proof methods: time-consuming with general logics; special-purpose logics can be sound and useful
• Room for another 5+ years of work

Access Control / Trust Mgmt
Conference registration example:
• Prices: Regular $1000, Academic $500, Student $100
• Certification chain
  – Root CA signs: Stanford is an accredited university
  – Stanford signs: Mitchell is regular faculty; faculty can identify students
  – Mitchell signs: Chander is my student
• The registration message carries the chain

Formal methods
(chart, repeated from above: "Users × Importance" against "System complexity × Property complexity"; regions "Not feasible", "Worthwhile", "Not worth the effort")

System-Property Tradeoff
(chart, repeated from above: sophistication of attacks against protocol complexity; hand proofs, poly-time calculus, multiset rewriting with ∃, spi-calculus, Athena, Paulson, NRL, Bolignano, BAN logic, model checking with FDR and Murφ, protocol logic)

The Wedge of Formal Verification
(chart: value to design against effort invested; verify, abstract, refute, invisible FM)

Big Picture
• Biggest problem in CS: produce good software efficiently
• Best tool: the computer
• Therefore: future improvements in computer science/industry depend on our ability to automate software design, development, and quality control processes