Pointer Analysis
Lecture 2
G. Ramalingam
Microsoft Research, India

Recap: A basic pointer analysis algorithm
[Figure: control-flow graph with vertices 1-8 and edges labeled x = &a, y = x, p = &x, p = &y, skip, skip, *x = &c, *p = &c, annotated with the abstract states below]
S1 = [x -> {null}, y -> {null}, p -> {null}, …]
S2 = AS[x = &a] S1 = S1[x -> {a}]
S3 = AS[y = x] S2 = S2[y -> S2(x)]
…

Abstract Transformers
• AS[stmt] : AbsDataState -> AbsDataState
• AS[ x = y ] s = s[x -> s(y)]
• AS[ x = null ] s = s[x -> {null}]
• AS[ x = *y ] s = s[x -> s*(s(y))]
  where s*({v1, …, vn}) = s(v1) ∪ … ∪ s(vn)

Abstract Transformers
AS[stmt] : AbsDataState -> AbsDataState
AS[ *x = y ] s =

Andersen’s Analysis
• A flow-insensitive analysis
  – computes a single points-to solution valid at all program points
  – ignores control-flow
  – treats program as a set of statements
  – equivalent to merging all vertices into one (and applying Algorithm A)
  – equivalent to adding an edge between every pair of vertices (and applying Algorithm A)
  – a solution R : Vars -> 2^Vars’ such that R ⊇ IdealMayPT(u) for every vertex u

Example (Flow-Sensitive Analysis)
  x = &a;
  y = x;
  x = &b;
  z = x;
[Figure: control-flow graph with vertices 1-5 and edges labeled x = &a, y = x, x = &b, z = x]

Example: Andersen’s Analysis
  x = &a;
  y = x;
  x = &b;
  z = x;
[Figure: control-flow graph with vertices 1-5 and edges labeled x = &a, y = x, x = &b, z = x]

Andersen’s Analysis
• Strong updates?
• Initial state?

Why Flow-Insensitive Analysis?
• Reduced space requirements
  – a single points-to solution
• Reduced time complexity
  – no copying
    • individual updates more efficient
  – no need for joins
  – number of iterations?
  – a cubic-time algorithm
• Scales to millions of lines of code
  – most popular points-to analysis

Andersen’s Analysis
A Set-Constraints Formulation
• Compute PTx for every variable x

  Statement     Constraint
  x = null
  x = &y
  x = y
  x = *y
  *x = y

Steensgaard’s Analysis
• Unification-based analysis
• Inspired by type inference
  – an assignment “lhs := rhs” is interpreted as a constraint that lhs and rhs have the same type
  – the type of a pointer variable is the set of variables it can point to
• “Assignment-direction-insensitive”
  – treats “lhs := rhs” as if it were both “lhs := rhs” and “rhs := lhs”
• An almost-linear time algorithm
  – single-pass algorithm; no iteration required

Example: Andersen’s Analysis
  x = &a;
  y = x;
  y = &b;
  b = &c;
[Figure: control-flow graph with vertices 1-5 and edges labeled x = &a, y = x, y = &b, b = &c]

Example: Steensgaard’s Analysis
  x = &a;
  y = x;
  y = &b;
  b = &c;
[Figure: control-flow graph with vertices 1-5 and edges labeled x = &a, y = x, y = &b, b = &c]

Steensgaard’s Analysis
• Can be implemented using the Union-Find data structure
• Leads to an almost-linear time algorithm

Exercise
  x = &a;
  y = x;
  y = &b;
  b = &c;
  *x = &d;

May-Point-To Analyses
  Ideal-May-Point-To
    ???
  Algorithm A
    more efficient / less precise
  Andersen’s
    more efficient / less precise
  Steensgaard’s

Ideal Points-To Analysis: Definition Recap
• A sequence of states s_1 s_2 … s_n is said to be an execution (of the program) iff
  – s_1 is the Initial-State
  – s_i ⇝ s_{i+1} for 1 <= i < n
• A state s is said to be a reachable state iff there exists some execution s_1 s_2 … s_n such that s_n = s.
• RS(u) = { s | (u,s) is reachable }
• IdealMayPT(u) = { (p,x) | ∃ s ∈ RS(u). s(p) == x }
• IdealMustPT(u) = { (p,x) | ∀ s ∈ RS(u). s(p) == x }

Does Algorithm A Compute The Most Precise Solution?

Ideal <-> Algorithm A
• Abstract away correlations between variables
  – relational analysis vs.
  – independent attribute
[Figure: sets of concrete states drawn from (x: &y, y: &x), (x: &y, y: &z), (x: &b, y: &x), (x: &b, y: &z), and the independent-attribute abstraction x: {&y, &b}, y: {&x, &z}]

Does Algorithm A Compute The Most Precise Solution?

Is The Precise Solution Computable?
• Claim: The set RS(u) of reachable concrete states (for our language) is computable.
• Note: This is true for any collecting semantics with a finite state space.

Computing RS(u)

Precise Points-To Analysis: Decidability
• Corollary: Precise may-point-to analysis is computable.
• Corollary: Precise (demand) may-alias analysis is computable.
  – Given ptr-exp1, ptr-exp2, and a program point u, identify if there exists some reachable state at u where ptr-exp1 and ptr-exp2 are aliases.
• Ditto for must-point-to and must-alias
• … for our restricted language!

Precise Points-To Analysis: Computational Complexity
• What’s the complexity of the least fixed-point computation using the collecting semantics?
• The worst-case complexity of computing reachable states is exponential in the number of variables.
  – Can we do better?
• Theorem: Computing precise may-point-to is PSPACE-hard even if we have only two-level pointers.

May-Point-To Analyses
  Ideal-May-Point-To
    more efficient / less precise
  Algorithm A
    more efficient / less precise
  Andersen’s
    more efficient / less precise
  Steensgaard’s

Precise Points-To Analysis: Caveats
• Theorem: Precise may-alias analysis is undecidable in the presence of dynamic memory allocation.
  – Add “x = new/malloc ()” to language
  – State-space becomes infinite
• Digression: Integer variables + conditional branching also make any precise analysis undecidable.
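For the restricted language (no allocation, no integers), the claim that RS(u) is computable can be illustrated by exhaustively exploring (vertex, state) pairs. The following is a minimal sketch, not the lecture's own algorithm: the statement tags ("addr", "copy", "load", "store", "null", "skip"), the edge-list encoding of the control-flow graph, and the choice to treat a null dereference as a stuck execution are all assumptions made for illustration. The example program is a hypothetical two-branch program in the spirit of the correlation slide.

```python
from collections import deque

def step(stmt, state):
    """Concrete semantics of one statement (hypothetical encoding).
    Returns the successor state, or None if the execution gets stuck."""
    env = dict(state)
    op = stmt[0]
    if op == "skip":
        pass
    elif op == "null":            # x = null
        env[stmt[1]] = None
    elif op == "addr":            # x = &y
        env[stmt[1]] = stmt[2]
    elif op == "copy":            # x = y
        env[stmt[1]] = env[stmt[2]]
    elif op == "load":            # x = *y
        target = env[stmt[2]]
        if target is None:        # assumption: null dereference halts execution
            return None
        env[stmt[1]] = env[target]
    elif op == "store":           # *x = y
        target = env[stmt[1]]
        if target is None:
            return None
        env[target] = env[stmt[2]]
    else:
        raise ValueError(op)
    return tuple(sorted(env.items()))   # hashable; variable names are unique keys

def reachable_states(variables, edges, entry):
    """Worklist exploration of all reachable (vertex, state) pairs.
    Terminates because the state space is finite."""
    init = tuple(sorted((v, None) for v in variables))  # every variable starts as null
    seen = {(entry, init)}
    work = deque(seen)
    while work:
        u, state = work.popleft()
        for src, stmt, dst in edges:
            if src != u:
                continue
            nxt = step(stmt, state)
            if nxt is not None and (dst, nxt) not in seen:
                seen.add((dst, nxt))
                work.append((dst, nxt))
    rs = {}
    for u, state in seen:
        rs.setdefault(u, set()).add(state)
    return rs

def ideal_may_pt(rs, u):
    """IdealMayPT(u) restricted to non-null targets (a simplification)."""
    return {(p, x) for state in rs.get(u, ()) for p, x in state if x is not None}

# Two branches that join at vertex 4: (x = &y; y = &x) or (x = &b; y = &z).
variables = ["x", "y", "z", "b"]
edges = [
    (1, ("addr", "x", "y"), 2), (2, ("addr", "y", "x"), 4),
    (1, ("addr", "x", "b"), 3), (3, ("addr", "y", "z"), 4),
]
rs = reachable_states(variables, edges, entry=1)
print(len(rs[4]), sorted(ideal_may_pt(rs, 4)))
```

Only two concrete states reach vertex 4, so the combination x: &y, y: &z never occurs, even though an independent-attribute abstraction would admit it. The number of explored states can grow exponentially with the number of variables, which matches the worst-case remark on the complexity slide.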
May-Point-To Analyses
[Figure: precision ordering of the analyses]
  Ideal (with Int, with Malloc)
  Ideal (with Int)
  Ideal (with Malloc)
  Ideal (no Int, no Malloc)
  Algorithm A
  Andersen’s
  Steensgaard’s

Dynamic Memory Allocation
• s: x = new () / malloc ()
• Assume, for now, that allocated object stores one pointer
  – s: x = malloc ( sizeof(void*) )
• Introduce a pseudo-variable Vs to represent objects allocated at statement s, and use previous algorithm
  – treat s as if it were “x = &Vs”
  – also track possible values of Vs
  – allocation-site based approach
• Key aspect: Vs represents a set of objects (locations), not a single object
  – referred to as a summary object (node)

Dynamic Memory Allocation: Example
  x = new;
  y = x;
  *y = &b;
  *y = &a;
[Figure: control-flow graph with vertices 1-5 and edges labeled x = new, y = x, *y = &b, *y = &a]

Dynamic Memory Allocation: Summary Object Update
[Figure: edge from vertex 4 to vertex 5 labeled *y = &a]

Dynamic Memory Allocation: Object Fields
• Field-sensitive analysis
  class Foo {
    A* f;
    B* g;
  }
  s: x = new Foo()
  x->f = &b;
  x->g = &a;

Dynamic Memory Allocation: Object Fields
• Field-insensitive analysis
  class Foo {
    A* f;
    B* g;
  }
  s: x = new Foo()
  x->f = &b;
  x->g = &a;

Interpreting Branch Conditions

Conditional Control-Flow (In The Concrete Semantics)
• Encoding conditional-control-flow
  – using “assume” statements
  if (P) then
    S1;
  else
    S2;
  endif
[Figure: control-flow graph with vertices 1-5 and edges labeled assume P, assume !P, S1, S2]

Conditional Control-Flow (In The Concrete Semantics)
• Semantics of “assume” statements
  – DataState -> {true,false}
  if (P) then
    S1;
  else
    S2;
  endif
[Figure: control-flow graph with vertices 1-5 and edges labeled assume P, assume !P, S1, S2]

Abstracting “assume” statements
  if (x != null) then
    y = x;
  else
    …
  endif
[Figure: control-flow graph with vertices 1-5 and edges labeled assume (x != null), assume (x == null), y = x, S2]

Abstracting “assume” statements
[Figure: edge from vertex 2 to vertex 3 labeled assume x == y]

Other Aspects
• Context-sensitivity
• Indirect (virtual) function calls and call-graph construction
• Pointer arithmetic
• Object-sensitivity

Andersen’s Analysis: Further Optimizations and Extensions
• Fähndrich et al., Partial online cycle elimination in inclusion constraint graphs, PLDI 1998.
• Rountev and Chandra, Offline variable substitution for scaling points-to analysis, 2000.
• Heintze and Tardieu, Ultra-fast aliasing analysis using CLA: a million lines of C code in a second, PLDI 2001.
• M. Hind, Pointer analysis: Haven’t we solved this problem yet?, PASTE 2001.
• Hardekopf and Lin, The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code, PLDI 2007.
• Hardekopf and Lin, Exploiting pointer and location equivalence to optimize pointer analysis, SAS 2007.
• Hardekopf and Lin, Semi-sparse flow-sensitive pointer analysis, POPL 2009.

Andersen’s Analysis: Further Optimizations
• Cycle Elimination
  – Offline
  – Online
• Pointer Variable Equivalence

Context-Sensitivity Etc.
• Liang & Harrold, Efficient computation of parameterized pointer information for interprocedural analyses, SAS 2001.
• Lattner et al., Making context-sensitive points-to analysis with heap cloning practical for the real world, PLDI 2007.
• Zhu & Calman, Symbolic pointer analysis revisited, PLDI 2004.
• Whaley & Lam, Cloning-based context-sensitive pointer alias analysis using BDD, PLDI 2004.
• Rountev et al., Points-to analysis for Java using annotated constraints, OOPSLA 2001.
• Milanova et al., Parameterized object sensitivity for points-to and side-effect analyses for Java, ISSTA 2002.

Applications
• Compiler optimizations
• Verification & Bug Finding
  – use in preliminary phases
  – use in verification itself

Questions?
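Worked companion to the Andersen's and Steensgaard's example slides: a minimal sketch that runs both analyses on the four-statement program x = &a; y = x; y = &b; b = &c. The inclusion constraints in the comments are the commonly used textbook formulation and are an assumption here, since the slide leaves the constraint column open. The statement tags, the class name Steensgaard, and the fresh "$t" placeholder nodes are hypothetical. The Andersen solver is a naive fixpoint iteration, not the optimized cubic-time worklist algorithm, and the unification pass omits union by rank, so it does not by itself achieve the almost-linear bound.

```python
def andersen(stmts):
    """Naive fixpoint over inclusion constraints; returns var -> set of pointees."""
    pt = {}
    def get(v):
        return pt.setdefault(v, set())
    changed = True
    while changed:
        changed = False
        for op, lhs, rhs in stmts:
            if op == "addr":        # x = &y : y in PT(x)
                targets, new = [lhs], {rhs}
            elif op == "copy":      # x = y  : PT(x) includes PT(y)
                targets, new = [lhs], set(get(rhs))
            elif op == "load":      # x = *y : PT(x) includes PT(v) for each v in PT(y)
                targets, new = [lhs], set()
                for v in list(get(rhs)):
                    new |= get(v)
            elif op == "store":     # *x = y : PT(v) includes PT(y) for each v in PT(x)
                targets, new = list(get(lhs)), set(get(rhs))
            else:
                raise ValueError(op)
            for t in targets:
                if not new <= get(t):
                    get(t).update(new)
                    changed = True
    return pt

class Steensgaard:
    """Unification-based sketch: each class of nodes has at most one pointee class."""
    def __init__(self):
        self.parent = {}     # union-find parent links
        self.pointee = {}    # class representative -> node the class points to
        self.fresh = 0

    def find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]   # path halving
            v = self.parent[v]
        return v

    def target(self, v):
        """Node that v's class points to, created on demand."""
        r = self.find(v)
        if r not in self.pointee:
            self.fresh += 1
            self.pointee[r] = f"$t{self.fresh}"
        return self.pointee[r]

    def join(self, a, b):
        """Unify two classes, then recursively unify what they point to."""
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        self.parent[b] = a
        pb = self.pointee.pop(b, None)
        if pb is not None:
            if a in self.pointee:
                self.join(self.pointee[a], pb)
            else:
                self.pointee[a] = pb

    def add(self, op, lhs, rhs):
        if op == "addr":      # x = &y
            self.join(self.target(lhs), rhs)
        elif op == "copy":    # x = y
            self.join(self.target(lhs), self.target(rhs))
        elif op == "load":    # x = *y
            self.join(self.target(lhs), self.target(self.target(rhs)))
        elif op == "store":   # *x = y
            self.join(self.target(self.target(lhs)), self.target(rhs))
        else:
            raise ValueError(op)

    def points_to(self, v, variables):
        r = self.find(v)
        if r not in self.pointee:
            return set()
        cls = self.find(self.pointee[r])
        return {w for w in variables if self.find(w) == cls}

stmts = [("addr", "x", "a"), ("copy", "y", "x"), ("addr", "y", "b"), ("addr", "b", "c")]
variables = ["x", "y", "a", "b", "c"]

pt = andersen(stmts)
st = Steensgaard()
for stmt in stmts:
    st.add(*stmt)            # single pass, no iteration
for v in ["x", "y", "a", "b"]:
    print(v, sorted(pt.get(v, set())), sorted(st.points_to(v, variables)))
```

On this program the inclusion-based result keeps x pointing only to a, while unification merges a and b into one class, so x, y both point to {a, b} and a inherits b's pointee c. That difference is the precision gap the hierarchy slides describe.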