
        Pointer Analysis
           Lecture 2

        G. Ramalingam
  Microsoft Research, India
        Andersen’s Analysis

• A flow-insensitive analysis
  – computes a single points-to solution valid at
    all program points
  – ignores control-flow – treats program as a
    set of statements
  – equivalent to merging all vertices into one
    (and applying algorithm A)
  – equivalent to adding an edge between every
    pair of vertices (and applying algo. A)

  – a solution R : Vars -> 2^Vars' such that
    R ⊇ IdealMayPT(u) for every vertex u
             Example
     (Flow-Sensitive Analysis)

Program (program points 1-5; each statement labels the edge to the next point):
  1   x = &a
  2   y = x
  3   x = &b
  4   z = x
  5
             Example:
        Andersen's Analysis

Program (program points 1-5; each statement labels the edge to the next point):
  1   x = &a
  2   y = x
  3   x = &b
  4   z = x
  5
      Andersen’s Analysis
• Strong updates?

• Initial state?
 Why Flow-Insensitive Analysis?
• Reduced space requirements
  – a single points-to solution
• Reduced time complexity
  – no copying
    • individual updates more efficient
  – no need for joins
  – number of iterations?
  – a cubic-time algorithm
• Scales to millions of lines of code
  – most popular points-to analysis
      Andersen’s Analysis
 A Set-Constraints Formulation
• Compute PT(x), the points-to set, for every variable x

   Statement      Constraint
   x = null       {null} ⊆ PT(x)
   x = &y         {y} ⊆ PT(x)
   x = y          PT(y) ⊆ PT(x)
   x = *y         PT(v) ⊆ PT(x) for every v ∈ PT(y)
   *x = y         PT(y) ⊆ PT(v) for every v ∈ PT(x)
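
These constraints can be solved by a straightforward fixed-point iteration.
Below is a minimal Python sketch (illustrative, not the lecture's algorithm;
the tuple encoding of statements is an assumption) that repeatedly propagates
the inclusions until nothing changes:

  # Minimal sketch of Andersen-style constraint solving (illustrative).
  # Statements are encoded as tuples:
  #   ("addr",  x, y) for x = &y      ("copy",  x, y) for x = y
  #   ("load",  x, y) for x = *y      ("store", x, y) for *x = y
  def andersen(stmts):
      pt = {}                                  # var -> set of vars it may point to
      def s(v):
          return pt.setdefault(v, set())
      changed = True
      while changed:                           # iterate to a fixed point
          changed = False
          for op, a, b in stmts:
              if op == "addr":                 # {b} is included in PT(a)
                  new = {b}
              elif op == "copy":               # PT(b) is included in PT(a)
                  new = set(s(b))
              elif op == "load":               # PT(v) in PT(a) for every v in PT(b)
                  new = set().union(*(s(v) for v in s(b)))
              else:                            # "store": PT(b) in PT(v) for v in PT(a)
                  for v in list(s(a)):
                      if not s(b) <= s(v):
                          s(v).update(s(b))
                          changed = True
                  continue
              if not new <= s(a):
                  s(a).update(new)
                  changed = True
      return pt

  # Example from the earlier slide: x = &a; y = x; x = &b; z = x
  # andersen([("addr","x","a"), ("copy","y","x"), ("addr","x","b"), ("copy","z","x")])
  # yields PT(x) = PT(y) = PT(z) = {a, b}  (flow-insensitive).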
    Steensgaard’s Analysis
• Unification-based analysis
• Inspired by type inference
  – an assignment “lhs := rhs” is interpreted as
    a constraint that lhs and rhs have the same
    type
  – the type of a pointer variable is the set of
    variables it can point to
• “Assignment-direction-insensitive”
  – treats “lhs := rhs” as if it were both “lhs
    := rhs” and “rhs := lhs”
• An almost-linear time algorithm
  – single-pass algorithm; no iteration required
             Example:
        Andersen's Analysis

Program (program points 1-5; each statement labels the edge to the next point):
  1   x = &a
  2   y = x
  3   y = &b
  4   b = &c
  5
             Example:
       Steensgaard's Analysis

Program (program points 1-5; each statement labels the edge to the next point):
  1   x = &a
  2   y = x
  3   y = &b
  4   b = &c
  5
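
One way to read these two example slides (the worked-out results are not in
the extracted text, so this is a reconstruction): for this program, Andersen's
analysis gives PT(x) = {a}, PT(y) = {a, b}, PT(b) = {c}, while Steensgaard's
unification of the pointees of x and y (forced by y = x) puts a and b into one
equivalence class, so it can only conclude PT(x) = PT(y) = {a, b} and
PT(a) = PT(b) = {c}. The result is still sound, just less precise.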
    Steensgaard’s Analysis
• Can be implemented using Union-Find
  data-structure
• Leads to an almost-linear time
  algorithm
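
A compact Python sketch of this unification scheme (illustrative only; the
helper names and statement encoding are my own, not from the lecture): every
variable gets an equivalence class for its pointees, an assignment unifies the
pointee classes of both sides, and taking an address puts the target variable
into the left-hand side's pointee class.

  # Sketch of Steensgaard-style unification on top of union-find (illustrative).
  class UnionFind:
      def __init__(self):
          self.parent = {}
      def find(self, x):
          self.parent.setdefault(x, x)
          while self.parent[x] != x:
              self.parent[x] = self.parent[self.parent[x]]   # path halving
              x = self.parent[x]
          return x
      def union(self, a, b):
          ra, rb = self.find(a), self.find(b)
          if ra != rb:
              self.parent[ra] = rb

  uf = UnionFind()
  pointee = {}                          # class representative -> its pointee class

  def pointee_of(v):                    # the class that v's class points to
      r = uf.find(v)
      if r not in pointee:
          pointee[r] = uf.find(("loc", r))      # fresh pointee class
      return uf.find(pointee[r])

  def join(a, b):                       # unify two classes, then their pointees
      ra, rb = uf.find(a), uf.find(b)
      if ra == rb:
          return
      pa, pb = pointee.get(ra), pointee.get(rb)
      uf.union(ra, rb)
      if pa is not None and pb is not None:
          join(pa, pb)
      elif pa is not None:
          pointee[uf.find(rb)] = pa

  def assign(lhs, rhs):                 # lhs = rhs (treated as bidirectional)
      join(pointee_of(lhs), pointee_of(rhs))

  def address_of(lhs, target):          # lhs = &target
      join(pointee_of(lhs), target)

  # Example from the slide: x = &a; y = x; y = &b; b = &c
  address_of("x", "a"); assign("y", "x"); address_of("y", "b"); address_of("b", "c")
  # Now a and b are in one class: x and y may point to {a, b},
  # and a and b may point to {c}.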
     May-Point-To Analyses

  Ideal-May-Point-To
        |   ???
  Algorithm A
        |   more efficient / less precise
  Andersen's
        |   more efficient / less precise
  Steensgaard's
     Ideal Points-To Analysis:
          Definition Recap
• A sequence of states s1 s2 ... sn is said to be an
  execution (of the program) iff
   – s1 is the Initial-State
   – si → si+1 for 1 <= i < n
• A state s is said to be a reachable state iff there
  exists some execution s1 s2 ... sn such that sn = s
• RS(u) = { s | (u,s) is reachable }
• IdealMayPT(u)  = { (p,x) | ∃ s ∈ RS(u). s(p) == x }
• IdealMustPT(u) = { (p,x) | ∀ s ∈ RS(u). s(p) == x }
    The Collecting Semantics
     & Precise Points-To Analysis
• Claim: The set of reachable concrete
  states (for our language) is
  computable.

• Note: This is true for any collecting
  semantics with a finite state space.
   Precise Points-To Analysis:
          Decidability
• Corollary: Precise may-point-to analysis is
  computable.

• Corollary: Precise (demand) may-alias analysis
  is computable.
  – Given ptr-exp1, ptr-exp2, and a program point u,
    identify if there exists some reachable state at
    u where ptr-exp1 and ptr-exp2 are aliases.

• Ditto for must-point-to and must-alias

• … for our restricted language!
   Precise Points-To Analysis:
    Computational Complexity
• What’s the complexity of the least-fixed
  point computation using the collecting
  semantics?

• The worst-case complexity of computing
  reachable states is exponential in the number
  of variables.
  – Can we do better?

• Theorem: Computing precise may-point-to is
  PSPACE-hard even if we have only two-level
  pointers.
     May-Point-To Analyses

  Ideal-May-Point-To
        |   more efficient / less precise
  Algorithm A
        |   more efficient / less precise
  Andersen's
        |   more efficient / less precise
  Steensgaard's
       Ideal <-> Algorithm A
• Abstract away correlations between variables
  – relational analysis vs.
  – independent-attribute analysis

Example: both the set of states
    { (x: &y, y: &x), (x: &y, y: &z), (x: &b, y: &x), (x: &b, y: &z) }
and the smaller set
    { (x: &b, y: &x), (x: &y, y: &z) }
are abstracted to the same independent-attribute result
    x: {&y, &b}    y: {&x, &z}
 Precise Points-To Analysis?
• Theorem: Precise may-alias analysis is
  undecidable in the presence of dynamic
  memory allocation.
  – Add “x = new/malloc ()” to language
  – State-space becomes infinite

• Digression: Integer variables +
  conditional-branching also makes any
  precise analysis undecidable.
     May-Point-To Analyses

  Ideal (with Int, with Malloc)
               |
  Ideal (with Int)     Ideal (with Malloc)
               |
  Ideal (no Int, no Malloc)
               |
          Algorithm A
               |
          Andersen's
               |
         Steensgaard's
     Dynamic Memory Allocation
• s: x = new () / malloc ()
• Assume, for now, that the allocated object stores
  one pointer
  – s: x = malloc ( sizeof(void*) )
• Introduce a pseudo-variable Vs to represent
  objects allocated at statement s, and use
  previous algorithm
  – treat s as if it were “x = &Vs”
  – also track possible values of Vs
  – allocation-site based approach
• Key aspect: Vs represents a set of objects
  (locations), not a single object
  – referred to as a summary object (node)
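
A small sketch of how this fits the constraint view (illustrative;
lower_allocation and the tuple encoding are my own, building on the Andersen
sketch given earlier, not the lecture's code):

  # "s: x = new" contributes the constraint {V_s} ⊆ PT(x),
  # i.e. it is handled exactly like "x = &V_s".
  def lower_allocation(site_label, lhs):
      summary = "V_" + site_label            # one summary variable per allocation site
      return ("addr", lhs, summary)

  # Example: "s1: x = new; y = x" becomes
  #   [lower_allocation("s1", "x"), ("copy", "y", "x")]
  # and solving gives PT(x) = PT(y) = {V_s1}.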
   Dynamic Memory Allocation:
             Example

Program (program points 1-5; each statement labels the edge to the next point):
  1   x = new
  2   y = x
  3   *y = &b
  4   *y = &a
  5
 Dynamic Memory Allocation:
    Summary Object Update

  (figure: the statement *y = &a between program points 4 and 5, showing how
  the points-to facts for the summary object are updated)
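
What the missing figure is illustrating (reconstructed, since the figure
itself is not in the text): because the summary node Vs stands for every
object allocated at site s, the assignment *y = &a cannot strongly update it.
The analysis must apply a weak update, adding &a to the possible values of Vs
while retaining the values it could already hold.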
   Dynamic Memory Allocation:
         Object Fields
• Field-sensitive analysis
   class Foo {
      A* f;
      B* g;
   }
   s: x = new Foo()

  x->f = &b;

  x->g = &a;
   Dynamic Memory Allocation:
         Object Fields
• Field-insensitive analysis
   class Foo {
      A* f;
      B* g;
   }
   s: x = new Foo()

  x->f = &b;

  x->g = &a;
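
One way to read these two slides (a reconstruction; the resulting points-to
facts are not in the extracted text): a field-sensitive analysis keeps a
separate points-to set per field of the summary object Vs, e.g. Vs.f may point
to {b} and Vs.g to {a}, while a field-insensitive analysis merges all fields
of Vs and can only conclude that Vs may point to {a, b}.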
         Andersen’s Analysis:
  Further Optimizations and Extensions
• Fahndrich et al., Partial online cycle elimination in
  inclusion constraint graphs, PLDI 1998.
• Rountev and Chandra, Offline variable substitution for
  scaling points-to analysis, PLDI 2000.
• Heintze and Tardieu, Ultra-fast aliasing analysis using
  CLA: a million lines of C code in a second, PLDI 2001.
• M. Hind, Pointer analysis: Haven’t we solved this
  problem yet?, PASTE 2001.
• Hardekopf and Lin, The ant and the grasshopper: fast
  and accurate pointer analysis for millions of lines of
  code, PLDI 2007.
• Hardekopf and Lin, Exploiting pointer and location
  equivalence to optimize pointer analysis, SAS 2007.
• Hardekopf and Lin, Semi-sparse flow-sensitive pointer
  analysis, POPL 2009.
       Andersen’s Analysis:
       Further Optimizations
• Cycle Elimination
  – Offline
  – Online
• Pointer Variable Equivalence
          Other Aspects
• Context-sensitivity
• Indirect (virtual) function calls and
  call-graph construction
• Pointer arithmetic
• Object-sensitivity
          May-Point-To Analyses

  Ideal (with Int, with Malloc)
               |
  Ideal (with Int)     Ideal (with Malloc)
               |
  Ideal (no Int, no Malloc)
               |   abstract away branching conditions
  Ideal (no Int, no Malloc, no Conditionals)
               |   abstract away variable correlations
          Algorithm A
               |
          Andersen's
               |
         Steensgaard's
   Conditional-Control-Flow
• Encoding conditional-control-flow
  – using “assume” statements
  – semantics as a transition relation
    • useful for non-deterministic statements as
      well
     Conditional Control-Flow
      (In The Concrete Semantics)
• Encoding conditional-control-flow
  – using “assume” statements

     if (P) then S1; else S2; endif

  becomes the CFG: from point 1, an edge “assume P” to point 2 followed by
  S1 to point 3, and an edge “assume !P” to point 4 followed by S2 to point 5.
     Conditional Control-Flow
      (In The Concrete Semantics)
• Semantics of “assume” statements
  – DataState -> {true,false}

  (same encoding as above: “assume P” and “assume !P” on the two branches
  of the conditional)
    Conditional Control-Flow
     (In The Concrete Semantics)
• Semantics of “assume” statements
  – DataState -> {true,false}
  – a transition relation on DataState
    • ⊆ DataState x DataState
    • DataState -> 2^DataState
  – collecting semantics
    • 2^DataState -> 2^DataState

  (figure: the edge from point 1 to point 2 labeled “assume P”)
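
A one-function sketch of the collecting-semantics view (illustrative, not from
the slides): an assume statement simply filters the set of data-states.

  # "assume P" as a transfer function 2^DataState -> 2^DataState:
  # keep exactly the states in which P holds.
  def assume(pred, states):
      return {s for s in states if pred(s)}

  # Example (illustrative): data-states as tuples of (variable, value) pairs.
  states = {(("x", "&a"), ("y", None)), (("x", None), ("y", "&b"))}
  # assume(lambda s: dict(s)["x"] is not None, states)
  # -> {(("x", "&a"), ("y", None))}   (only the state where x is non-null survives)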
       Transition Relations
           In Semantics
• “Assume” statements correspond to a
  special kind of transition relation
  – every state s is related to no state or
    just s
• Transition relations are useful for
  modeling other statements as well,
  especially non-determinism
  – “read(x)” modeled as “x := ?”
Abstracting Transition
      Relations
         Abstracting “assume”
               statements

     if (x != null) then y = x; else … endif

  CFG: from point 1, an edge “assume (x != null)” to point 2 followed by
  y = x to point 3, and an edge “assume (x == null)” to point 4 followed by
  the else-branch to point 5.
    Abstracting “assume”
          statements

  (figure: an edge from point 2 to point 3 labeled “assume x == y”)
          May-Point-To Analyses

  Ideal (with Int, with Malloc)
               |
  Ideal (with Int)     Ideal (with Malloc)
               |
  Ideal (no Int, no Malloc)
               |   abstract away branching conditions
  Ideal (no Int, no Malloc, no Conditionals)
               |   abstract away variable correlations
          Algorithm A
               |
          Andersen's
               |
         Steensgaard's
Questions?
"$ 7 z ^ bt
  Abstracting Away Correlations

   { (x: &b, y: &x), (x: &y, y: &z) }   --α-->   x: {&y,&b}   y: {&x,&z}

• Abstract a set of ordered pairs by an ordered pair of sets
   – α : 2^(V x V) -> 2^V x 2^V
   – α(S) = ({ x | (x,y) ∈ S }, { y | (x,y) ∈ S })
• Generic Galois Connection
   – works for any set V
   – more generally: α : 2^(A x B) -> 2^A x 2^B
 Abstracting Away Correlations
• Similarly we can define
   – α : 2^(A x B x C) -> 2^A x 2^B x 2^C
   – α : 2^(A x A x A) -> 2^A x 2^A x 2^A
   – α : 2^(A^k)       -> (2^A)^k
   – α : 2^(V -> A)    -> V -> 2^A
       Ideal <-> Algorithm A
• α : 2^(Var -> A) -> (Var -> 2^A) describes the
  abstraction of a set of data-states
• Abstracting a set of program-states
  – 2^(PC x DS) is isomorphic to PC -> 2^DS
  – any abstraction α : 2^DS -> DA of sets of data-states
    (into an abstract domain DA) can be extended pointwise
           Pointwise Extension
• Given two Galois Connections (GCs):
   – α1 : C1 -> A1
   – α2 : C2 -> A2
• the function α3 : C1 x C2 -> A1 x A2 defined below yields a GC
   – α3(x,y) = (α1(x), α2(y))

• Given a Galois Connection:
   – α : C -> A
• we can define the following GCs:
   – α^2 : C x C -> A x A
   – α^3 : C x C x C -> A x A x A
   – α^V : C^V -> A^V (i.e., α^V : (V -> C) -> (V -> A))
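
A tiny Python sketch of these two constructions (illustrative; the function
names are mine, not the lecture's):

  # Pointwise extension of abstraction functions (illustrative sketch).
  def extend_pair(alpha1, alpha2):
      # From a1 : C1 -> A1 and a2 : C2 -> A2, build a3 : C1 x C2 -> A1 x A2.
      return lambda pair: (alpha1(pair[0]), alpha2(pair[1]))

  def extend_map(alpha):
      # From a : C -> A, build a^V : (V -> C) -> (V -> A),
      # applying a independently to each variable's entry.
      return lambda m: {v: alpha(c) for v, c in m.items()}

  # Example: abstract each set of concrete values by the interval [min, max].
  alpha = lambda s: (min(s), max(s)) if s else None
  # extend_map(alpha)({"x": {1, 3}, "y": {2}})  ->  {"x": (1, 3), "y": (2, 2)}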

								