					Scaling Context-sensitive Points-to Analysis




                     Rupesh Nasre.
           Department of Computer Science and Automation
             Indian Institute of Science, Bangalore, India

             Advisor: Prof. R. Govindarajan

                         Ph.D. Colloquium
                           Dec 10, 2010
What is Pointer Analysis?

a = &x;                         a points to x.
b = a;                          a and b are aliases.
if (b == *p) {                  Is this condition always satisfied?
    …
} else {
    …
}

Pointer Analysis is a mechanism to statically find out the run-time values of a pointer.

    We focus on C/C++ programs and deal with may points-to analysis.
Why Pointer Analysis?

   For Parallelization.
             fun(p) || fun(q);
   For Optimization.
             a = p + 2;
             b = q + 2;
   For Bug-Finding.
   For Program Understanding.
   ...

   These are the clients of Pointer Analysis.
  Placement of Pointer Analysis.

   [Figure: Pointer Analysis at the center, surrounded by its clients and the benefits they deliver.]
   Clients: parallelizing compiler, lock synchronizer, memory leak detector,
            string vulnerability finder, data flow analyzer, affine expression analyzer,
            type analyzer, program slicer.
   Benefits: improved runtime, secure code, better compile time, better debugging.
Analysis Dimensions.

   Context-sensitivity
   Field-sensitivity
   Flow-sensitivity
Context sensitivity.
   caller1() {                             caller2() {
       fun(&x);                                fun(&y);
   }                                       }


                       fun(int *ptr) {
                           a = ptr;
                       }

          context-insensitive: {a → x, a → y}.
          context-sensitive: {a → x along call-path caller1,
                               a → y along call-path caller2}
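
The two results above can be written down as data. The following is a minimal Python sketch, assuming each call is recorded as a (call path, argument) pair; the variable names are illustrative and not from the talk.

```python
# Each entry is (call path, variable whose address is passed to fun).
calls = [("caller1", "x"), ("caller2", "y")]

# Context-insensitive: one points-to set for a, merged over all callers.
insensitive = {"a": {arg for _, arg in calls}}

# Context-sensitive: one points-to set for a per call path.
sensitive = {(path, "a"): {arg} for path, arg in calls}
```

The context-sensitive map has one entry per call path, which is why its size grows with the number of contexts.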
Field sensitivity.

        a.f = &x;




field-sensitive: {a.f → x}.
field-insensitive: {a → x}.
Flow sensitivity.
           p = &x;
            p = &y;
          label:
            ...

flow-sensitive: {p → y at label}.
flow-insensitive: {p → x, p → y}.
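
A minimal Python sketch of the difference, assuming a straight-line program that contains only address-of statements. The function names are illustrative, not from the talk.

```python
def flow_insensitive(statements):
    """Ignore statement order: every assignment contributes to one solution."""
    pts = {}
    for pointer, target in statements:      # each statement is pointer = &target
        pts.setdefault(pointer, set()).add(target)
    return pts


def flow_sensitive(statements):
    """Respect statement order: a later assignment overwrites the earlier one."""
    pts = {}
    for pointer, target in statements:
        pts[pointer] = {target}             # strong update
    return pts


program = [("p", "x"), ("p", "y")]          # p = &x; p = &y; label:
at_label = flow_sensitive(program)
anywhere = flow_insensitive(program)
```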
Unification and Inclusion
                          p = &x;
                          p = q;

Unification: {p → x, q → x}        Inclusion: {p → x}
  Almost linear.                     Cubic time-complexity.
  Efficiently implementable           Typically implemented using
  using Union-Find.                Bitmap/BDD.
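
The example above can be replayed with a small Python sketch. It is a simplification under stated assumptions: only address-of and copy statements are handled, and the unification variant merges the two sides of a copy into one class, which is coarser than a full Steensgaard-style analysis. All names are illustrative.

```python
def inclusion(address_of, copies):
    """Inclusion-based: p = q only adds pts(q) into pts(p)."""
    pts = {}
    for p, x in address_of:                  # p = &x
        pts.setdefault(p, set()).add(x)
    changed = True
    while changed:
        changed = False
        for p, q in copies:                  # p = q
            src = pts.get(q, set())
            dst = pts.setdefault(p, set())
            if not src <= dst:
                dst |= src
                changed = True
    return {v: s for v, s in pts.items() if s}


def unification(address_of, copies):
    """Unification-based (simplified): p = q puts p and q in one class."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    for p, q in copies:
        parent[find(p)] = find(q)            # union the two classes
    pts = {}
    for p, x in address_of:
        pts.setdefault(find(p), set()).add(x)
    return {v: pts[find(v)] for v in list(parent) if find(v) in pts}


address_of = [("p", "x")]                    # p = &x;
copies = [("p", "q")]                        # p = q;
```

With these inputs, `inclusion` keeps q empty while `unification` lets q share p's set, matching the two results on the slide.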
We focus on

► Context-sensitive,
► Flow-insensitive, Inclusion-based,
► Field-insensitive
    Points-to analysis.
    Normalized Input.

    Statement    Constraint type
    p = &q       address-of
    p = q        copy
    p = *q       load
    *p = q       store

    [Figure: for each statement, the points-to graph over p and q before and after the statement.]
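
A short Python sketch of how the four normalized forms could be recognized. The regular expressions and the `classify` helper are assumptions made for illustration; they accept only simple identifiers and are not a C parser.

```python
import re

# Order matters: the copy pattern is the most general, so it is tried last.
PATTERNS = [
    ("address-of", re.compile(r"^(\w+)\s*=\s*&(\w+)$")),
    ("load",       re.compile(r"^(\w+)\s*=\s*\*(\w+)$")),
    ("store",      re.compile(r"^\*(\w+)\s*=\s*(\w+)$")),
    ("copy",       re.compile(r"^(\w+)\s*=\s*(\w+)$")),
]


def classify(statement):
    """Return (constraint type, left variable, right variable)."""
    text = statement.strip().rstrip(";").strip()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return kind, match.group(1), match.group(2)
    raise ValueError(f"not a normalized pointer statement: {statement!r}")
```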
Complexity of Points-to Analysis
                             Points-to Analysis


With dynamic memory allocation                Without dynamic memory allocation



Flow-sensitive       Flow-insensitive         Strongly typed          Weakly typed




 Flow-sensitive        Flow-insensitive           Flow-sensitive      Flow-insensitive




Two dereferences    Arbitrary dereference     Fixed dereference     Arbitrary dereference


   Undecidable           NP-Hard                     ???                 P
 Related Work

                     Context-sensitive        Context-insensitive

 Flow-sensitive      Landi, Ryder 92,         Zheng 98
                     Choi et al. 93,
                     Emami et al. 94,
                     Reps et al. 95,
                     Wilson, Lam 95,
                     Hind et al. 99,
                     Kahlon 08

 Flow-insensitive    Liang, Harrold 99,       Andersen 94,
                     Whaley, Lam 04,          Steensgaard 96,
                     Zhu, Calman 04,          Shapiro, Horwitz 97,
                     Lattner et al. 07        Fahndrich et al. 98,
                                              Das 00,
                                              Rountev, Chandra 00,
                                              Berndl et al. 03,
                                              Hardekopf, Lin 07,
                                              Pereira, Berlin 09

 Surveys             Hind, Pioli 00; Qiang Wu 06

 Precision increases toward the context-sensitive column and the flow-sensitive row.
Issues with context-sensitivity.
   main() {                f(a) {                g(b) {
       S1: f(&x);              S3: g(a);             ...
       S2: f(&y);              S4: g(z);             ...
   }                       }                     }

   Invocation graph:

       main
         S1: f
               S3: g
               S4: g
         S2: f
               S3: g
               S4: g

   Exponential number of contexts.
Issues with context-sensitivity.
Storage requirement increases exponentially.
Along S1-S3-S5-S7, p points to {x1, x3, x5, x7}.

Along S1-S3-S5-S8, p points to {x1, x3, x5, x8}.

Along S1-S3-S6-S7, p points to {x1, x3, x6, x7}.

Along S1-S3-S6-S8, p points to {x1, x3, x6, x8}.

Along S1-S4-S5-S7, p points to {x1, x4, x5, x7}.

Along S1-S4-S5-S8, p points to {x1, x4, x5, x8}.

Along S1-S4-S6-S7, p points to {x1, x4, x6, x7}.

Along S1-S4-S6-S8, p points to {x1, x4, x6, x8}.

Along S2...
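
The growth can be checked by counting call paths. This is a minimal Python sketch, assuming a recursion-free call graph given as a map from each function to its callees (one entry per call site); `count_contexts` and the chain f0..f4 are illustrative names.

```python
def count_contexts(call_sites, root="main"):
    """Count, per function, the distinct call paths that reach it from root."""
    counts = {fn: 0 for fn in call_sites}

    def visit(fn):
        counts[fn] += 1                      # one more context for fn
        for callee in call_sites[fn]:        # one recursive visit per call site
            visit(callee)

    visit(root)
    return counts


# main calls f at S1 and S2; f calls g at S3 and S4.
small = {"main": ["f", "f"], "f": ["g", "g"], "g": []}

# Four levels with two call sites each, as in the S1..S8 listing.
chain = {f"f{i}": [f"f{i + 1}"] * 2 for i in range(4)}
chain["f4"] = []
```

Each extra level with two call sites doubles the number of contexts of the innermost function, and one points-to set is stored per context.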
Scalability Issues

 Method               Language   KLOC            Benchmarks                 Results

 Liang, Harrold 99    C          25              non-SPEC                   1-10 sec
 Whaley, Lam 04       Java       687K bytecode   Large open source          20 min
 Zhu, Calman 04       C          25              Small SPEC 2000,           3-10 sec
                                                 Prolangs, MediaBench
 Lattner et al. 07    C          200             SPEC 95, 2000              3 sec, precision equal
                                                                            to that of Andersen's
 Kahlon 08            C          128             Open source                300 sec

             We work with benchmarks up to 480 KLOC.
Our Contributions

   Points-to analysis as a system of linear
    equations
   Prioritizing constraint evaluation
   Sound randomized points-to analysis
   Probabilistic points-to analysis using bloom
    filter
Points-to Analysis as a
System of Linear Equations

    Map points-to constraints into linear equations.
    Solve the equations using a standard linear solver.
    Unmap the results back as points-to information.
First-cut Approach: Transformations

   p = &q                p = q - 1
   p = q                 p = q
   p = *q                p = q + 1
   *p = q                p + 1 = q

Each address-taken variable (&v) would be assigned a unique value.
     First-cut Approach.

  Constraint          Equation
  a = &x;             a = x-1
  p = &a;             p = a-1
  b = *p;             b = p+1
  c = b;              c = b

  Solve:  x = r,  a = r-1,  b = r-1,  c = r-1,  p = r-2

  a, b, c point to x.
  p points to a.
  p points to b.
  p points to c.

  Imprecise analysis: p = r-2 is one less than a, b and c alike, so p also
  appears to point to b and c.
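
The worked example can be replayed with a short Python sketch. It assumes every equation has the form lhs = rhs + k and stores each value as an offset from the free value r; `solve_first_cut` and `interpret` are illustrative helpers, not the talk's implementation.

```python
def solve_first_cut(equations, base):
    """Solve lhs = rhs + k by substitution; values are offsets from r."""
    offset = {base: 0}                       # base = r
    pending = list(equations)
    while pending:
        remaining = []
        for lhs, rhs, k in pending:
            if rhs in offset:
                offset[lhs] = offset[rhs] + k
            else:
                remaining.append((lhs, rhs, k))
        if len(remaining) == len(pending):
            raise ValueError("system cannot be solved by substitution")
        pending = remaining
    return offset


def interpret(offset):
    """v points to w whenever v = w - 1."""
    return {v: sorted(w for w in offset if offset[v] == offset[w] - 1)
            for v in offset}


# a = x-1, p = a-1, b = p+1, c = b
equations = [("a", "x", -1), ("p", "a", -1), ("b", "p", +1), ("c", "b", 0)]
points_to = interpret(solve_first_cut(equations, base="x"))
```

Here `points_to["p"]` contains a, b and c, which reproduces the imprecision: the encoding cannot tell variables with equal values apart.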
Issues with First-cut Approach.

► Dereferencing.
       a = &x versus *a = x.
       a = &x becomes a = x-1 and *a = x becomes a+1 = x:
       semantically different, mathematically same.
► Multiple assignments.
       a = &x; a = &y; becomes a = x-1; a = y-1; No solution.
► Cyclic assignments.
       a = &a; becomes a = a-1; No solution.
► Symmetry of assignment.
       a = b implies b = a.
Important Ideas.

   Address of a variable as a prime number.
   Points-to set as a product of primes.
   Variable renaming to avoid inconsistency.
Prime-factorization: Transformations

  p = &q                  p = pi * prime(&q)
  p=q                     p = pi * q
  p = *q                  p = pi * (q + 1)
  *p = q                  handled separately

  Each address-taken variable (&q) would be assigned a unique
    prime number.
Points-to Information Lattice.

                  3*5*7*11*…



      3*5*7 3*5*11 3*7*11 5*7*11…
                                               Precision
                                               increases
            15 21 33 35 55 77…

               3 5       7 11…

                     1
   We start with larger primes to avoid composition gap problem.
Algorithm Outline.


do {
  equations = Linearize(constraints);
  solution = LinSolve(equations);
  points-to = Interpret(solution);
  constraints += AddConstraints(store-constraints, points-to);
} while points-to information changes;
  Example.

Constraint           Linearized                     Solved                   Interpreted
a = &x;              a = a0*17                      a = 17                   a = 17
p = &a;              p = p0*101                     p = 101                  p = 101
b = *p;              b = b0*(p+1)                   b = 102                  b = 17
c = b;               c = c0*b                       c = 102                  c = 17

           &x = 17, &a = 101                        a0 = b0 = c0 = p0 = 1

 102 => 1 + 101 => 1 dereference on 101 => 1 dereference on &a => a => 17.
Solution Properties.

► Integrality.
     o   Only addition and multiplication over integers.
► Feasibility.
     o   No negative weight cycle.
► Uniqueness.
     o   Each variable is defined only once.
Soundness.

If &x = 7, &y = 11 and p points to x and y, then p is a multiple of 77.
         Base: p points to x and y by direct assignment.
         Induction: p points to x and y due to an indirect
          assignment (copy, load, store).
         Prove that all indirect assignments are safe.
         Argument: Multiplication moves the dataflow fact
          upwards in the lattice.
Assumption: No problem due to composition gaps.
               p1 + k1 is not misinterpreted as p2 + k2.
The assumption can be enforced by careful offline selection of
        primes.
    Precision.

If &x = 7, &y = 11 and p is a multiple of 77, then p points to x and y.
•         Argument: Prime factorization is unique.
•         Thus, 77 can be decomposed only as 7*11.
•         Prove that none of the address-of, copy, load, store
          statements add extra primes into the composition.


Assumption: No problem due to composition gaps.
               p1 + k1 is not misinterpreted as p2 + k2.
The assumption can be enforced by careful offline selection of primes.
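
A minimal Python sketch of the encoding behind the soundness and precision arguments, assuming the primes 7 and 11 from the slides plus a hypothetical third variable z with prime 13. It covers only the set encoding (product of primes, divisibility test, union as least common multiple) and leaves out the +1 dereference offsets and the composition-gap handling.

```python
from math import gcd

# Assumed assignment of primes to the address-taken variables.
PRIME_OF = {"x": 7, "y": 11, "z": 13}


def encode(targets):
    """Encode a points-to set as the product of its primes."""
    value = 1
    for t in targets:
        value *= PRIME_OF[t]
    return value


def decode(value):
    """Recover the set: unique factorization makes divisibility exact."""
    return {t for t, prime in PRIME_OF.items() if value % prime == 0}


def join(u, v):
    """Union of two encoded sets: their least common multiple."""
    return u * v // gcd(u, v)
```

Using the least common multiple for the union keeps every prime at exponent one, so joining a set with itself leaves the value unchanged.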
Evaluation.


Benchmarks: SPEC 2000, httpd, sendmail.

Configuration: Intel Xeon, 2 GHz clock, 4MB L2 cache, 3GB RAM.

Analysis: Context-sensitive, Flow-insensitive.

Linear solver: IBM ILog®.
Analysis Time (seconds).

     Benchmark   Andersen   Linear
    gcc            OOM       196
    perlbmk        OOM       101
    vortex         OOM        68
    eon            231       106
    gap            144        89
    parser          55        55
    equake         0.22      0.92
    art            0.17      1.26
    bzip2          0.15      1.62
    average         ---       54
Memory (KB).

    Benchmark   Andersen   Linear
   gcc            OOM      68492
   perlbmk        OOM      29864
   vortex         OOM      18420
   eon           385284    38908
   gap           97863     22784
   parser        121588    14016
   equake         161      12992
   art             42       9756
   bzip2          519      10244
   average         ---     20711
Outline.

   Introduction
   Points-to analysis as a system of linear
    equations
   Prioritized points-to analysis
   Randomized points-to analysis
   Probabilistic points-to analysis using bloom
    filter
   Conclusion
 Points-to Analysis as a Graph Problem

     Each pointer is a node; a directed edge p → q indicates that the points-to set of q is
     a subset of that of p.


Input: set C of points-to constraints


Process address-of constraints
Add edges to constraint graph G using copy constraints
repeat
  Propagate points-to information in G                  (Literature focuses here.)
  Add edges to G using load and store constraints       (Our work deals here.)
until fixpoint
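
The loop above can be sketched in Python as a naive fixpoint. This is an illustration under assumptions: it applies load and store constraints directly to the points-to sets instead of maintaining an explicit constraint graph, and the function name and argument layout are not from the talk.

```python
from collections import defaultdict


def andersen(address_of, copy, load, store):
    """Naive inclusion-based fixpoint over the four constraint kinds."""
    pts = defaultdict(set)
    for p, q in address_of:                  # p = &q
        pts[p].add(q)

    def include(dst, src):
        """pts(dst) |= pts(src); report whether pts(dst) grew."""
        before = len(pts[dst])
        pts[dst] |= pts[src]
        return len(pts[dst]) != before

    changed = True
    while changed:
        changed = False
        for p, q in copy:                    # p = q
            changed |= include(p, q)
        for p, q in load:                    # p = *q
            for v in list(pts[q]):
                changed |= include(p, v)
        for p, q in store:                   # *p = q
            for v in list(pts[p]):
                changed |= include(v, q)
    return dict(pts)


# *e = c, c = *a, e = d, b = a, *a = p with a -> {a,q,r,s,t}, p -> {b,c,d}
result = andersen(
    address_of=[("a", t) for t in "aqrst"] + [("p", t) for t in "bcd"],
    copy=[("e", "d"), ("b", "a")],
    load=[("c", "a")],
    store=[("e", "c"), ("a", "p")],
)
```

The order in which the loop visits constraints affects how many passes are needed, not the final sets.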
Points-to Analysis as a Graph Problem

*e = c, c = *a, e = d, b = a, *a = p
Initially, a→{a,q,r,s,t}, p→{b,c,d}

Fixed processing order: e = d, b = a, then *e = c, c = *a, *a = p.

Points-to set of each node after every iteration:

Iteration       a            b            c            d            e            q,r,s,t      p
0               {aqrst}      {}           {}           {}           {}           {}           {bcd}
1               {aqrst}      {aqrst}      {}           {}           {}           {}           {bcd}
2               {abcdqrst}   {abcdqrst}   {abcdqrst}   {}           {}           {bcd}        {bcd}
3               {abcdqrst}   {abcdqrst}   {abcdqrst}   {bcd}        {bcd}        {bcd}        {bcd}
4               {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {bcd}        {bcd}
5 (fixpoint)    {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {bcd}
Order of Constraint Evaluation

   Optimal ordering is NP-Hard.
   Reduction from Set-Cover problem.
   The problem is hard even when there are no
    complex constraints.
   Need to depend upon heuristics.

               What would be a good heuristic?
     Prioritized Points-to Analysis

     *e = c, c = *a, e = d, b = a, *a = p
     Initially, a→{a,q,r,s,t}, p→{b,c,d}

     Processing order
       Iteration 1:   *a = p (18),  c = *a (8),  *e = c (0)
       Iteration 2:   *a = p (6),   c = *a (0),  *e = c (10)
       Iteration 3:   *e = c (20),  *a = p (0),  c = *a (0)
       Fixpoint:      *e = c (0),   *a = p (0),  c = *a (0)

     Andersen
     Iteration       a            b            c            d            e            q,r,s,t      p
     1               {aqrst}      {aqrst}      {}           {}           {}           {}           {bcd}
     2               {abcdqrst}   {abcdqrst}   {abcdqrst}   {}           {}           {bcd}        {bcd}
     3               {abcdqrst}   {abcdqrst}   {abcdqrst}   {bcd}        {bcd}        {bcd}        {bcd}
     4               {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {bcd}        {bcd}

     Priority
     Iteration       a            b            c            d            e            q,r,s,t      p
     1               {abcdqrst}   {abcdqrst}   {abcdqrst}   {}           {}           {bcd}        {bcd}
     2               {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {bcd}        {bcd}
     3               {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {bcd}
     fixpoint        {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {abcdqrst}   {bcd}
Salient Features

► Our prioritization framework allows for plugging in a
  priority mechanism.
► Constraints at a priority level are evaluated
  repeatedly resulting in a skewed evaluation, which
  achieves fixpoint faster.
► Prioritized analysis is a general technique and can
  be applied to other analyses.
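
A minimal Python sketch of a pluggable priority, under loudly stated assumptions: the priority of a constraint is taken to be the number of points-to facts its previous evaluation added, all constraints (not only load and store) are reordered, and the talk's priority levels and skewed repeated evaluation are not reproduced. The names `evaluate` and `prioritized_solve` are hypothetical.

```python
from collections import defaultdict


def evaluate(pts, constraint):
    """Apply one constraint to pts; return the number of facts added."""
    kind, p, q = constraint
    if kind == "copy":                       # p = q
        pairs = [(p, q)]
    elif kind == "load":                     # p = *q
        pairs = [(p, v) for v in list(pts[q])]
    else:                                    # store: *p = q
        pairs = [(v, q) for v in list(pts[p])]
    added = 0
    for dst, src in pairs:
        before = len(pts[dst])
        pts[dst] |= pts[src]
        added += len(pts[dst]) - before
    return added


def prioritized_solve(initial, constraints):
    """Evaluate constraints in descending priority until nothing changes."""
    pts = defaultdict(set)
    for v, targets in initial.items():
        pts[v] |= set(targets)
    priority = {c: 0 for c in constraints}   # assumed starting priority
    rounds = 0
    while True:
        rounds += 1
        order = sorted(constraints, key=lambda c: priority[c], reverse=True)
        total = 0
        for c in order:
            priority[c] = evaluate(pts, c)   # facts added become the new priority
            total += priority[c]
        if total == 0:
            return dict(pts), rounds


constraints = [("store", "e", "c"), ("load", "c", "a"), ("copy", "e", "d"),
               ("copy", "b", "a"), ("store", "a", "p")]
result, rounds = prioritized_solve({"a": set("aqrst"), "p": set("bcd")}, constraints)
```

Any priority function can be swapped in through the `priority` map; the fixpoint stays the same because every evaluation only adds facts that the constraints imply.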
Evaluation

Benchmarks: SPEC 2000, httpd, sendmail, ghostscript, gdb, wine-server
Framework: LLVM
Effect of Prioritized Points-to Analysis

 Prioritized analysis needs fewer iterations to reach fixpoint than the original analysis.
 Prioritized analysis adds more facts earlier than the original analysis.
Results

Analysis time (seconds)

      Benchmark         Andersen       Prioritized Andersen
     gcc                     329               286
     perlbmk                 143                98
     equake                   24                17
     art                      26                19
     ghostscript            4384              3183
     gdb                    9338              5847
     average                 737               495
Results

Analysis time (seconds)

      Benchmark         BDD LCD        Prioritized BDD LCD
     gcc                   17411              7984
     perlbmk                5879              3159
     equake                    4                 3
     art                       7                 4
     ghostscript           20612             12371
     wine-server              36                23
     average                1468               963
Outline.

   Introduction
   Points-to analysis as a system of linear
    equations
   Prioritized points-to analysis
   Randomized points-to analysis
   Probabilistic points-to analysis using bloom
    filter
   Conclusion
Approach

   Process a set of randomly chosen constraints using
    Unification and the remaining using Inclusion.
   Compose the two results in a sound manner to get an
    approximation to Andersen's analysis.
   Run the analysis a few times to improve the
    approximation.
Analysis Dimensions

   Inclusion-Unification
   Flow-sensitivity
   Context-sensitivity
   Field-sensitivity
Generic Algorithm
Input: Program P, Output: Points-to information R
for run = 1..N do
  set selection probability
  aggregate = Select(P)
  summary = Summarize(aggregate)
  repeat
     for all entities e in P do
        if e in aggregate then
           R_run = R_run union Compose(process(summary))
        else
           R_run = R_run union process(e)
        end if
     end for
  until fixpoint
end for
R = intersect all R_run
return R

A sound analysis requires careful definitions of Select, Summarize and Compose
for each analysis dimension.
Example: Randomized Flow-sensitivity

a = &x
b = &y
p = &a
c = *p
a = &y


Flow-insensitive: a → {x, y}, b → {y}, p → {a}, c → {x, y}
Example: Randomized Flow-sensitivity

              Flow Sensitive
a = &x        a → {x}
b = &y        a → {x}, b → {y}
p = &a        a → {x}, b → {y}, p → {a}
c = *p        a → {x}, b → {y}, p → {a}, c → {x}
a = &y        a → {y}, b → {y}, p → {a}, c → {x}

Flow-insensitive: a → {x, y}, b → {y}, p → {a}, c → {x, y}
Example: Randomized Flow-sensitivity

a = &x
b = &y, a = &y
p = &a
c = *p
b = &y, a = &y
Example: Randomized Flow-sensitivity

a = &x              a → {x}
b = &y, a = &y      a → {x, y}, b → {y}, p → {a}, c → {x, y}
p = &a              a → {x, y}, b → {y}, p → {a}
c = *p              a → {x, y}, b → {y}, p → {a}, c → {x, y}
b = &y, a = &y      a → {x, y}, b → {y}, p → {a}, c → {x, y}


Flow-insensitive: a → {x, y}, b → {y}, p → {a}, c → {x, y}


                 Flow-sensitive <= Randomized <= Flow-insensitive
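The two endpoints of this ordering can be reproduced for the five-statement example. The following is a minimal sketch, not the implementation from the talk: the tuple encoding of statements and the function names are assumptions made for illustration, and the randomized analysis itself (Select, Summarize, Compose) is not modelled.

```python
# Statements of the example program, encoded as (kind, destination, source).
PROGRAM = [
    ("addr", "a", "x"),   # a = &x
    ("addr", "b", "y"),   # b = &y
    ("addr", "p", "a"),   # p = &a
    ("load", "c", "p"),   # c = *p
    ("addr", "a", "y"),   # a = &y
]

def rhs(stmt, pts):
    """Points-to set produced by the right-hand side of stmt under pts."""
    kind, _, src = stmt
    if kind == "addr":
        return {src}
    # load: union of the points-to sets of everything src points to
    out = set()
    for target in pts.get(src, set()):
        out |= pts.get(target, set())
    return out

def flow_insensitive(program):
    """Ignore statement order: accumulate facts until nothing changes."""
    pts = {}
    changed = True
    while changed:
        changed = False
        for stmt in program:
            dst = stmt[1]
            new = pts.get(dst, set()) | rhs(stmt, pts)
            if new != pts.get(dst, set()):
                pts[dst] = new
                changed = True
    return pts

def flow_sensitive(program):
    """Straight-line code: each assignment overwrites (strong update)."""
    pts = {}
    for stmt in program:
        pts[stmt[1]] = rhs(stmt, pts)
    return pts

fi = flow_insensitive(PROGRAM)
fs = flow_sensitive(PROGRAM)
```

Here `fs` holds the information at the end of the program, and every set in `fs` is contained in the corresponding set in `fi`, which is the outer bound of the ordering above.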
Analysis Dimensions

   Inclusion-Unification
   Flow-sensitivity
   Context-sensitivity
   Field-sensitivity
Soundness

   The points-to information is always a superset of that of the
    more precise analysis.
   We prove this by contradiction: assume that a points-to fact is
    computed by the more precise analysis but not by the randomized
    analysis.
   The proof relies on careful definitions of Select, Summarize
    and Compose.
   The soundness can be proved for all the analysis dimensions.
Configuration Parameters

Selection probability: 0, 0.1, 0.2, ..., 0.9, 1.0
Number of runs: 1, 2, 4, 8, 10, 16
Configuration: probability x runs, e.g., 0.4x8.


Precision = (number of points-to pairs in most precise analysis)
            / (number of points-to pairs in randomized analysis)

Precision loss = 1 - Precision
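As a small worked example of the metric, the sketch below computes precision for one run and for the intersection of two runs (the final step of the generic algorithm). The points-to pairs are made-up illustrative data, not results from the talk, and the function name is an assumption.

```python
def precision(precise_pairs, randomized_pairs):
    """Ratio of points-to pairs: most precise analysis / randomized analysis."""
    return len(precise_pairs) / len(randomized_pairs)

# Hypothetical result of the most precise analysis.
precise = {("a", "x"), ("b", "y")}

# Hypothetical results of two randomized runs; each is a sound superset
# of the precise result but contains a different spurious pair.
run1 = precise | {("a", "y")}
run2 = precise | {("b", "x")}

single_run = precision(precise, run1)        # 2 / 3
combined = precision(precise, run1 & run2)   # intersection removes both spurious pairs
loss = 1 - combined
```

Intersecting the runs keeps the result sound, since every run contains the precise pairs, while discarding spurious pairs that do not recur in every run.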
Overall Effect of Configurations




       Many configurations that achieve a precision of over
              95% with analysis time below 30%.
Effect of Selection Probability

   [Figure: plot showing Max, 75th percentile, Avg, 25th percentile and Min.]
Effect of Number of Runs

   [Figure: plot showing Max, 75th percentile, Avg, 25th percentile and Min.]
Outline.

   Introduction
   Points-to analysis as a system of linear
    equations
   Prioritized points-to analysis
   Randomized points-to analysis
   Probabilistic points-to analysis using bloom filter
   Conclusion
Points-to Analysis using Bloom Filter

   We propose a multi-dimensional bloom filter
    (multibloom) to represent points-to information with
    almost no loss in precision.
   Using extended bloom filter operations, we develop a
    scalable approximate points-to analysis.
   We demonstrate the scalability promise using SPEC
    2000 benchmarks.
   We analyze the effect of our points-to analysis using
    Mod/Ref analysis as a client and show that it is almost
    as precise as an exact analysis.
Conclusions and Future Work

   Points-to analysis as a system of linear equations
       o   Optimizations from linear algebra
   Prioritized points-to analysis
       o   Other heuristics
   Randomized points-to analysis
       o   Multiple dimensions
   Probabilistic points-to analysis using bloom filter
       o   Flow-sensitive analysis
Grateful Thanks

   Kaushik Rajan (MSR)
   Uday Khedker (IITB)
   Aditya Thakur (Wisc)
   Arnab De (IISc)
   Atanu Mohanty (IISc)
   K. V. Raghavan (IISc)
   Aditya Kanade (IISc)
   Kapil Vaswani (MSR)
   Akash Lal (MSR)
Scaling Context-sensitive Points-to Analysis




                     Rupesh Nasre.
                       nasre@csa.iisc.ernet.in

           Department of Computer Science and Automation
             Indian Institute of Science, Bangalore, India

             Advisor: Prof. R. Govindarajan

                         Ph.D. Colloquium
                           Dec 10, 2010
Properties.

►   If the value of a pointer p is a prime number, then it defines a
    must-point-to relation, else it is a may-point-to relation.
►   If the value of p is 1, then p is unused.
►   If pointers p1 and p2 have the same value, then p1 and p2
    are pointer equivalent.
►   Variables x and y are location equivalent when &x dividing
    the value of a pointer p implies that &x*&y also divides that value.
►   Pointers p1 and p2 are aliases if gcd(p1, p2) != 1.
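These properties can be checked on small numbers. The sketch below assumes that each address is assigned a distinct prime and that a pointer's value is the product of the primes of its pointees. The address table and the helper names are hypothetical, chosen only for illustration.

```python
from math import gcd

# Hypothetical address assignment: each variable's address is a distinct prime.
ADDR = {"x": 2, "y": 3, "z": 5}

def is_prime(n):
    """Trial-division primality test, adequate for small illustrative values."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_unused(p):
    return p == 1

def is_must_point_to(p):
    return is_prime(p)

def pointer_equivalent(p1, p2):
    return p1 == p2

def may_alias(p1, p2):
    return gcd(p1, p2) != 1

p = ADDR["x"] * ADDR["y"]   # p may point to x or y
q = ADDR["y"]               # q must point to y
r = ADDR["z"]               # r must point to z
```

With these values, `p` and `q` share the factor 3 and therefore alias, while `p` and `r` are coprime and do not.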
Normalized Input.

  p = &q   address-of


  p = q    copy


  p = *q   load


  *p = q   store
Tackling scalability issues.
How about not storing complete contexts?
How about storing approximate points-to information?
Can we have a probabilistic data structure that approximates
  the storage?
Can we control the false-positive rate?
Bloom filter.
 A bloom filter is a probabilistic data structure for
 membership queries, and is typically implemented as a
 fixed-sized array of bits.


 To store elements e1, e2, e3, bits at positions
 hash(e1), hash(e2) and hash(e3) are set.


 [Figure: bit array with two bits set; e1 and e3 hash to the same
 position, e2 to a different one.]
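A minimal sketch of such a filter with a single hash function follows. The class name, the default size of 16 bits and the use of SHA-256 as the hash are assumptions for illustration; the slides do not say which hash was used.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array with one hash function (illustrative only)."""

    def __init__(self, num_bits=16):
        self.num_bits = num_bits
        self.bits = [0] * num_bits

    def _position(self, element):
        # Derive a stable bit position from the element's name.
        digest = hashlib.sha256(element.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, element):
        self.bits[self._position(element)] = 1

    def may_contain(self, element):
        # True may be a false positive; False is always correct.
        return self.bits[self._position(element)] == 1

bf = BloomFilter()
for e in ("e1", "e2", "e3"):
    bf.add(e)
```

A membership query never misses an element that was added, but two elements that hash to the same position are indistinguishable, which is the source of false positives.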
Points-to analysis with Bloom filter.
A constraint is an abstract representation of a pointer instruction.
►p = &x         p.pointsTo(x).
►p = q          p.copyFrom(q).
►*p = q         p.storeThrough(q).
►p = *q         p.loadFrom(q).

Function arguments and return values are treated like statements
of the form p = q.
Note that each constraint also stores its context.
Points-to Analysis with Bloom filter.
 If points-to pairs (p, x) are kept in a bloom filter,
 existential queries like “does p point to x?” can be
 answered.
 What about queries like “do p and q alias?”?
 What about context-sensitive queries like “do p and q
 alias in context c?”?
 How do we process assignment statements p = q?
 How about load/store statements *p = q and q = *p?
Multi-Bloom filter.
  Points-to pairs are kept in a bloom filter per pointer. A
  bit set to 1 represents a points-to pair.
Example (diagram on the next slide):
Points-to pairs {(p, x), (p, y), (q, x)}.
hash(x) = 5, hash(y) = 6.
Set bit numbers: p.bucket[5], p.bucket[6], q.bucket[5].
Can hash(x) and hash(y) be the same? Yes.
Multi-Bloom filter.
   p.bucket (bits 0..9):   0 0 0 0 0 1 1 0 0 0
                           bit 5: (p, x), bit 6: (p, y)

   q.bucket (bits 0..9):   0 0 0 0 0 1 0 0 0 0
                           bit 5: (q, x)
Multi-Bloom filter.
 Each pointer has a fixed number of bits for
 storing its points-to information, called a
 bucket.
 Thus, if bucket size == 10, all pointees are hashed to a value
 from 0 to 9.


 This notion is extended to multiple hash functions: each pointer
 has one bucket per hash function.
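The per-pointer buckets can be sketched as follows. The first hash table reproduces hash(x) = 5 and hash(y) = 6 from the example; the second hash function, the class name and the method names are assumptions added for illustration.

```python
BUCKET_SIZE = 10

# One table per hash function. The first matches the slide example;
# the second is hypothetical and makes x and y collide on purpose.
HASHES = [
    {"x": 5, "y": 6},
    {"x": 2, "y": 2},
]

class MultiBloom:
    def __init__(self):
        self.buckets = {}   # pointer -> one bit list per hash function

    def _bucket(self, pointer):
        return self.buckets.setdefault(
            pointer, [[0] * BUCKET_SIZE for _ in HASHES])

    def add(self, pointer, pointee):
        for bits, h in zip(self._bucket(pointer), HASHES):
            bits[h[pointee]] = 1

    def may_point_to(self, pointer, pointee):
        # The pointee's bit must be set in every bucket of the pointer.
        return all(bits[h[pointee]]
                   for bits, h in zip(self._bucket(pointer), HASHES))

mb = MultiBloom()
for ptr, var in [("p", "x"), ("p", "y"), ("q", "x")]:
    mb.add(ptr, var)
```

With only the second hash function, q would appear to point to y because x and y collide; the first bucket rules that out, which is why several hash functions reduce false positives.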
Handling p = q.
►Points-to set of q should be added to the
 points-to set of p.
►Bitwise-OR each bucket of q with the
 corresponding bucket of p.

 Example on the next slide.
Example.
►h1(x) = 0, h2(x) = 5, h1(y) = 3, h2(y) = 3.
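The copy operation can be sketched with these hash values. The initial points-to sets (p points to x, q points to y) and the representation of each bucket as an integer bitmask are assumptions for illustration, since the original diagram is not available.

```python
# Hash values from the slide: h1(x) = 0, h2(x) = 5, h1(y) = 3, h2(y) = 3.
H1 = {"x": 0, "y": 3}
H2 = {"x": 5, "y": 3}
HASHES = [H1, H2]

def empty_buckets():
    """One integer bitmask per hash function, all bits clear."""
    return [0 for _ in HASHES]

def add_pointee(buckets, pointee):
    for i, h in enumerate(HASHES):
        buckets[i] |= 1 << h[pointee]

def copy_from(dst, src):
    """Process p = q: OR each bucket of q into the matching bucket of p."""
    for i in range(len(HASHES)):
        dst[i] |= src[i]

def may_point_to(buckets, pointee):
    return all((buckets[i] >> h[pointee]) & 1 for i, h in enumerate(HASHES))

p = empty_buckets()
add_pointee(p, "x")      # assumed initial state: p -> {x}
q = empty_buckets()
add_pointee(q, "y")      # assumed initial state: q -> {y}
copy_from(p, q)          # p = q
```

After the copy, p's first bucket has bits 0 and 3 set and its second bucket has bits 3 and 5 set, so p may point to both x and y, while q is unchanged.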
Handling p = *q and *p = q.
►Extend multi-bloom to have another dimension
 for pointers pointed to by pointers.
►The idea can be extended to higher-level
 pointers (***p, ****p, and so on).
►We implemented it only for two-level pointers.

 Example on the next slide.
Another example.
►h(x) = 1, h(y) = 4, hs(p1) = 1, hs(p2) = 2.
Alias query: context-sensitive.
If the query is DoAlias(p, q, c),
 for each hash function ii {
     hasPointee = false;
     for each bucket-bit jj
       if (p.bucket[c][ii][jj] and q.bucket[c][ii][jj])
          hasPointee = true;
     if (hasPointee == false)
       return NoAlias;
 }
 return MayAlias;
Alias query: context-insensitive.
If the query is DoAlias(p, q),
 for each context c {
     if (DoAlias(p, q, c) == MayAlias)
       return MayAlias;
 }
 return NoAlias;
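The two queries above can be written as a runnable sketch. It assumes each pointer's buckets are stored as a dictionary from context to a list of integer bitmasks, one per hash function; that layout, the function names and the sample data are illustrative assumptions.

```python
MAY_ALIAS, NO_ALIAS = "MayAlias", "NoAlias"

def do_alias_in_context(p_buckets, q_buckets, c):
    """Context-sensitive query: p_buckets[c][i] is the bitmask for hash i."""
    for p_bits, q_bits in zip(p_buckets[c], q_buckets[c]):
        # No common set bit under this hash function: no shared pointee.
        if (p_bits & q_bits) == 0:
            return NO_ALIAS
    return MAY_ALIAS

def do_alias(p_buckets, q_buckets, contexts):
    """Context-insensitive query: alias in any context means MayAlias."""
    for c in contexts:
        if do_alias_in_context(p_buckets, q_buckets, c) == MAY_ALIAS:
            return MAY_ALIAS
    return NO_ALIAS

# Hypothetical data: two contexts, two hash functions.
p = {0: [0b0001, 0b100000], 1: [0b1000, 0b1000]}
q = {0: [0b1000, 0b001000], 1: [0b1000, 0b1000]}

in_c0 = do_alias_in_context(p, q, 0)
in_c1 = do_alias_in_context(p, q, 1)
overall = do_alias(p, q, [0, 1])
```

A NoAlias answer is definite, because a shared pointee would set a common bit in every bucket; a MayAlias answer can be a false positive caused by hash collisions.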
Experimental evaluation: Time.

                  exact.   (40-10-4)    (80-10-8)   (400-100-12)   (800-100-16)
     gcc          OOM.      791.705    3250.627       10237.702      27291.303
     perlbmk      OOM.       76.277      235.207       2632.044       5429.385
     vortex       OOM.       95.934      296.995       1998.501       4950.321
     eon        231.166      39.138      118.947       1241.602       2639.796
     parser      55.359       9.469       31.166        145.777        353.382
     gap        144.181       5.444       17.469        152.102        419.392
     vpr         29.702       5.104       18.085         88.826        211.065
     crafty      20.469       2.636        9.069         46.899        109.115
     mesa         1.472       1.384        2.632         10.041         23.721
     ammp         1.120       1.008        2.592         15.185         38.018
     twolf        0.596       0.656        1.152          5.132         12.433
     gzip         0.348       0.192        0.372          1.808          4.372
     bzip2        0.148       0.144        0.284          1.348          3.288
     mcf          0.112       0.332        0.820          5.036         12.677
     equake       0.224       0.104        0.236          1.104          2.652
     art          0.168       0.164        0.408          2.404          6.132
     httpd       17.445       7.180       15.277         52.793        127.503
     sendmail     5.956       3.772        6.272         25.346         65.889
Experimental evaluation: Memory.
                   exact.   (40-10-4)   (80-10-8)   (400-100-12)   (800-100-16)
     gcc            OOM.     3955.39    15444.90      113576.00      302117.00
     perlbmk        OOM.     1880.87     7344.33       54007.70      143662.00
     vortex         OOM.      817.89     3193.65       23485.00       62471.00
     eon        385283.89    1722.32     6725.23       49455.00      131552.00
     parser      121587.5     564.19     2203.01       16200.20       43093.10
     gap         97862.67    1106.97     4322.47       31785.90       84551.70
     vpr          50209.5     309.98     1210.40        8900.88       23676.60
     crafty      15985.59     142.59      556.78        4094.37       10891.20
     mesa         8260.22     720.95     2815.14       20701.60       55066.90
     ammp         5843.27     200.09      781.30        5745.38       15282.90
     twolf        1593.69     440.73     1720.93       12655.20       33663.10
     gzip         1446.47      41.96      163.84        1204.79        3204.79
     bzip2         518.88      30.56      119.31          877.37       2333.82
     mcf           219.57      49.20      192.13        1412.83        3758.17
     equake        160.13      52.00      203.03        1493.03        3971.50
     art            41.46      22.18       86.59          636.77       1693.82
     httpd      225512.89    3058.17    11941.40       87813.10      233586.00
     sendmail   197382.28    1672.88     6532.20       48035.60      127776.00
Experimental evaluation: Precision.
                exact.   (40-10-4)   (80-10-8)   (400-100-12) (800-100-16)
     gcc         OOM          71.8        79.6           83.4         85.3
     perlbmk     OOM          75.3        85.0           89.3         90.6
     vortex      OOM          85.7        90.1           91.2         91.5
     eon          96.8        81.5        88.9           94.3         96.8
     parser       98.0        65.8        97.3           97.9         98.0
     gap          97.5        88.2        93.5           96.7         97.4
     vpr          94.2        85.9        93.9           94.1         94.2
     crafty       97.6        97.1        97.6           97.6         97.6
     mesa         99.4        89.6        96.6           99.1         99.4
     ammp         99.2        98.4        99.0           99.2         99.2
     twolf        99.3        96.7        99.1           99.3         99.3
     gzip         90.9        88.8        90.5           90.8         90.9
     bzip2        88.0        84.8        88.0           88.0         88.0
     mcf          94.5        91.3        94.3           94.5         94.5
     equake       97.7        96.9        97.7           97.7         97.7
     art          88.6        86.6        88.4           88.6         88.6
     httpd        93.2        90.1        92.1           92.9         93.2
     sendmail     90.4        85.6        88.2           90.3         90.4
Complexity Results
Points-to Analysis
  With dynamic memory allocation
     Flow-sensitive: Undecidable
     Flow-insensitive: ???
  Without dynamic memory allocation
    Strictly typed
      Flow-insensitive: P
      Flow-sensitive
         Two levels of dereferencing: P
         Arbitrary levels of dereferencing: NP-Hard
    Weakly typed
Trading off Flow-sensitivity
Hind, Pioli, Which Pointer Analysis should I Use?, ISSTA 2000



                      The use of flow-sensitive pointer
                    analysis (as described in this paper)
                     does not seem justified because it
                     offers only a minimum increase in
                       precision over the analyses of
                        Andersen and Burke et al. …

				