“Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation”

Paper by Piotr Indyk

Presentation by Andy Worms and Arik Chikvashvili



Abstract
• Stable distributions
• Stream computation by combining the use of stable distributions
• Pseudorandom generators (PRGs) and their use to reduce memory usage
Introduction
• Input: an n-dimensional point p, with distances measured in the L1 norm.
• Storage: a sketch C(p) of O(log(n)/ε²) words.
• Property: given C(p) and C(q), we can estimate |p − q|₁ for points p and q up to a factor (1 + ε), w.h.p.
Introduction: Stream Computation
Given a stream S of data, each chunk of data is of the form (i, a), where i ∈ [n] = {0, …, n−1} and a ∈ {−M, …, M}.
We want to approximate the quantity L1(S), where
$$L_p(S) = \left( \sum_{i=0}^{n-1} \Big| \sum_{(i,a) \in S} a \Big|^p \right)^{1/p}$$
Other Results
• N. Alon, Y. Matias and M. Szegedy: space O(1/ε²), approximating L2(S) w.h.p., assuming each i appears in at most two pairs (i, a) in the stream [1996].
• J. Feigenbaum, S. Kannan, M. Strauss and M. Viswanathan showed the same for L1(S), also assuming at most two pairs per i [1999].
Stable Distributions
A distribution D over R is p-stable if there exists p ≥ 0 such that for any real numbers a1, …, an and i.i.d. variables X1, …, Xn with distribution D, the variable
$$\sum_i a_i X_i$$
has the same distribution as the variable
$$\Big( \sum_i |a_i|^p \Big)^{1/p} X,$$
where X is also distributed according to D.
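To make the definition concrete, here is a minimal numerical sketch (our illustration, not from the paper) checking the 1-stability of the Cauchy distribution with NumPy:

```python
import numpy as np

# Illustrative check of 1-stability: sum_i a_i * X_i should distribute
# like (sum_i |a_i|) * X for a standard Cauchy X. Means are undefined
# for Cauchy, so we compare medians of absolute values instead.
rng = np.random.default_rng(0)
a = np.array([0.5, -2.0, 1.5])
n_samples = 100_000

X = rng.standard_cauchy(size=(n_samples, len(a)))
lhs = X @ a                                            # sum_i a_i X_i
rhs = np.abs(a).sum() * rng.standard_cauchy(n_samples)

print(np.median(np.abs(lhs)), np.median(np.abs(rhs)))  # both ≈ 4 = sum|a_i|
```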
Some News
• The good news: there exist p-stable distributions for every p in (0, 2].
• The bad news: most have no closed formula (i.e., they are non-constructive).
• The Cauchy distribution is 1-stable.
• The Gauss distribution is 2-stable.
Source: V.M. Zolotarev, “One-dimensional Stable Distributions” (518.2 ZOL in AAS)
Reminder
                 Cauchy (for |X|)                          Gauss
Density          $\frac{2}{\pi} \cdot \frac{1}{1+x^2}$     $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
Distribution     $\frac{2}{\pi} \arctan(x)$
Tonight’s Program
• “Obvious solution”.
• Algorithm for p = 1.
• The algorithm’s limitations.
• Proof of correctness.
• Overcoming the limitations.
• Time permitting: p = 2 and all p in (0, 2].
The Obvious Solution
• Hold a counter for each i, and update it on each pair found in the input stream (see the sketch below).
• Breaks the stream model: O(n) memory.
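A minimal Python sketch of this naive approach (our illustration; the example stream is hypothetical):

```python
# The "obvious" solution: one counter per coordinate i, i.e. O(n) words.
def l1_naive(stream, n):
    counters = [0] * n                  # O(n) memory breaks the stream model
    for i, a in stream:
        counters[i] += a                # fold each pair (i, a) into counter i
    return sum(abs(c) for c in counters)

print(l1_naive([(5, -2), (4, -1), (2, +1), (5, 3)], n=7))  # |-2+3| + |-1| + |1| = 3
```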
The Problem (p = 1)
• Input: a stream S of pairs (i, a) such that 0 ≤ i < n and −M ≤ a ≤ M.
• Output:
$$L_1(S) = \sum_{i=0}^{n-1} \Big| \sum_{(i,a) \in S} a \Big|$$
• Up to an error factor of 1 ± ε.
• With probability 1 − δ.
Definitions
• $l = (c/\varepsilon^2) \log(1/\delta)$ (c defined later)
• $X_i^j$, for 0 ≤ i < n and 0 ≤ j < l, are independent random variables with Cauchy distribution, arranged as an l × n matrix:
$$X = \begin{pmatrix} X_1^1 & X_2^1 & \cdots & X_n^1 \\ \vdots & & & \vdots \\ X_1^l & \cdots & X_{n-1}^l & X_n^l \end{pmatrix}$$
• A set of buckets S_j, 0 ≤ j < l, initially zero.
The Algorithm
• For each new pair (i, a):
$$\forall j: \; S_j \leftarrow S_j + a \cdot X_i^j$$
• Return median(|S_0|, |S_1|, …, |S_{l−1}|)
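A runnable Python sketch of these two steps (our illustration; it stores the full random matrix explicitly, which the memory-reduction slides later avoid):

```python
import math, random

def l1_sketch(stream, n, l, seed=0):
    rng = random.Random(seed)
    # X[j][i]: i.i.d. Cauchy via the inverse CDF, X = tan(pi * (U - 1/2)).
    X = [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
         for _ in range(l)]
    S = [0.0] * l                        # the l buckets, initially zero
    for i, a in stream:
        for j in range(l):
            S[j] += a * X[j][i]          # S_j <- S_j + a * X_i^j
    return sorted(abs(s) for s in S)[l // 2]   # median(|S_0|, ..., |S_{l-1}|)

print(l1_sketch([(5, -2), (4, -1), (2, +1), (5, 3)], n=7, l=101))  # close to L1(S) = 3
```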
Limitations
• It assumes infinite-precision arithmetic.
• It randomly and repeatedly accesses Ω(n) random numbers.
Example: n = 7, l = 3
i:          1    2    3    4    5    6    n=7    Buckets
X_i^0:     −5    6    3   −5   −5    3    0      3
X_i^1:      2    1   −3    2    0    1    5      2
X_i^2:      1    1    1    0   −3    2    1      1

Data stream: (5, −2), (4, −1), (2, +1)
(For illustration only.)
Correctness Proof
• We define:
$$c_i = \sum_{(i,a) \in S} a \qquad (c_i = 0 \text{ if there is no } (i,a) \text{ in } S)$$
• Therefore:
$$L_1(S) = C = \sum_i |c_i|$$
Correctness Proof (cont.)
• Claim 1: each S_j has the same distribution as C·X for some random variable X with Cauchy distribution.
• Proof: S_j = Σ_i c_i X_i^j, which by the 1-stability of the Cauchy distribution has the same distribution as (Σ_i |c_i|)·X = C·X.
Correctness Proof (cont.)
• Lemma 1: if X has Cauchy distribution, then median(|X|) = 1 and median(a|X|) = a.
• Proof: the distribution function of |X| is
$$F(z) = \int_0^z \frac{2}{\pi} \cdot \frac{1}{1+x^2} \, dx = \frac{2}{\pi} \arctan(z)$$
• Since tan(π/4) = 1, F(1) = 1/2.
• Thus median(|X|) = 1 and median(a|X|) = a.
(Graphics Intermezzo)
[Plots of the density y = (2/π) · 1/(1+x²) and the distribution function y = (2/π) · arctan(x).]
Correctness Proof (cont.)
• Fact: for any distribution D on R with distribution function F, take
$$l = (c/\varepsilon^2) \log(1/\delta)$$
independent samples X_0, …, X_{l−1} of D, and let X = median(X_0, …, X_{l−1}). Then:
$$\Pr\left[ F(X) \in \left( \tfrac{1}{2} - \varepsilon, \; \tfrac{1}{2} + \varepsilon \right) \right] \ge 1 - \delta$$
Correctness Proof (cont.)
The Fact in plain words:
1. You choose an error (small) and a probability (high).
2. With enough samples, you will find the median within the small error, with the high probability.
Correctness Proof (cont.)
• Lemma 2: let F be the distribution function of |X|, where X has Cauchy distribution, and let z > 0 be such that ½ − ε ≤ F(z) ≤ ½ + ε. Then, if ε is small enough, 1 − 4ε ≤ z ≤ 1 + 4ε.
• Proof:
$$\left( F^{-1} \right)'\!\left( \tfrac{1}{2} \right) = \pi < 4,$$
so moving F(z) by at most ε moves z by at most about 4ε.
Correctness Proof (last)
Therefore we have proved:
The algorithm correctly estimates
 L1(S) up to the factor (1±ε), with
 probability at least (1-δ).




Correctness Proof (review)
For those who are lost:
1. Each bucket distributes like C·X, and median(|C·X|) = C.
2. “Enough” samples approximate median(|C·X|) = C “well enough”.
3. Each bucket is a sample.
Tonight’s Program
• “Obvious solution”.
• Algorithm for p = 1.
• The algorithm’s limitations.
• Proof of correctness.
• Overcoming the limitations.
• Time permitting: p = 2 and all p in (0, 2].
• God willing: uses of the algorithm.
Bounded Precision
• The numbers in the stream are integers; the problem is with the random variables.
• We will show it is sufficient to pick them from the set
$$V_L = \left\{ \tfrac{p}{q} : p, q \in \{-L, \ldots, L\},\; q \neq 0 \right\}$$
(in plain words: the set of fractions of small numbers).
Bounded Precision (cont.)
• We want to generate a random variable X with Cauchy distribution.
• We choose Y uniformly from [0, 1).
• X = F⁻¹(Y) = tan(Yπ/2).
• Ỹ is the multiple of 1/L closest to Y.
• X̃ is F⁻¹(Ỹ), rounded to a multiple of 1/L.
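A hedged Python sketch of this discretization (our illustration; the clamp away from Y = 1 reflects the assumption Y < 1 − K/L made on the next slides):

```python
import math, random

def discretized_cauchy(L, rng):
    y = rng.random()                      # Y uniform in [0, 1)
    y_tilde = round(y * L) / L            # Y~: nearest multiple of 1/L
    y_tilde = min(y_tilde, 1 - 1 / L)     # stay below the pole of tan at 1
    x = math.tan(y_tilde * math.pi / 2)   # X = F^{-1}(Y~)
    return round(x * L) / L               # X~: rounded to a multiple of 1/L

rng = random.Random(1)
print([discretized_cauchy(L=1000, rng=rng) for _ in range(3)])
```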
Cauchy’s Lottery…




Bounded Precision (cont.)
• Assume Y < 1 − K/L = 1 − γ.
• The derivative of F⁻¹ near Y < 1 − γ is O(1/γ²).
• It follows that X̃ = X ± Δ, where |Δ| = O(1/(γ²L)).
$$\tilde{S}_j = \sum_i \sum_{(i,a) \in S} a \tilde{X}_i^j = \sum_i c_i \tilde{X}_i^j = \sum_i c_i (X_i^j \pm \Delta) = S_j \pm \Delta \sum_i c_i$$
Bounded Precision (cont.)
$$\tilde{S}_j = S_j \pm \Delta \sum_i c_i \quad \text{(result from the previous slide)}$$
$$\mathrm{median}\left( \{ |\tilde{S}_j| \}_{0 \le j \le l-1} \right) \approx \sum_i |c_i| \quad \text{(up to } \varepsilon \text{ and } \delta \text{, from the algorithm’s proof)}$$
• By making Δ small enough, you can ignore the contribution of $\Delta \sum_i c_i$.
Memory Usage Reduction
• The naïve implementation uses O(n) memory words to store the random matrix.
• Couldn’t we generate the random matrix on the fly?
• Yes, with a PRG.
• We also toss fewer coins.
Not Just for Fun
• From the Python programming language (www.python.org).
Source: http://www.python.org/doc/2.3.3/lib/module-random.html
Review: Probabilistic Algorithms
Allow algorithm A to:
• Use random bits.
• Make errors.
[Diagram: the input and the random bits feed into A, which produces the output.]
A answers correctly with high probability: for every x, Pr_r[A(x, r) = P(x)] > 1 − ε (for very small ε, say 10⁻¹⁰⁰⁰).
Exponential-Time Derandomization
After 20 years of research we only have the following trivial theorem.

Theorem: probabilistic poly-time algorithms can be simulated deterministically in exponential time (time 2^poly(n)).
Proof:
• Suppose that A uses r random bits.
• Run A using all 2^r choices of the random bits (00…0, 00…1, …, 11…1).
• Take the majority vote of the outputs.

Time: 2^r · poly(n)
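A toy Python illustration of this exhaustive derandomization (ours; the probabilistic algorithm A below is hypothetical):

```python
from itertools import product

# Run A on every one of the 2^r random strings and majority-vote the outputs.
def derandomize(A, x, r):
    outputs = [A(x, bits) for bits in product((0, 1), repeat=r)]
    return max(set(outputs), key=outputs.count)   # majority answer

# Hypothetical probabilistic parity test that errs on exactly one string.
A = lambda x, bits: (x % 2 == 0) if bits != (1, 1, 1) else (x % 2 != 0)
print(derandomize(A, 10, r=3))   # True: the single erring branch is outvoted
```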
Algorithms Which Use Few Random Bits
• Algorithms with few random coins can be efficiently derandomized!
• If r = O(log n), enumerating all 2^r seeds takes time 2^r · poly(n): a polynomial-time deterministic algorithm!
Derandomization Paradigm
• Given a probabilistic algorithm that uses many random bits,
• convert it into a probabilistic algorithm that uses few random bits,
• then derandomize it using the previous theorem.
Pseudorandom Generators
• Use a short “seed” of very few truly random bits to generate a long string of pseudo-random bits.
[Diagram: seed → PRG → pseudo-random bits → A]
• Pseudo-randomness: few truly random bits are expanded into many “pseudo-random” bits, and no efficient algorithm can distinguish truly random bits from pseudo-random bits.
Pseudo-Random Generators (cont.)
• This yields a new probabilistic algorithm.
[Diagram: the input and a short seed, expanded by the PRG into pseudo-random bits, feed A.]
• In our algorithm we need to store only the short seed, not the whole set of pseudorandom bits.
Remember?
i:      1    2    3    4    5    6    n=7
       −5    6    3   −5   −5    3    0
        2    1   −3    2    0    1    5
        1    1    1    0   −3    2    1

• There exist efficient “random access” (indexable) random number generators (see the sketch below).
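A minimal sketch of such an indexable generator (our illustration, not the paper’s construction): derive X_i^j deterministically from (seed, j, i), so any matrix entry can be recomputed on demand instead of stored.

```python
import hashlib, math

def indexed_cauchy(seed: int, j: int, i: int) -> float:
    h = hashlib.sha256(f"{seed}:{j}:{i}".encode()).digest()
    u = int.from_bytes(h[:8], "big") / 2**64      # uniform in [0, 1)
    return math.tan(math.pi * (u - 0.5))          # inverse-CDF Cauchy

print(indexed_cauchy(seed=42, j=0, i=5))          # same value on every call
```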
PRG Definition
• Given an FSM Q.
• Given a seed which is truly random.
• Convert it into k chunks of random bits, each of length b.
• Formally: G: {0,1}^m → ({0,1}^b)^k.
• Let Q(x) be the state of Q after input x.
• G is a PRG if the distributions of the final state on truly random and pseudorandom inputs are close:
$$\big| D[Q(x)]_{x \sim U_{bk}} - D[Q(G(x))]_{x \sim U_m} \big|_1 \le \varepsilon$$
PRG Properties
There exists a PRG G for space(S) with ε = 2^−O(S) such that:
• G expands O(S · log R) bits into O(R) bits,
• G requires only O(S) bits of storage in addition to its random bits,
• any length-O(S) chunk of G(x) can be computed using O(log R) arithmetic operations on O(S)-bit words.
Randomness Reduction
• Consider a fixed S_j and O(log M) bits of space to hold it.
• We need O(n) values X_i; the pairs (i, a) arrive in increasing order of i.
• So we need O(n) chunks of randomness.
• ⇒ There exists a PRG that needs a random seed of size O(log M · log(n/δ)) to expand it into n pseudorandom variables X_1, …, X_n.
Randomness Reduction (cont.)
• The variables X_1, …, X_n give us S_j, and hence L_1(S).
• S_j does not depend on the order of the i’s: for each i the same X_i will be generated, so the input may be unsorted.
• We use l = O(log(1/δ))/ε² random seeds.
Theorem 2
There is an algorithm which estimates L_1(S) up to a factor (1 ± ε) with probability 1 − δ, and uses (with S = log M, R = n/δ):
• O(log M · log(1/δ)/ε²) bits of random-access storage,
• O(log(n/δ)) arithmetic operations per pair (i, a),
• O(log M · log(n/δ) · log(1/δ)/ε²) random bits.
Further Results
• When p = 2, the algorithm and analysis are the same, with the Cauchy distribution replaced by the Gaussian.
• For general p in (0, 2] there are no closed formulas for the densities or distribution functions.
General p
• Fact: p-stable random variables can be generated from two independent variables that are distributed uniformly over [0, 1] (Chambers, Mallows and Stuck, 1976); see the sketch below.
• It seems that Lemma 2 and the algorithm itself could also work in this case, but there is no need to work this out, as there are no known applications with p different from 1 and 2.
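A hedged sketch of the Chambers–Mallows–Stuck recipe for a symmetric p-stable variable (our formulation of the standard method; the slide only cites the 1976 paper). The two uniforms supply an angle θ and an exponential W:

```python
import math, random

def symmetric_p_stable(p: float, rng: random.Random) -> float:
    theta = math.pi * (rng.random() - 0.5)    # uniform on (-pi/2, pi/2)
    w = -math.log(1.0 - rng.random())         # Exp(1) from the second uniform
    if p == 1.0:
        return math.tan(theta)                # Cauchy special case
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
            * (math.cos((1.0 - p) * theta) / w) ** ((1.0 - p) / p))

rng = random.Random(7)
print([round(symmetric_p_stable(0.5, rng), 3) for _ in range(3)])
```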
CX
[Plots of the density y = (2/π) · 1/(1+x²) and the distribution function y = (2/π) · arctan(x).]
[Plots of y = sin(x), y = tan(x), and y = cos(x).]