# “Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation”

Paper by Piotr Indyk

Presentation by Andy Worms and Arik Chikvashvili
## Abstract

- Stable distributions.
- Stream computation combining the use of stable distributions.
- Pseudorandom generators (PRGs) and their use to reduce memory usage.
## Introduction

- Input: an n-dimensional point p, with its L1 norm given.
- Storage: a sketch C(p) of $O(\log n / \varepsilon^2)$ words.
- Property: given C(p) and C(q), we can estimate $|p - q|_1$ for points p and q up to a factor $(1 + \varepsilon)$, w.h.p.
## Introduction: Stream Computation

Given a stream S of data, each chunk of data is of the form (i, a), where $i \in [n] = \{0, \ldots, n-1\}$ and $a \in \{-M, \ldots, M\}$.

We want to approximate the quantity $L_1(S)$, where

$$L_p(S) = \left( \sum_{i=0}^{n-1} \Big| \sum_{(i,a)\in S} a \Big|^p \right)^{1/p}$$
## Other Results

- N. Alon, Y. Matias and M. Szegedy [1996]: space $O(1/\varepsilon^2)$, w.h.p., approximating $L_2(S)$, assuming each i appears in at most two pairs (i, a) of the stream.
- J. Feigenbaum, S. Kannan, M. Strauss and M. Viswanathan [1999] showed the same for $L_1(S)$, also for at most two pairs.
## Stable Distributions

A distribution D over ℝ is p-stable (for some $p \ge 0$) if, for any real numbers $a_1, \ldots, a_n$ and i.i.d. variables $X_1, \ldots, X_n$ with distribution D, the variable

$$\sum_i a_i X_i$$

has the same distribution as the variable

$$\Big(\sum_i |a_i|^p\Big)^{1/p} X,$$

where X also has distribution D.
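The 1-stability property can be checked empirically. Below is a quick Python sketch (not from the slides; the Cauchy sampler uses the inverse-CDF formula that appears later in the talk, and the function names are mine):

```python
import math
import random

def cauchy(rng):
    # Inverse-CDF sampling: if U ~ Uniform[0,1), then tan(pi*(U - 1/2))
    # is a standard Cauchy variable.
    return math.tan(math.pi * (rng.random() - 0.5))

def stability_demo(a, trials=100_000, seed=1):
    """Empirically check 1-stability: sum_i a_i * X_i should be distributed
    like (sum_i |a_i|) * X for a single Cauchy X, so the median of its
    absolute value should be close to sum_i |a_i| (since median(|Cauchy|) = 1)."""
    rng = random.Random(seed)
    samples = sorted(abs(sum(ai * cauchy(rng) for ai in a))
                     for _ in range(trials))
    return samples[trials // 2]

a = [3.0, -1.0, 2.0]      # sum of |a_i| = 6
print(stability_demo(a))  # should be close to 6
```

With 100,000 trials the sample median concentrates well within a few percent of the true value.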
## Some News

- The good news: stable distributions exist for every $p \in (0, 2]$.
- The bad news: most have no closed formula (i.e., they are non-constructive).
- The Cauchy distribution is 1-stable.
- The Gaussian distribution is 2-stable.

Source: V.M. Zolotarev, “One-dimensional Stable Distributions” (518.2 ZOL in AAS)
## Reminder

|              | Cauchy (of $|X|$)                     | Gauss                                |
|--------------|---------------------------------------|--------------------------------------|
| Density      | $\frac{2}{\pi} \cdot \frac{1}{1+x^2}$ | $\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ |
| Distribution | $\frac{2}{\pi} \arctan(x)$            |                                      |
## Tonight’s Program

- The “obvious solution”.
- The algorithm for p = 1.
- The algorithm’s limitations.
- Proof of correctness.
- Overcoming the limitations.
- Time permitting: p = 2 and all p in (0, 2].
## The Obvious Solution

- Hold a counter for each i, and update it on each pair found in the input stream.
- This breaks the stream model: O(n) memory.
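For concreteness, the obvious solution in a few lines of Python (a sketch, not from the slides):

```python
from collections import defaultdict

def l1_exact(stream):
    """Obvious solution: one counter per index i. Exact answer,
    but O(n) memory, which breaks the stream model."""
    c = defaultdict(int)
    for i, a in stream:
        c[i] += a
    return sum(abs(v) for v in c.values())

print(l1_exact([(5, -2), (4, -1), (2, +1), (5, 3)]))  # |-2+3| + |-1| + |1| = 3
```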
## The Problem (p = 1)

- Input: a stream S of pairs (i, a) such that $0 \le i < n$ and $-M \le a \le M$.
- Output:

$$L_1(S) = \sum_{i=0}^{n-1} \Big| \sum_{(i,a)\in S} a \Big|$$

  - up to an error factor of $1 \pm \varepsilon$,
  - with probability $1 - \delta$.
## Definitions

$$l = c/\varepsilon^2 \cdot \log(1/\delta) \qquad (c \text{ defined later})$$

- $X_i^j$, for $0 \le i < n$ and $0 \le j < l$, are independent random variables with Cauchy distribution (an $l \times n$ matrix).
- A set of buckets $S_j$, $0 \le j < l$, initially zero.
## The Algorithm

- For each new pair (i, a), for every j: $S_j \leftarrow S_j + a X_i^j$.
- Return $\mathrm{median}(|S_0|, |S_1|, \ldots, |S_{l-1}|)$.
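Putting the definitions and the update rule together, a minimal Python sketch of the whole algorithm might look like this (naive explicit storage of the matrix; `make_sketch` and `estimate_l1` are my names):

```python
import math
import random

def make_sketch(n, l, seed=0):
    # The l x n matrix of independent Cauchy variables X_i^j, stored
    # explicitly. This is the naive version; the PRG discussed later
    # removes the O(n*l) storage requirement.
    rng = random.Random(seed)
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
            for _ in range(l)]

def estimate_l1(stream, X):
    # For each pair (i, a) and every j: S_j <- S_j + a * X_i^j.
    S = [0.0] * len(X)
    for i, a in stream:
        for j in range(len(X)):
            S[j] += a * X[j][i]
    # Return median(|S_0|, ..., |S_{l-1}|).
    m = sorted(abs(s) for s in S)
    return m[len(m) // 2]

stream = [(5, -2), (4, -1), (2, +1), (5, -2)]  # c_5 = -4, c_4 = -1, c_2 = 1, so L1 = 6
print(estimate_l1(stream, make_sketch(7, 2001, seed=1)))  # close to 6, w.h.p.
```

With $l \approx 2000$ buckets the median estimate typically lands within a few percent of the true $L_1(S)$.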
## Limitations

- It assumes infinite-precision arithmetic.
- It randomly and repeatedly accesses $\Theta(n)$ random numbers.
## Example: n = 7, l = 3

| i       | 1  | 2 | 3  | 4  | 5  | 6 | n = 7 | Buckets |
|---------|----|---|----|----|----|---|-------|---------|
| $X_i^0$ | −5 | 6 | 3  | −5 | −5 | 3 | 0     | 3       |
| $X_i^1$ | 2  | 1 | −3 | 2  | 0  | 1 | 5     | 2       |
| $X_i^2$ | 1  | 1 | 1  | 0  | −3 | 2 | 1     | 1       |

Data stream (for illustration only): (5, −2), (4, −1), (2, +1)
## Correctness Proof

We define:

$$c_i = \sum_{(i,a)\in S} a \qquad (c_i = 0 \text{ if there is no pair } (i,a) \text{ in } S)$$

Therefore:

$$L_1(S) = C = \sum_i |c_i|$$
## Correctness Proof (cont.)

- Claim 1: Each $S_j$ has the same distribution as $CX$, for some random variable X with Cauchy distribution.
- Proof: follows from the 1-stability of the Cauchy distribution.
## Correctness Proof (cont.)

- Lemma 1: If X has Cauchy distribution, then $\mathrm{median}(|X|) = 1$ and $\mathrm{median}(a|X|) = a$.
- Proof: the distribution function of $|X|$ is

$$F(z) = \int_0^z \frac{2}{\pi} \cdot \frac{1}{1+x^2}\,dx = \frac{2}{\pi} \arctan(z)$$

- Since $\tan(\pi/4) = 1$, we get $F(1) = 1/2$.
- Thus $\mathrm{median}(|X|) = 1$ and $\mathrm{median}(a|X|) = a$.
## (Graphics Intermezzo)

Plots of $y = \frac{2}{\pi} \cdot \frac{1}{1+x^2}$ and $y = \frac{2}{\pi} \arctan(x)$.
## Correctness Proof (cont.)

Fact: for any distribution D on ℝ with distribution function F, take

$$l = c/\varepsilon^2 \cdot \log(1/\delta)$$

independent samples $X_0, \ldots, X_{l-1}$ of D, and let $X = \mathrm{median}(X_0, \ldots, X_{l-1})$. Then

$$\Pr\left[ F(X) \in \left( \tfrac12 - \varepsilon,\ \tfrac12 + \varepsilon \right) \right] \ge 1 - \delta$$
## Correctness Proof (cont.)

The Fact, in plain terms:
1. You choose an error (small) and a probability (high).
2. With enough samples, you will discover the median, with high probability, within that small error.
## Correctness Proof (cont.)

- Lemma 2: Let F be the distribution function of $|X|$, where X has Cauchy distribution, and let $z > 0$ be such that $\tfrac12 - \varepsilon \le F(z) \le \tfrac12 + \varepsilon$. Then, if ε is small enough, $1 - 4\varepsilon \le z \le 1 + 4\varepsilon$.
- Proof: $(F^{-1})'(\tfrac12) = \pi < 4$.
## Correctness Proof (last)

Therefore we have proved: the algorithm correctly estimates $L_1(S)$ up to a factor $(1 \pm \varepsilon)$, with probability at least $1 - \delta$.
## Correctness Proof (review)

For those who are lost:
1. Each bucket $S_j$ is distributed like $CX$, and $\mathrm{median}(|CX|) = C$.
2. “Enough” samples approximate $\mathrm{median}(|CX|) = C$ “well enough”.
3. Each bucket is a sample.
## Tonight’s Program

- The “obvious solution”.
- The algorithm for p = 1.
- The algorithm’s limitations.
- Proof of correctness.
- Overcoming the limitations.
- Time permitting: p = 2 and all p in (0, 2].
- God willing: uses of the algorithm.
## Bounded Precision

- The numbers in the stream are integers; the problem is with the random variables.
- We will show it is sufficient to pick them from the set

$$V_L = \left\{ \frac{p}{q} : p, q \in \{-L, \ldots, L\},\ q \ne 0 \right\}$$

(in plain terms: the set of fractions of small numbers).
## Bounded Precision (cont.)

- We want to generate a random variable X with Cauchy distribution.
- Choose Y uniformly from [0, 1).
- $X = F^{-1}(Y) = \tan(Y\pi/2)$.
- Let Y′ be the multiple of 1/L closest to Y.
- Let X′ be $F^{-1}(Y')$, rounded to a multiple of 1/L.
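The bounded-precision recipe above can be sketched in a few lines of Python (the function name and the snapping helper are mine; the slide only gives the recipe):

```python
import math

def rounded_cauchy(y, L):
    """Bounded-precision half-Cauchy value, as on the slide: snap Y to the
    nearest multiple of 1/L, apply F^{-1}(y) = tan(y*pi/2), and round the
    result to a multiple of 1/L as well, so everything lives in V_L."""
    y_rounded = round(y * L) / L           # Y': nearest multiple of 1/L
    x = math.tan(y_rounded * math.pi / 2)  # F^{-1}(Y')
    return round(x * L) / L                # X': rounded to a multiple of 1/L

print(rounded_cauchy(0.5, 1000))  # tan(pi/4) = 1.0
```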
## Cauchy’s Lottery…
## Bounded Precision (cont.)

- Assume $Y < 1 - K/L = 1 - \gamma$.
- The derivative of $F^{-1}$ near $Y < 1 - \gamma$ is $O(1/\gamma^2)$.
- It follows that $X' = X + E$, where $|E| = O(1/(\gamma^2 L)) = \Delta$.

$$S'_j = \sum_{(i,a)\in S} a X'_i = \sum_i c_i X'_i = \sum_i c_i (X_i + E_i) = S_j + \sum_i c_i E_i$$
## Bounded Precision (cont.)

From the previous slide, $S'_j = S_j + \sum_i c_i E_i$ with each $|E_i| \le \Delta$, and

$$\mathrm{median}(\{|S'_j|\}_{0 \le j < l}) \approx \sum_i |c_i| \quad (\text{up to } \varepsilon \text{ and } \delta, \text{ from the algorithm’s proof})$$

By making Δ small enough, the contribution of $\Delta \sum_i |c_i|$ can be ignored.
## Memory Usage Reduction

- The naïve implementation uses O(n) memory words to store the random matrix.
- Couldn’t we generate the random matrix on the fly?
- Yes, with a PRG.
- We also toss fewer coins.
## Not Just for Fun

From the Python programming language (www.python.org).

Source: http://www.python.org/doc/2.3.3/lib/module-random.html
## Review: Probabilistic Algorithms

Allow the algorithm A to:
- use random bits;
- make errors.

For every x, $\Pr_r[A(x, r) = P(x)] > 1 - \varepsilon$ (for very small ε, say $10^{-1000}$).
## Exponential-Time Derandomization

After 20 years of research we only have the following trivial theorem.

Theorem: probabilistic poly-time algorithms can be simulated deterministically in exponential time (time $2^{\mathrm{poly}(n)}$).
## Proof

- Suppose that A uses r random bits.
- Run A using all $2^r$ choices of random bits, from 00…0 to 11…1.
- Take the majority vote of the outputs.

Time: $2^r \cdot \mathrm{poly}(n)$
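The proof is constructive, and can be sketched directly in Python (the predicate `A` below is a hypothetical toy, just to have something to vote on):

```python
from itertools import product

def derandomize(A, x, r):
    """Simulate a probabilistic algorithm deterministically: run A on x with
    every one of the 2^r random strings and take the majority answer.
    Time 2^r * poly(n), i.e. exponential unless r = O(log n)."""
    votes = [A(x, bits) for bits in product([0, 1], repeat=r)]
    return max(set(votes), key=votes.count)

# Hypothetical toy predicate: "is x odd?", computed correctly unless the
# random string is all ones (so it errs on exactly 1 of the 2^r runs).
def A(x, bits):
    ans = x % 2 == 1
    return (not ans) if all(bits) else ans

print(derandomize(A, 7, 4))  # True: 15 of the 16 runs answer correctly
```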
## Algorithms Which Use Few Bits

Algorithms with few random coins can be efficiently derandomized!

With $r = O(\log n)$, the running time $2^r \cdot \mathrm{poly}(n)$ is polynomial: a polynomial-time deterministic algorithm!
- Given a probabilistic algorithm that uses many random bits,
- convert it into a probabilistic algorithm that uses few random bits,
- then derandomize it using the previous theorem.
## Pseudorandom Generators

Use a short “seed” of very few truly random bits to generate a long string of pseudo-random bits.

Pseudo-randomness: a few truly random bits are expanded into many “pseudo-random” bits, such that no efficient algorithm can distinguish truly random bits from pseudo-random bits.
## Pseudo-Random Generators (cont.)

The new probabilistic algorithm runs A on pseudo-random bits produced by the PRG from a short seed of a few truly random bits.

In our algorithm we need to store only the short seed, and not the whole set of pseudorandom bits.
## Remember?

| i       | 1  | 2 | 3  | 4  | 5  | 6 | n = 7 |
|---------|----|---|----|----|----|---|-------|
| $X_i^0$ | −5 | 6 | 3  | −5 | −5 | 3 | 0     |
| $X_i^1$ | 2  | 1 | −3 | 2  | 0  | 1 | 5     |
| $X_i^2$ | 1  | 1 | 1  | 0  | −3 | 2 | 1     |

There exist efficient “random access” (indexable) random number generators.
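One way to get this “random access” behavior can be sketched in Python, with a cryptographic hash standing in for the paper's Nisan-style PRG (an illustration of indexable randomness only, not the actual construction):

```python
import hashlib
import math

def indexable_uniform(seed, i):
    """Random-access generator: the value at index i is a deterministic hash
    of (seed, i), so X_i can be regenerated on demand, in any order,
    without storing the whole matrix."""
    h = hashlib.sha256(f"{seed}:{i}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64  # uniform in [0, 1)

def indexable_cauchy(seed, i):
    # Inverse-CDF transform of the indexable uniform value.
    return math.tan(math.pi * (indexable_uniform(seed, i) - 0.5))

# The same (seed, i) always yields the same variable, in any access order:
print(indexable_cauchy(42, 7) == indexable_cauchy(42, 7))  # True
```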
## PRG Definition

- Given an FSM Q, and a truly random seed,
- convert the seed into k chunks of random bits, each of length b.
- Formally, $G: \{0,1\}^m \to (\{0,1\}^b)^k$.
- Let Q(x) be the state of Q after reading input x.
- G is a PRG if

$$\big\| \mathcal{D}[Q(x)]_{x \sim U_{bk}} - \mathcal{D}[Q(G(x))]_{x \sim U_m} \big\|_1 \le \varepsilon$$

where $\mathcal{D}[\cdot]$ denotes the distribution of the final state, over x uniform in $\{0,1\}^{bk}$ and $\{0,1\}^m$ respectively.
## PRG Properties

There exists a PRG G for space(S) with $\varepsilon = 2^{-O(S)}$ such that:
- G expands $O(S \log R)$ bits into $O(R)$ bits;
- G requires only $O(S)$ bits of storage;
- any length-$O(S)$ chunk of G(x) can be computed using $O(\log R)$ arithmetic operations on $O(S)$-bit words.
## Randomness Reduction

- Consider a fixed $S_j$, with $O(\log M)$ space to hold it.
- We need O(n) variables $X_i$; suppose the pairs (i, a) come in increasing order of i.
- So we need O(n) chunks of randomness.
- ⇒ There exists a PRG that needs a random seed of size $O(\log M \cdot \log(n/\delta))$ to expand it into n pseudorandom variables $X_1, \ldots, X_n$.
## Randomness Reduction (cont.)

- The variables $X_1, \ldots, X_n$ give us $S_j$, and hence $L_1(S)$.
- But $S_j$ does not depend on the order of the i’s: for each i the same $X_i$ will be generated, so the input can be unsorted.
- We use $l = O(\log(1/\delta))/\varepsilon^2$ random seeds.
## Theorem 2

There is an algorithm which estimates $L_1(S)$ up to a factor $(1 \pm \varepsilon)$ with probability $1 - \delta$, and uses (with $S = \log M$, $R = n/\delta$):
- $O(\log M \cdot \log(1/\delta)/\varepsilon^2)$ bits of random-access storage;
- $O(\log(n/\delta))$ arithmetic operations per pair (i, a);
- $O(\log M \cdot \log(n/\delta) \cdot \log(1/\delta)/\varepsilon^2)$ random bits.
## Further Results

- When p = 2, the algorithm and analysis are the same, with the Cauchy distribution replaced by the Gaussian.
- For general $p \in (0, 2]$, no closed formulas exist for the densities or distribution functions.
## General p

- Fact: p-stable random variables can be generated from two independent variables distributed uniformly over [0, 1] (Chambers, Mallows and Stuck, 1976).
- It seems that Lemma 2 and the algorithm itself could also work in this case, but there is no need to work out the details, as there are no known applications with p different from 1 and 2.
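The Chambers–Mallows–Stuck recipe, as usually stated for the symmetric case, can be sketched as follows (a textbook reconstruction; the slide only cites the fact, and the function name is mine):

```python
import math

def symmetric_stable(p, u, v):
    """Chambers-Mallows-Stuck sampler for a symmetric p-stable variable from
    two independent Uniform[0,1) values u, v (assumes v > 0 when p != 1)."""
    theta = math.pi * (u - 0.5)  # uniform on (-pi/2, pi/2)
    w = -math.log(1.0 - v)       # exponential with mean 1
    if p == 1.0:
        return math.tan(theta)   # reduces exactly to the Cauchy case
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
            * (math.cos((1.0 - p) * theta) / w) ** ((1.0 - p) / p))

# For p = 1 this is exactly the Cauchy sampler used earlier in the talk:
print(symmetric_stable(1.0, 0.75, 0.5))  # tan(pi/4), i.e. about 1.0
```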
## CX

Plots of $y = \frac{2}{\pi} \cdot \frac{1}{1+x^2}$ and $y = \frac{2}{\pi} \arctan(x)$.
y  sin( x)

y  tan( x)

y  cos( x)

49