Hashing Out Random Graphs
Nick Jones
Sean Porter
Erik Weyers
Andy Schieber
Jon Kroening
        Introduction
We will look at some applications of probability in computer science: hash functions and random graphs.
     Hash Functions
We want to place a set of n records, denoted r1, r2, …, rn, into m locations, m > n, with at most one record in each location.
A hash function is a function that maps the record values into the m locations.
We use a sequence of hash functions, denoted h1, h2, h3, …, to place the records ri into the m locations.
The records are placed sequentially as indicated below:
r1 goes to location h1(r1).
For r2 we try h1(r2), h2(r2), h3(r2), … in turn until an empty location is found, and likewise for each later record.
Every time we are unsuccessful in placing a record (because the location is already occupied), a collision occurs.
We will let the random variable X
denote the number of collisions that
occur when placing n records.
We would like to find E[X] and
Var(X).
These values are hard to compute exactly, but we can derive an approximate formula for each.
In order to do this we need to define some other random variables.
Let $Y_k$ = number of collisions in placing $r_k$. Then

$$X = \sum_{k=1}^{n} Y_k = Y_1 + Y_2 + \cdots + Y_n$$

Let $Z_k = Y_k + 1$, the total number of hash attempts needed to place $r_k$; then $Z_k$ is geometric with $p = (m-k+1)/m$.
Therefore,

$$X = Z_1 + Z_2 + \cdots + Z_n - n$$
We can then find $E[Z_k]$:

$$E[Z_k] = \frac{1}{p} = \frac{m}{m-k+1}$$

$$E[X] = E[Z_1] + E[Z_2] + \cdots + E[Z_n] - n = -n + \sum_{k=1}^{n} E[Z_k] = -n + \sum_{k=1}^{n} \frac{m}{m-k+1}$$

$$= -n + m\left(\frac{1}{m} + \frac{1}{m-1} + \cdots + \frac{1}{m-n+1}\right) \approx -n + m \int_{m-n+1}^{m} \frac{dx}{x} = -n + m \log\left(\frac{m}{m-n+1}\right)$$

$$E[X] \approx m \log\left(\frac{m}{m-n+1}\right) - n$$
We would also like to find Var(X).

 Var Zk  
             1  p      mk  1
                 2
                  p       m k 12




            n
Var  X   Var Zk   m
                               n
                                       k  1
           k 1               k 1   mk 1    2



          1        2              n 1 
       m      2     2  ... 
                                            
          m1 m 2
         
                                 mn1 
                                          2

                                            
       n 1
                 x
   m                2 dx

        1
              m  x 
We now know formulas for E[X] and Var(X):

$$E[X] \approx m \log\left(\frac{m}{m-n+1}\right) - n$$

$$\mathrm{Var}(X) \approx m \int_{1}^{n-1} \frac{x}{(m-x)^2}\,dx$$
Alfréd Rényi
March 20, 1921 – February 1, 1970
48 years old
The Hungarian mathematician spent six months in hiding after escaping from a Fascist labor camp in 1944
During that time he rescued his parents from a Budapest prison by dressing up in a soldier’s uniform
He got his Ph.D. at the University of
Szeged in Hungary
Rényi worked with Erdős on random graphs, and they published joint work
He worked on number theory and
graph theory, which led him to
results about the measures of the
dependency of random variables
Paul Erdős

“A mathematician is a machine for turning coffee into theorems”
Born: March 26, 1913
May have been the most prolific mathematician of all time
Wrote or co-authored over 1,475 papers
Erdős was born to two high school math teachers
His mother kept him out of school until his teen years because she feared its influence
At home he did mental arithmetic, and at three he could multiply numbers in his head
Fortified by espresso, Erdős did math for 19 hours a day, 7 days a week
He devoted his life to a single narrow mission: uncovering mathematical truth
He traveled around for six decades with a suitcase, looking for mathematicians to pick his brain
His motto was: “Another roof, another proof”
“Property is a nuisance”
“Erdős posed and solved thorny problems in number theory and other areas and founded the field of discrete mathematics, which is a foundation of computer science”
He was awarded his doctorate in 1934 at Pázmány Péter University in Budapest
              Graphs
A graph consists of a set V of elements called vertices and a set E of pairs of vertices called edges
A sequence of vertices i, i1, i2, …, ik, j for which (i, i1), (i1, i2), …, (ik, j) ∈ E is called a path from i to j
   Connected Graphs
A graph is said to be connected if
there is a path between each pair
of vertices
If a graph is not connected, it is called disconnected
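As a concrete illustration, here is a small Python sketch (the function and variable names are our own, not from the source) that represents a graph as an adjacency list and uses breadth-first search to decide whether it is connected.

```python
from collections import deque

# Breadth-first search from an arbitrary start vertex; the graph is
# connected exactly when the search reaches every vertex.
def is_connected(vertices, edges):
    if not vertices:
        return True
    adj = {v: [] for v in vertices}
    for u, v in edges:                    # undirected: add both directions
        adj[u].append(v)
        adj[v].append(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)

print(is_connected({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 4)]))  # True
print(is_connected({1, 2, 3, 4}, [(1, 2), (3, 4)]))          # False
```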
     Random Graphs
In a random graph, we start with a set of vertices and insert edges at random, thus creating paths
An interesting question is then to find P{graph is connected}, the probability that there is a path between every pair of vertices
James Stirling
Who is James Stirling?
Lived 1692 – 1770.
His family was Roman Catholic in Protestant England.
His family supported the Jacobite cause.
Matriculated at Balliol College, Oxford.
Believed to have studied and matriculated at two other universities, but this is not certain.
He did not graduate because he refused to take an oath, owing to his Jacobite beliefs.
Spent years studying, traveling, and making friends with people such as Sir Isaac Newton and Nicolaus (I) Bernoulli.
Methodus Differentialis
Stirling became a teacher in London.
There he wrote the book Methodus Differentialis in 1730.
The book’s purpose is to speed up the convergence of series.
Stirling’s Formula is recorded in this book in Example 2 of Proposition 28:

$$n! \approx \sqrt{2\pi n}\; n^n e^{-n}$$
    Stirling’s Formula

      n! 2nn e        n n

Used to approximate n!
Is an Asymptotic Expansion.
Does not converge.
Can be used to approximate a lower
bound in a series.
Percentile error is extremely low.
The bigger the number inserted, the
lower the percentile error.
Stirling’s Formula Error
About 8.00% off for 1!
About 0.80% off for 10!
About 0.08% off for 100!
Etc.
The relative error is close to 1/(12n), so if the formula is multiplied by (1 + 1/(12n)), it only gets better, with errors of order 1/n².
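A quick numeric check of these error figures, as a minimal Python sketch:

```python
import math

# Sketch: relative error of Stirling's approximation and of the
# corrected form multiplied by (1 + 1/(12n)).
def stirling(n):
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

for n in (1, 10, 100):
    exact = math.factorial(n)
    approx = stirling(n)
    corrected = approx * (1 + 1 / (12 * n))
    print(f"n={n:4d}  error={1 - approx / exact:.2%}  "
          f"corrected error={1 - corrected / exact:.4%}")
```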
Probability Background
Normal Distribution and Central Limit Theorem
Poisson Distribution
Multinomial Distribution
The Normal Distribution
A continuous random variable X with pdf

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty,$$

is called normal
Normal Distribution

Such a random variable is denoted X ~ N(μ, σ²); it can be shown that E[X] = μ and Var(X) = σ²
  Normal Distribution
Note: When the mean = 0 and
standard deviation = 1, we get the
standard normal random variable
Z~N(0,1)
Central Limit Theorem
If X1, X2, … are independent and identically distributed with common mean μ and standard deviation σ, then

$$\lim_{n\to\infty} P\!\left(\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \le x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$$
Central Limit Theorem
If $S_n = \sum_{i=1}^{n} X_i$ and n is large, then $S_n$ is approximately normal.

If X ~ N(μ, σ²), then

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
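To illustrate the theorem, here is a small Python sketch (the uniform summands are our own choice, not the source's) that standardizes sums of i.i.d. Uniform(0,1) variables and compares the empirical probability to the standard normal cdf.

```python
import random
import statistics

# Sketch: sums of i.i.d. Uniform(0,1) variables (mean 1/2, sd sqrt(1/12)),
# standardized as in the CLT; the fraction below x should approach the
# standard normal CDF Phi(x).
n, trials, x = 50, 100_000, 1.0
mu, sigma = 0.5, (1 / 12) ** 0.5
rng = random.Random(2)

count = 0
for _ in range(trials):
    s = sum(rng.random() for _ in range(n))
    if (s - n * mu) / (sigma * n ** 0.5) <= x:
        count += 1

phi = statistics.NormalDist().cdf(x)   # Phi(1.0) ~ 0.8413
print(f"simulated {count / trials:.4f} vs Phi({x}) = {phi:.4f}")
```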
Poisson Distribution
             n x
p(x)  lim   p (1 p)
                       nx
             x
        n   
              λ
         λe
          x

p(x)              , x  0 ,1,2...
          x!
Mean and variance both equal to λ
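The limit is easy to see numerically; here is a minimal Python sketch (λ and x are our own example values) comparing the binomial(n, λ/n) pmf with the Poisson pmf as n grows.

```python
import math

# Sketch: binomial(n, lam/n) pmf approaches the Poisson(lam) pmf as n grows.
lam, x = 3.0, 2
poisson = lam ** x * math.exp(-lam) / math.factorial(x)
for n in (10, 100, 10_000):
    p = lam / n
    binom = math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    print(f"n={n:6d}  binomial={binom:.6f}  poisson={poisson:.6f}")
```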
Multinomial Distribution
n independent identical trials of events A1, A2, …, Ak with probabilities P1, P2, …, Pk
Define Xi = number of times Ai occurs, i = 1, …, k (so X1 + X2 + … + Xk = n). Then

$$P(X_1 = n_1, X_2 = n_2, \ldots, X_k = n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; P_1^{n_1} P_2^{n_2} \cdots P_k^{n_k}$$

where n is the sum of the ni
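A quick check of the pmf against simulation, as a Python sketch (the probabilities and target counts are our own example):

```python
import math
import random
from collections import Counter

# Sketch: compare the multinomial pmf with simulated frequencies for a
# small case (k=3 outcomes).
n, probs, target = 5, [0.5, 0.3, 0.2], (2, 2, 1)

def pmf(counts, probs, n):
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)
    prob = coef
    for c, p in zip(counts, probs):
        prob *= p ** c
    return prob

rng = random.Random(3)
trials, hits = 200_000, 0
for _ in range(trials):
    counts = Counter(rng.choices(range(3), weights=probs, k=n))
    if tuple(counts.get(i, 0) for i in range(3)) == target:
        hits += 1
print(f"pmf = {pmf(target, probs, n):.5f}, simulated = {hits / trials:.5f}")
```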
   Connected Graphs
Recall: a random graph G consists of vertices V = {1, 2, …, n} and random variables X(i), i = 1, …, n, with probabilities

$$P\{X(i) = j\} = P_j, \qquad \sum_j P_j = 1$$
  Connected Graphs
The set of random edges is then

$$E = \{(i, X(i)) : i = 1, \ldots, n\}$$

where (i, X(i)) is the edge emanating from vertex i
   Connected Graphs
What is the probability that a random graph is connected, P{graph is connected}?
A special case: suppose vertex 1 is ‘dead’ (it doesn’t spawn an edge). For n = 2, with P1 + P2 = 1, vertex 2’s edge connects the graph exactly when X(2) = 1, so

$$P\{\text{graph connected}\} = P_1$$
Dead Vertex Lemma
Consider a random graph consisting of vertices 0, 1, 2, …, r and edges (i, Y_i), i = 1, 2, …, r, where the Y_i are independent and P{Y_i = j} = Q_j, j = 0, 1, …, r.

If $\sum_{j=0}^{r} Q_j = 1$, then P{graph connected} = Q_0.
Dead Vertex Lemma
[Diagram: an example graph on vertices 1–6 illustrating the lemma.]
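The lemma can be checked by simulation. Here is a Monte Carlo sketch in Python (the Q values are our own example) that builds the lemma's random graph, with vertex 0 dead, and checks that the fraction of connected outcomes matches Q_0.

```python
import random

# Monte Carlo sketch of the Dead Vertex Lemma: vertices 0..r, vertex 0
# dead, each i >= 1 draws an edge (i, Y_i) with P{Y_i = j} = Q_j.
# The lemma says P{graph connected} = Q_0.
def connected_fraction(Q, trials, seed=4):
    rng = random.Random(seed)
    r = len(Q) - 1
    vertices = list(range(r + 1))
    hits = 0
    for _ in range(trials):
        adj = {v: [] for v in vertices}
        for i in range(1, r + 1):
            y = rng.choices(vertices, weights=Q)[0]
            adj[i].append(y)
            adj[y].append(i)
        seen, stack = {0}, [0]            # depth-first search from vertex 0
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        hits += len(seen) == r + 1
    return hits / trials

Q = [0.3, 0.25, 0.25, 0.2]     # Q_0 = 0.3, arbitrary choice
print(f"simulated {connected_fraction(Q, 100_000):.4f} vs Q_0 = {Q[0]}")
```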
Maximal Non-Self-Intersecting (MNSI)
Consider the maximal non-self-intersecting path emanating from vertex 1:

$$1,\; X(1),\; X^2(1),\; \ldots,\; X^k(1), \qquad \text{where } X^k(1) = X(X^{k-1}(1))$$

[Diagram: an example path on vertices 1–5 with k = 3.]
Maximal Non-Self-Intersecting (MNSI)
Define

$$N = \min\{k : X^k(1) \in \{1, X(1), \ldots, X^{k-1}(1)\}\}$$

and set

$$W = P_1 + \sum_{i=1}^{N-1} P_{X^i(1)}$$
Maximal Non-Self-Intersecting (MNSI)
[Diagram: an example path on vertices 1–7 with k = 4.]

Collapsing the MNSI path into the ‘dead’ vertex of the Dead Vertex Lemma gives

$$P\{\text{graph connected} \mid N, 1, X(1), \ldots, X^{N-1}(1)\} = W$$
Conditional Probability
The idea of conditional probability:

$$P\{\text{event}\} = \sum P\{\text{event} \mid \text{scenario}\} \cdot P\{\text{scenario}\}$$

Expectations of discrete random variables are conditional probability averages:

$$E(X) = \sum_x x \cdot P\{X = x\}$$
Conditional Probability
Taking expectations:

$$E(W) = \sum_{\text{scenarios}} P\{\text{graph connected} \mid N, 1, X(1), \ldots, X^{N-1}(1)\} \cdot P\{N, 1, X(1), \ldots, X^{N-1}(1)\} = P\{\text{graph connected}\}$$
  Conditional Probability
Special case of interest:

$$P_j = \frac{1}{n} \qquad \text{(equiprobable vertices)}$$

$$W = \frac{N}{n}, \qquad E[W] = \frac{1}{n} E[N]$$

$$E[N] = \sum_{i=0}^{n-1} P\{N > i\}$$
Conditional Probability
Since P{N > i} = (n−1)(n−2)⋯(n−i)/nⁱ (each successive step of the path must avoid all previously visited vertices),

$$E[W] = \frac{1}{n} \sum_{i=0}^{n-1} P\{N > i\} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)(n-2)\cdots(n-i)}{n^i} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)!}{n^i\,(n-i-1)!}$$
Conditional Probability

$$E[W] = \frac{(n-1)!}{n^n} \sum_{i=0}^{n-1} \frac{n^{n-i-1}}{(n-i-1)!}$$

Let j = n − i − 1:

$$E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!}$$
    Poisson Distribution

Suppose X is Poisson with mean λ = n:

$$P\{X = k\} = \frac{e^{-\lambda} \lambda^k}{k!} = \frac{e^{-n} n^k}{k!}$$
Poisson Distribution
Then

$$P\{X < n\} = \sum_{k=0}^{n-1} P\{X = k\} = \sum_{k=0}^{n-1} \frac{n^k e^{-n}}{k!} = e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!}$$
Central Limit Theorem
Recall X = X1 + X2 + … + Xn, where each Xi is Poisson with mean 1 (so X is Poisson with mean n).
By the Central Limit Theorem, for large n, X is approximately N(n, n), so

$$P(X < n) \approx \frac{1}{2} \qquad \text{(asymptotically)}$$

$$e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{1}{2} \quad\Longrightarrow\quad \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{e^n}{2}$$
Conditional Probability
Recall Stirling’s Formula:

$$n! \approx \sqrt{2\pi n}\, n^n e^{-n}, \qquad (n-1)! \approx \sqrt{2\pi (n-1)}\, (n-1)^{n-1} e^{-(n-1)}$$

Recall $E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!}$. So by substitution,

$$E[W] \approx \frac{(n-1)!}{n^n} \cdot \frac{e^n}{2} \approx \frac{\sqrt{2\pi (n-1)}\, (n-1)^{n-1} e^{-(n-1)}\, e^n}{2\, n^n}$$
Conditional Probability

$$= \frac{\sqrt{2\pi}\, (n-1)^{n-\frac12}\, e}{2\, n^n} = \frac{\sqrt{2\pi}}{2} \cdot \frac{\left(\frac{n-1}{n}\right)^{n} e}{\sqrt{n-1}} = \frac{\sqrt{2\pi}}{2} \cdot \frac{\left(1-\frac{1}{n}\right)^{n} e}{\sqrt{n-1}}$$

using $\lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^{n} = e^{x}$.
Conditional Probability

$$E[W] \approx \frac{\sqrt{2\pi}}{2} \cdot \frac{e^{-1}\, e}{\sqrt{n-1}} = \frac{\sqrt{2\pi}}{2\sqrt{n-1}} \approx \frac{\sqrt{2\pi}}{2\sqrt{n}} = \sqrt{\frac{\pi}{2n}}$$

$$P\{\text{graph is connected}\} = E[W] \approx \sqrt{\frac{\pi}{2n}}$$
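The result is easy to check numerically. Below is a Python sketch (our own construction, assuming the equiprobable model P_j = 1/n) comparing the exact expression for E[W], the asymptotic value √(π/(2n)), and a Monte Carlo estimate of P{graph connected}.

```python
import math
import random

# Sketch: three views of P{graph connected} for the equiprobable random
# graph with edges (i, X(i)), X(i) uniform on {1,...,n}:
#   1. the exact formula E[W] = ((n-1)!/n^n) * sum_{j<n} n^j/j!
#   2. the asymptotic approximation sqrt(pi/(2n))
#   3. a Monte Carlo estimate over random graphs
def exact_ew(n):
    total = sum(n ** j / math.factorial(j) for j in range(n))
    return math.factorial(n - 1) / n ** n * total

def simulate(n, trials, seed=5):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        adj = {v: [] for v in range(1, n + 1)}
        for i in range(1, n + 1):
            j = rng.randint(1, n)
            adj[i].append(j)
            adj[j].append(i)
        seen, stack = {1}, [1]            # depth-first search from vertex 1
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        hits += len(seen) == n
    return hits / trials

n = 30
print(f"exact E[W]             = {exact_ew(n):.4f}")
print(f"asymptotic sqrt(pi/2n) = {math.sqrt(math.pi / (2 * n)):.4f}")
print(f"simulated              = {simulate(n, 50_000):.4f}")
```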
           Thank You
“The first sign of senility is when a
  man forgets his theorems. The
  second is when he forgets to zip
  up. The third is when he forgets to
  zip down.”
--Paul Erdős
            References
http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Erdos.html
http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Renyi.html
http://www.lassp.cornell.edu/sethna/Cracks/Stirling.html
http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Stirling.html