
Infocom 2001 - paper 101 by yurtgc548


									Model Fitting




   Jean-Yves Le Boudec

                 Contents
            1. What is model fitting ?
              2. Linear Regression
3. Linear regression with L1 norm minimization
           4. Choosing a distribution
                  5. Heavy Tail




                       Virus Infection Data
  We would like to capture the growth of
  infected hosts

(explanatory model)




  An exponential model seems
  appropriate




  How can we fit the model? In
  particular, what is the value of the parameter α ?



Least Square Fit of Virus Infection Data

      = 0.5173

     Mean doubling time 1.34 hours

     Prediction at +6 hours: 100 000 hosts
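
Where the doubling time comes from: a short derivation, assuming the fitted model grows as e^{αt} with t in hours. The symbol α and this exact form are assumptions, since the slide's formula did not survive extraction.

```latex
\[
e^{\alpha T} = 2
\;\Longrightarrow\;
T = \frac{\ln 2}{\alpha},
\qquad
\alpha = 0.5173 \;\Rightarrow\; T \approx \frac{0.6931}{0.5173} \approx 1.34 \text{ hours}.
\]
```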




                       Least square fit




Least Square Fit of Virus Infection Data In Log
                     Scale

        = 0.39

       Mean doubling time 1.77 hours

       Prediction at +6 hours: 39 000 hosts




                                              Least square fit




Compare the Two



          LS fit in natural scale




                                    LS fit in log scale
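
The two fits can be sketched as follows. This is a minimal illustration, assuming a model of the form y = a * exp(alpha * t); the function names, the search interval for alpha and the demo data are all hypothetical (the demo data are NOT the virus data from the slides). The natural-scale fit uses a golden-section search on alpha, which is one simple way to do it and not necessarily how the slide's figures were produced.

```python
import math

def fit_log_scale(t, y):
    """Least squares on log(y) = log(a) + alpha * t (ordinary linear regression)."""
    n = len(t)
    z = [math.log(v) for v in y]
    t_bar = sum(t) / n
    z_bar = sum(z) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxz = sum((ti - t_bar) * (zi - z_bar) for ti, zi in zip(t, z))
    alpha = sxz / sxx
    a = math.exp(z_bar - alpha * t_bar)
    return a, alpha

def _profile_sse(alpha, t, y):
    """Sum of squared errors in natural scale, using the best a for this alpha."""
    g = [math.exp(alpha * ti) for ti in t]
    a = sum(yi * gi for yi, gi in zip(y, g)) / sum(gi * gi for gi in g)
    sse = sum((yi - a * gi) ** 2 for yi, gi in zip(y, g))
    return sse, a

def fit_natural_scale(t, y, lo=0.0, hi=2.0, tol=1e-9):
    """Least squares on y = a * exp(alpha * t), by golden-section search on alpha.

    Assumption: the best alpha lies in [lo, hi] and the profile is unimodal there.
    """
    ratio = (math.sqrt(5) - 1) / 2
    c = hi - ratio * (hi - lo)
    d = lo + ratio * (hi - lo)
    while hi - lo > tol:
        if _profile_sse(c, t, y)[0] < _profile_sse(d, t, y)[0]:
            hi = d
        else:
            lo = c
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
    alpha = (lo + hi) / 2
    return _profile_sse(alpha, t, y)[1], alpha

# Hypothetical noise-free data: both criteria then recover the same parameters.
t_demo = [0, 1, 2, 3, 4, 5, 6]
y_demo = [10 * math.exp(0.5 * ti) for ti in t_demo]
a_log, alpha_log = fit_log_scale(t_demo, y_demo)
a_nat, alpha_nat = fit_natural_scale(t_demo, y_demo)
```

With noisy data the two criteria generally give different estimates, because they weight the errors differently.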




    Which Fitting Method should I use ?
Which optimization criterion should I use ?

The answer is in a statistical model.
   Model not only the interesting part, but also the noise



For example




                      = 0.5173

                                                             6
                          = 0.39


How can I tell which is correct ?




                   Look at Residuals
= validate model




   Least Square Fit = Gaussian iid Noise
Assume model (homoscedasticity)




The theorem says:
     minimize least squares = compute MLE for this model

This is how we computed the estimates for the virus example
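
Why the two coincide: a sketch of the argument, assuming a model of the form below. The notation (f_i for the noise-free response, β for the parameter, I for the number of data points) is an assumption, since the slide's formula did not survive extraction.

```latex
\[
Y_i = f_i(\beta) + \epsilon_i, \qquad \epsilon_i \ \text{iid} \sim N(0, \sigma^2), \qquad i = 1, \dots, I
\]
\[
\ell(\beta, \sigma) = -\frac{I}{2} \ln\left(2 \pi \sigma^2\right)
 - \frac{1}{2 \sigma^2} \sum_{i=1}^{I} \bigl(y_i - f_i(\beta)\bigr)^2
\]
\[
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{I} \bigl(y_i - f_i(\beta)\bigr)^2,
\qquad
\hat{\sigma}^2 = \frac{1}{I} \sum_{i=1}^{I} \bigl(y_i - f_i(\hat{\beta})\bigr)^2
\]
```

For any fixed σ, the log-likelihood depends on β only through the sum of squares, so maximizing it over β is the same as minimizing the sum of squares.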
                        Least Square and Projection


[Figure labels: data point; predicted response; estimated parameter; manifold where the data point would lie if there were no noise]

Write on the board what each of these is for the virus example: data point, predicted response and estimated parameter
Confidence Intervals




Robustness to « Outliers »




                 A Simple Example

Least Square               L1 Norm Minimization
  Model: y_i = m + noise     Model : y_i = m + noise

  What is m ?                What is m ?

  Confidence interval ?      Confidence interval ?
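
For this simple model the two criteria have well-known solutions: the least square estimate of m is the sample mean, and the L1 norm estimate is the sample median. The sketch below illustrates this on made-up numbers (the data and the function names are hypothetical).

```python
import statistics

def sum_of_squares(m, ys):
    """Least square criterion for the model y_i = m + noise."""
    return sum((y - m) ** 2 for y in ys)

def sum_of_abs(m, ys):
    """L1 norm criterion for the model y_i = m + noise."""
    return sum(abs(y - m) for y in ys)

# Hypothetical measurements, then the same with one outlier added.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]
with_outlier = clean + [100.0]

ls_estimate = statistics.mean(with_outlier)    # minimizes sum_of_squares
l1_estimate = statistics.median(with_outlier)  # minimizes sum_of_abs
```

On these numbers the single outlier drags the mean far from the bulk of the data, while the median barely moves.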




Mean Versus Median




                  2. Linear Regression
Also called « ANOVA » (Analysis of
Variance)

= least square + linear dependence
on parameter




A special case where computations
are easy


Example 4.3




              What is the parameter ?
              Is it a linear model ?
              How many degrees of freedom ?
              What do we assume about the noise terms εi ?
              What is the matrix X ?


        Does this model have full rank ?




Q: « Matrix X has full rank » means that the dimension of the range of X (the set of all noise-free responses) is ????
A: 3
Some Terminology




          xi are called explanatory variables
              Assumed fixed and known


          yi are called response variables
              They are « the data »
              Assumed to be one sample output of the
              model
Least Square and Projection


[Figure labels: data point; predicted response; estimated parameter; manifold where the data point would lie if there were no noise]


Solution of the Linear Regression Model
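
A minimal sketch for the simplest full-rank case, a straight line y_i = b0 + b1 * x_i + noise, where the least square solution has a closed form. The function name and the demo data are hypothetical, and this is only one instance of the general theorem, not its statement.

```python
def fit_line(x, y):
    """Least square fit of y = intercept + slope * x; also returns predictions and residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return intercept, slope, predicted, residuals

# Hypothetical data.
x_demo = [0, 1, 2, 3, 4]
y_demo = [1.1, 2.9, 5.2, 6.8, 9.0]
intercept, slope, predicted, residuals = fit_line(x_demo, y_demo)

# Projection property: the residuals are orthogonal to both columns of X
# (the constant column and the x column).
orth_const = sum(residuals)
orth_x = sum(xi * ei for xi, ei in zip(x_demo, residuals))
```

The two orthogonality sums are zero up to rounding, which is the algebraic form of "the predicted response is the projection of the data onto the manifold".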




          Least Square and Projection
The theorem gives H and K



[Figure labels: data; residuals; predicted response; estimated parameter; manifold where the data point would lie if there were no noise]


The Theorem Gives the Parameter Estimate with Confidence Interval




Confidence Intervals use the quantity s²

s² is called « Sum of Squared Residuals » (SSR)



[Figure labels: data; residuals; predicted response]

Validate the Assumptions with Residuals




                          Residuals
Residuals are given by the theorem




[Figure labels: data; residuals; predicted response]

                Standardized Residuals
The residuals ei are an estimate of the noise terms εi
They are not (exactly) normal iid




The variance of ei is ????
A: σ² (1 - Hi,i), where σ² is the variance of the noise
Standardized residuals are not exactly normal iid either but their variance is
1
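
A minimal sketch of standardized residuals for a straight-line fit y_i = b0 + b1 * x_i + noise. Two assumptions are made here: the leverage Hi,i is computed with the closed form that holds for a straight-line fit, and the unknown σ is replaced by an estimate from the residuals with n - 2 degrees of freedom. The function name and the data are hypothetical.

```python
import math

def standardized_residuals(x, y):
    """Return residuals, leverages H_ii and standardized residuals of a straight-line fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Diagonal of the projection matrix H for a straight-line fit.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Assumption: sigma is unknown and estimated from the residuals.
    sigma_hat = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    standardized = [
        e / (sigma_hat * math.sqrt(1 - h)) for e, h in zip(residuals, leverages)
    ]
    return residuals, leverages, standardized

# Hypothetical data.
x_demo = [0, 1, 2, 3, 4]
y_demo = [1.1, 2.9, 5.2, 6.8, 9.0]
residuals, leverages, standardized = standardized_residuals(x_demo, y_demo)
```

The leverages sum to the number of parameters (2 here), so points far from the mean of x have residuals with smaller variance, which is what the standardization corrects for.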


 Which of these two models could be a linear
             regression model ?




A: both

Linear regression does not mean that yi is a linear function of xi
Caution: there is a hidden assumption
   Noise is iid Gaussian -> homoscedasticity
      3. Linear Regression with L1 norm
                 minimization
= L1 norm minimization + linear
dependency on parameter
More robust
Less traditional




This is convex programming
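
A small sketch of an L1 norm fit of a straight line y = b0 + b1 * x. Note the swapped-in technique: instead of calling a convex or linear programming solver, it relies on the fact that for a two-parameter linear model an optimal L1 line passes through at least two data points, and simply tries every pair. This is only practical for small data sets; the function names and the data are hypothetical.

```python
from itertools import combinations

def l1_cost(b0, b1, x, y):
    """Sum of absolute residuals for the line y = b0 + b1 * x."""
    return sum(abs(yi - b0 - b1 * xi) for xi, yi in zip(x, y))

def fit_line_l1(x, y):
    """Brute-force L1 fit: best line among those through two data points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not of the form y = b0 + b1 * x
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        cost = l1_cost(b0, b1, x, y)
        if best is None or cost < best[0]:
            best = (cost, b0, b1)
    return best[1], best[2]

# Hypothetical data: points on y = 1 + 2x, except for one outlier at x = 5.
x_demo = [0, 1, 2, 3, 4, 5]
y_demo = [1, 3, 5, 7, 9, 50]
b0, b1 = fit_line_l1(x_demo, y_demo)
```

On this data the L1 fit returns the line through the five aligned points and leaves the whole error on the outlier, which illustrates the robustness claimed above.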




                 Confidence Intervals
No closed form
   Compare to median !



Bootstrap:
   How ?
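
One possible answer, sketched for the simplest case (the median of an iid sample): resample the data with replacement many times, recompute the estimate each time, and read the interval off the percentiles of the recomputed estimates. This percentile bootstrap is one common variant and is an assumption here, not necessarily the procedure intended in the lecture; names, defaults and data are hypothetical.

```python
import random
import statistics

def bootstrap_median_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    n = len(data)
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lo_idx = int(tail * n_resamples)
    hi_idx = int((1 - tail) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]

# Hypothetical data: the integers 1..25, whose median is 13.
data = list(range(1, 26))
ci_low, ci_high = bootstrap_median_ci(data)
```

The same recipe applies to L1 norm regression by recomputing the fitted parameter on each resample, at the cost of one optimization per resample.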




              4. Choosing a Distribution
Know a catalog of distributions, guess a fit
    Shape
    Kurtosis, Skewness
    Power laws
    Hazard Rate


Fit
Verify the fit visually or with a test (see later)




                    Distribution Shape
Distributions have a shape
   By definition: the shape is what remains the same when we
       Shift
       Rescale




Example: normal distribution: what is the shape parameter ?

Example: exponential distribution: what is the shape parameter ?




                Standard Distributions
In a given catalog of distributions, we give only the distributions with
different shapes. For each shape, we pick one particular distribution, which
we call standard.

   Standard normal: N(0,1)

   Standard exponential: Exp(1)

   Standard Uniform: U(0,1)




Log-Normal Distribution




Skewness and Kurtosis
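
The slide's own formulas did not survive extraction. As a hedged illustration, the sketch below uses the usual moment-based definitions: skewness is the third central moment divided by the cube of the standard deviation, and kurtosis is taken here as the fourth central moment divided by the squared variance, minus 3 (so that it is 0 for a normal distribution). The function name and the data are hypothetical.

```python
def skewness_and_kurtosis(data):
    """Sample skewness and (excess) kurtosis from central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3
    return skewness, kurtosis

# Hypothetical data: a symmetric sample and a right-skewed one.
skew_sym, kurt_sym = skewness_and_kurtosis([1, 2, 3, 4, 5])
skew_right, kurt_right = skewness_and_kurtosis([1, 1, 1, 10])
```

Both quantities are unchanged by shifting and rescaling the data, so they depend only on the shape of the distribution.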




Power Laws and Pareto Distribution




Complementary Distribution Functions
         Log-log Scales

               Lognormal   Pareto   Normal




Zipf’s Law




                            Hazard Rate
Interpretation: probability that a
flow dies in next dt seconds given
still alive




Used to classify distributions
   Aging

   Memoryless

   Fat tail


Ex: normal ? Exponential ? Pareto ?
Log Normal ?
              The Weibull Distribution
Standard Weibull CDF

Aging for c > 1
Memoryless for c = 1
Fat tailed for c <1
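
A small sketch of the three regimes, assuming the standard Weibull CDF is F(x) = 1 - exp(-x^c) for x > 0 (the slide's formula did not survive extraction, so this form is an assumption). The hazard rate is the density divided by the complementary distribution function, which for this form simplifies to c * x^(c-1). Function names are hypothetical.

```python
import math

def weibull_survival(x, c):
    """Complementary distribution function 1 - F(x) of the standard Weibull."""
    return math.exp(-(x ** c))

def weibull_pdf(x, c):
    """Density of the standard Weibull, i.e. the derivative of F."""
    return c * x ** (c - 1) * math.exp(-(x ** c))

def hazard_rate(x, c):
    """Hazard rate = density / complementary distribution function."""
    return weibull_pdf(x, c) / weibull_survival(x, c)

aging = [hazard_rate(x, 2.0) for x in (1.0, 2.0)]       # increasing hazard
memoryless = [hazard_rate(x, 1.0) for x in (1.0, 2.0)]  # constant hazard
fat_tail = [hazard_rate(x, 0.5) for x in (1.0, 4.0)]    # decreasing hazard
```

An increasing hazard rate corresponds to aging, a constant one to the memoryless (exponential) case, and a decreasing one to what the slide calls fat tailed.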




                Fitting A Distribution
Assume iid
Use maximum likelihood
Ex: assume Gaussian; what are the parameters ?

Frequent issues
   Censoring
   Combinations
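
For the Gaussian example, the maximum likelihood estimates have a closed form: the sample mean, and the standard deviation computed with a 1/n factor (not 1/(n-1)). A minimal sketch on made-up numbers, with hypothetical names:

```python
import math

def gaussian_mle(data):
    """Maximum likelihood estimates (mu, sigma) for an iid Gaussian sample."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in data) / n)  # note the 1/n factor
    return mu, sigma

# Hypothetical data.
mu_hat, sigma_hat = gaussian_mle([2, 4, 4, 4, 5, 5, 7, 9])
```

For distributions without a closed form, the same likelihood is maximized numerically.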




                       Censored Data
We want to fit a log-normal distribution, but we have only data samples with values less than some max
Lognormal is fat tailed so we cannot ignore the tail

Idea: use the model

and estimate F0 and a (truncation threshold)




                        Combinations
We want to fit a log-normal distribution
to the body and a Pareto to the tail

Model:




MLE satisfies




                          5. Heavy Tails
Recall what fat tail is
Heavier than fat:




Heavy Tail means Central Limit does not hold
Central limit theorem:

a sum of n independent random variables with finite second moment tends
to have a normal distribution, when n is large


explains why we can often use normal assumption



But it does not always hold. It does not hold if random variables have infinite
second moment.
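
A worked example of an infinite second moment, assuming a Pareto distribution with exponent p written in the standard form below (this parametrization is an assumption; the slides' own definition did not survive extraction):

```latex
\[
P(X > x) = x^{-p} \quad (x \ge 1),
\qquad
E\left[X^2\right] = \int_1^{\infty} x^2 \, p \, x^{-p-1} \, dx
= \begin{cases}
\dfrac{p}{p-2} & \text{if } p > 2, \\[2ex]
+\infty & \text{if } p \le 2.
\end{cases}
\]
```

Under this form, the central limit theorem as stated above applies only for p > 2.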




Central Limit Theorem for Heavy Tails




[Figure: normal qq-plot, histogram and complementary d.f. (log-log scale) for one sample of 10000 points, Pareto p = 1]




[Figure: for Pareto p = 1, 1.5, 2, 2.5 and 3, one sample of 10000 points compared with the average of 1000 samples]



Convergence for heavy tailed distributions




Importance of Second Moment




               RWP with Heavy Tail
Stationary ?




Evidence of Heavy Tail




                     Testing Heavy Tail
Assume you have very large data set
   Else no statement can be made

   One can look at empirical cdf in log scale
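
A minimal sketch of this check: compute the empirical complementary distribution function and look at its slope on log-log scales, which for a Pareto-like tail with exponent p should be close to -p. The function names are hypothetical, and the demo data are evenly spaced quantiles of a Pareto distribution with p = 1.5, not real measurements. The straight-line fit is an eyeball aid and not a rigorous test.

```python
import math

def empirical_ccdf(samples):
    """Return (x, fraction of samples strictly greater than x), sorted by x.

    Assumes the sample values are distinct.
    """
    xs = sorted(samples)
    n = len(xs)
    return [(x, (n - i - 1) / n) for i, x in enumerate(xs)]

def loglog_slope(points):
    """Least square slope of log(ccdf) against log(x), skipping ccdf = 0."""
    logs = [(math.log(x), math.log(c)) for x, c in points if x > 0 and c > 0]
    n = len(logs)
    u_bar = sum(u for u, _ in logs) / n
    v_bar = sum(v for _, v in logs) / n
    suu = sum((u - u_bar) ** 2 for u, _ in logs)
    suv = sum((u - u_bar) * (v - v_bar) for u, v in logs)
    return suv / suu

# Hypothetical data: 1000 evenly spaced quantiles of a Pareto with p = 1.5.
p_true = 1.5
n_points = 1000
pareto_like = [(1 - (i + 0.5) / n_points) ** (-1 / p_true) for i in range(n_points)]
slope = loglog_slope(empirical_ccdf(pareto_like))
```

With real data the largest values are few and noisy, so the slope read from the far tail is unreliable unless the data set is very large.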




                     Taqqu’s method
A better (numerically safer) method is as follows.

Aggregate data multiple times
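
A sketch of the aggregation step, assuming that "aggregate" means summing non-overlapping blocks of m consecutive values (this reading, the function names and the choice of levels are assumptions).

```python
def aggregate(data, m):
    """Sum non-overlapping blocks of m consecutive values (leftover values are dropped)."""
    n_blocks = len(data) // m
    return [sum(data[k * m:(k + 1) * m]) for k in range(n_blocks)]

def aggregate_levels(data, levels):
    """Aggregate the same data at several block sizes, e.g. m = 2, 4, 8."""
    return {m: aggregate(data, m) for m in levels}

# Hypothetical usage on toy data.
example = aggregate([1, 2, 3, 4, 5, 6, 7], 2)
by_level = aggregate_levels(list(range(8)), [2, 4])
```

Each aggregated series is then plotted as a complementary distribution function on log-log scales, and the plots for successive levels are compared.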




We should have




and




If the vertical shift ≈ log ( m2 / m1) then measure p = (vertical shift) / (horizontal shift)
p_est = average of all the measured p values




Example


[Figure: shifts between two aggregation levels, annotated log ( 2) and log ( 2) / p]




Evidence of Heavy Tail




p = 1.08 ± 0.1


              A Load Generator: Surge
Designed to create load for a web server
Used in next lab
Sophisticated load model
It is an example of a benchmark; there are many others (see lecture)




                User Equivalent Model
Idea: find a stochastic model that represents the user well
User modelled as sequence of downloads, followed by “think time”
   Tool can implement several “user equivalents”
Used to generate real work over TCP connections




Characterization of UE




                   Weibull distributions




Successive file requests are not independent




Q: What would be the distribution if they were independent ?
A: geometric




                  Fitting the distributions




Done by Surge authors with aest tool + ad-hoc (least square fit of histogram)
    What other method could one use ?
    A: maximum likelihood with numerical optimization – issue is non iid-ness

Review




                   Review Question

Infection Data
   We have measured some
   data x(t), t=1,2,3… where
   x(t) is the number of
   infected hosts in a country
   at time t (in hours). We
   plot the data and see the
   following. Propose a
   method to estimate the
   rate at which the infection
   decreases.



								