
# Infocom 2001 - paper 101 by yurtgc548

```
Model Fitting

Jean-Yves Le Boudec

0
Contents
1. What is model fitting ?
2. Linear Regression
3. Linear regression with L1 norm minimization
4. Choosing a distribution
5. Heavy Tail

1
Virus Infection Data
We would like to capture the growth of
infected hosts

(explanatory model)

An exponential model seems
appropriate

How can we fit the model, in
particular, what is the value of α ?

2
Least Square Fit of Virus Infection Data

 = 0.5173

Mean doubling time 1.34 hours

Prediction at +6 hours: 100 000 hosts

Least square fit

3
Least Square Fit of Virus Infection Data In Log
Scale

 = 0.39

Mean doubling time 1.77 hours

Prediction at +6 hours: 39 000 hosts

Least square fit

4
Compare the Two

LS fit in natural scale

LS fit in log scale

5
Which Fitting Method should I use ?
Which optimization criterion should I use ?

The answer is in a statistical model.
Model not only the interesting part, but also the noise

For example

 = 0.5173

6
 = 0.39

How can I tell which is correct ?

7
Look at Residuals
= validate model

8
9
10
Least Square Fit = Gaussian iid Noise
Assume model (homoscedasticity)

The theorem says:
minimize least squares = compute MLE for this model

This is how we computed the estimates for the virus example
11
Least Square and Projection

Data point

Predicted response

Manifold
Where the data point would lie
if there were no noise

Estimated parameter

Write on the board what each of these is: data point, predicted response and estimated parameter for the virus example
12
Confidence Intervals

13
14
Robustness to « Outliers »

15
A Simple Example

Least Square               L1 Norm Minimization
Model: y_i = m + noise     Model : y_i = m + noise

What is m ?                What is m ?

Confidence interval ?      Confidence interval ?

16
Mean Versus Median

17
2. Linear Regression
Also called « ANOVA » (Analysis of
Variance)

= least square + linear dependence
on parameter

A special case where computations
are easy

18
Example 4.3

What is the parameter ?
Is it a linear model ?
How many degrees of freedom ?
What do we assume on ε_i ?
What is the matrix X ?

19
20
Does this model have full rank ?

Q: Matrix X has full rank means the dimension of the image of X (the set of all noise-free responses) is ????
A: 3
21
Some Terminology

x_i are called explanatory variables
Assumed fixed and known

y_i are called response variables
They are « the data »
Assumed to be one sample output of the
model
22
Least Square and Projection

Data point

Predicted response

Manifold
Where the data point would lie
if there were no noise

Estimated parameter

23
Solution of the Linear Regression Model

24
Least Square and Projection
The theorem gives H and K

data
residuals

Predicted response

Manifold
Where the data point would lie
if there were no noise

Estimated parameter

25
The Theorem Gives the Parameter Estimate with Confidence Interval

26
SSR
Confidence Intervals use the quantity s

s^2 is called « Sum of Squared Residuals »

data
residuals

Predicted response

27
Validate the Assumptions with Residuals

28
Residuals
Residuals are given by the theorem

data
residuals

Predicted response

29
Standardized Residuals
The residuals e_i are an estimate of the noise terms ε_i
They are not (exactly) normal iid

The variance of e_i is ????
A: σ^2 (1 − H_{i,i})
Standardized residuals are not exactly normal iid either but their variance is 1

30
Which of these two models could be a linear
regression model ?

A: both

Linear regression does not mean that y_i is a linear function of x_i
Caution: there is a hidden assumption
Noise is iid Gaussian -> homoscedasticity
31
32
3. Linear Regression with L1 norm
minimization
= L1 norm minimization + linear
dependency on parameter
More robust

33
This is convex programming

34
35
Confidence Intervals
No closed form
Compare to median !

Bootstrap:
How ?

36
37
4. Choosing a Distribution
Know a catalog of distributions, guess a fit
Shape
Kurtosis, Skewness
Power laws
Hazard Rate

Fit
Verify the fit visually or with a test (see later)

38
Distribution Shape
Distributions have a shape
By definition: the shape is what remains the same when we
Shift
Rescale

Example: normal distribution: what is the shape parameter ?

Example: exponential distribution: what is the shape parameter ?

39
Standard Distributions
In a given catalog of distributions, we give only the distributions with
different shapes. For each shape, we pick one particular distribution, which
we call standard.

Standard normal: N(0,1)

Standard exponential: Exp(1)

Standard Uniform: U(0,1)

40
Log-Normal Distribution

41
42
Skewness and Kurtosis

43
Power Laws and Pareto Distribution

44
Complementary Distribution Functions
Log-log Scales

[Figure panels: Lognormal, Pareto, Normal]

45
Zipf’s Law

46
47
Hazard Rate
Interpretation: probability that a
flow dies in next dt seconds given
still alive

Used to classify distributions
Aging

Memoryless

Fat tail

Ex: normal ? Exponential ? Pareto ?
Log Normal ?
48
The Weibull Distribution
Standard Weibull CDF

Aging for c > 1
Memoryless for c = 1
Fat tailed for c <1

49
Fitting A Distribution
Assume iid
Use maximum likelihood
Ex: assume gaussian; what are parameters ?

Frequent issues
Censoring
Combinations

50
Censored Data
We want to fit a log normal distribution, but we have only data samples
with values less than some max

Lognormal is fat tailed so we cannot ignore the tail

Idea: use the model
and estimate F0 and a (truncation threshold)

51
52
Combinations
We want to fit a log normal distribution
to the body and Pareto to the tail

Model:

MLE satisfies

53
54
5. Heavy Tails
Recall what fat tail is
Heavier than fat:

55
Heavy Tail means Central Limit does not hold
Central limit theorem:

a sum of n independent random variables with finite second moment tends
to have a normal distribution, when n is large

explains why we can often use normal assumption

But it does not always hold. It does not hold if random variables have infinite
second moment.

56
Central Limit Theorem for Heavy Tails

[Figure: one sample of 10000 points, Pareto p = 1. Panels: normal qqplot, histogram, complementary d.f. (log-log)]

57
[Figure: for p = 1, 1.5, 2, 2.5 and 3, one sample of 10000 points (left) and the average of 1000 samples (right)]

58
Convergence for heavy tailed distributions

59
Importance of Second Moment

60
RWP with Heavy Tail
Stationary ?

61
Evidence of Heavy Tail

62
Testing Heavy Tail
Assume you have very large data set
Else no statement can be made

One can look at empirical cdf in log scale

63
Taqqu’s method
A better method (numerically safer) is as follows.

Aggregate data multiple times

64
We should have

and

If the vertical shift ≈ log(m2 / m1) then measure p = vertical shift / horizontal shift
p_est = average of all p’s

65
Example

log ( 2)
log ( 2) / p

66
Evidence of Heavy Tail

p = 1.08 ± 0.1

67
Designed to create load for a web server
Used in next lab
It is an example of a benchmark, there are many others – see lecture

68
User Equivalent Model
Idea: find a stochastic model that represents user well
Tool can implement several “user equivalents”
Used to generate real work over TCP connections

69
Characterization of UE

Weibull distributions

70
Successive file requests are not independent

Q: What would be the distribution if they were independent ?
A: geometric

71
Fitting the distributions

Done by Surge authors with aest tool + ad-hoc (least square fit of histogram)
What other method could one use ?
A: maximum likelihood with numerical optimization – issue is non iid-ness

72
73
Review

74
75
Review Question

Infection Data
We have measured some
data x(t), t=1,2,3… where
x(t) is the number of
infected hosts in a country
at time t (in hours). We
plot the data and see the
following. Propose a
method to estimate the
rate at which the infection
decreases.

76

```
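The log-scale fit on the virus infection slides amounts to ordinary least squares applied to log y. Below is a minimal sketch in plain Python, assuming the exponential model y(t) = a·exp(αt); the function names and the synthetic data are illustrative and do not come from the slides.

```python
import math

def fit_exponential_log_scale(times, counts):
    """Least-square fit of log(y) = log(a) + alpha * t; returns (a, alpha)."""
    n = len(times)
    logs = [math.log(y) for y in counts]
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    alpha = sxy / sxx
    a = math.exp(l_mean - alpha * t_mean)
    return a, alpha

def doubling_time(alpha):
    """Time needed for exp(alpha * t) to double."""
    return math.log(2) / alpha

# Synthetic, noise-free data (hypothetical values, not the slide data)
times = list(range(10))
counts = [3.0 * math.exp(0.39 * t) for t in times]

a_hat, alpha_hat = fit_exponential_log_scale(times, counts)
print(a_hat, alpha_hat, doubling_time(alpha_hat))
```

With α = 0.5173 the doubling time ln(2)/α is about 1.34 hours, and with α = 0.39 it is about 1.78 hours, close to the 1.77 hours quoted on the slide.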
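For the "Mean Versus Median" slide: in the model y_i = m + noise, the least-square estimate of m is the sample mean, while L1 norm minimization yields the sample median. The sketch below checks this by brute force on a small hypothetical data set with one outlier; the helper names are illustrative.

```python
import statistics

def l2_cost(m, ys):
    """Sum of squared deviations from m (least-square criterion)."""
    return sum((y - m) ** 2 for y in ys)

def l1_cost(m, ys):
    """Sum of absolute deviations from m (L1 norm criterion)."""
    return sum(abs(y - m) for y in ys)

# Hypothetical data: four regular points and one outlier
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

m_ls = statistics.mean(ys)    # minimizer of the least-square cost
m_l1 = statistics.median(ys)  # minimizer of the L1 cost

# Brute-force check on a grid of candidate values 0.0, 0.1, ..., 100.0
grid = [k / 10 for k in range(0, 1001)]
best_l2 = min(grid, key=lambda m: l2_cost(m, ys))
best_l1 = min(grid, key=lambda m: l1_cost(m, ys))
print(m_ls, best_l2)
print(m_l1, best_l1)
```

The outlier drags the mean far from the bulk of the data while the median stays put, which is the robustness property the slides refer to.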
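For the slides on residuals and standardized residuals: the sketch below fits the two-parameter model y_i = b0 + b1·x_i + noise by least squares and computes the residuals, the diagonal of the projection matrix H, and the standardized residuals. It is an illustration under assumptions: the names and data are hypothetical, and the noise scale is estimated from the sum of squared residuals divided by n − 2, which may differ from the convention used in the lecture.

```python
import math

def simple_linear_regression(xs, ys):
    """Least-square fit of y = intercept + slope * x, with residual diagnostics."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    predicted = [intercept + slope * x for x in xs]
    residuals = [y - p for y, p in zip(ys, predicted)]
    # Diagonal terms H_ii of the projection matrix for this model
    leverages = [1.0 / n + (x - x_mean) ** 2 / sxx for x in xs]
    ssr = sum(e ** 2 for e in residuals)  # sum of squared residuals
    sigma_hat = math.sqrt(ssr / (n - 2))  # assumed convention (n - 2 divisor)
    standardized = [
        e / (sigma_hat * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverages)
    ]
    return {
        "intercept": intercept,
        "slope": slope,
        "residuals": residuals,
        "leverages": leverages,
        "standardized": standardized,
    }

# Hypothetical data
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]
fit = simple_linear_regression(xs, ys)
print(fit["slope"], fit["intercept"])
```

Plotting `fit["standardized"]` against the predicted response, or in a normal qqplot, is one way to validate the model assumptions.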
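The slide on confidence intervals for L1 norm minimization asks how to bootstrap. One common answer, sketched here for the simplest case (the median of an iid sample), is the percentile bootstrap: resample the data with replacement, recompute the estimate each time, and read the interval off the empirical quantiles. The function name, defaults and data are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_median_ci(data, n_resamples=1000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]

# Hypothetical sample with one outlier
data = [2.1, 2.4, 2.2, 2.8, 2.5, 2.3, 2.6, 9.7, 2.4, 2.7]
lo, hi = bootstrap_median_ci(data)
print(lo, hi)
```

The same resampling loop applies to an L1 regression estimate: replace `statistics.median` with the fitting routine and resample the (x_i, y_i) pairs.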
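For the hazard rate and Weibull slides: taking the standard Weibull CDF as F(x) = 1 − exp(−x^c) for x ≥ 0, the hazard rate f(x) / (1 − F(x)) works out to c·x^(c−1). The sketch below evaluates it and classifies the distribution by whether the hazard rate increases, stays constant, or decreases; the function names are illustrative.

```python
import math

def weibull_cdf(x, c):
    """Standard Weibull CDF, F(x) = 1 - exp(-x**c), for x >= 0."""
    return 1.0 - math.exp(-(x ** c))

def weibull_hazard(x, c):
    """Hazard rate f(x) / (1 - F(x)) = c * x**(c - 1), for x > 0."""
    return c * x ** (c - 1.0)

def classify(c):
    """Classification by the monotonicity of the hazard rate."""
    if c > 1:
        return "aging"       # hazard rate increases with x
    if c == 1:
        return "memoryless"  # constant hazard rate (exponential distribution)
    return "fat tailed"      # hazard rate decreases with x

for c in (2.0, 1.0, 0.5):
    print(c, classify(c), weibull_hazard(1.0, c), weibull_hazard(4.0, c))
```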
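The "Testing Heavy Tail" slide suggests looking at the empirical cdf in log scale. For a Pareto tail with P(X > x) = x^(−p), the complementary distribution function is a straight line of slope −p in log-log scale, so a crude estimate of p is the least-square slope of the empirical tail. This sketch is that simple slope fit, not Taqqu's aggregation method. It uses an idealized "sample" built from Pareto quantiles rather than random draws, so the result is deterministic; all names are illustrative.

```python
import math

def pareto_quantile_sample(p, n):
    """Idealized sample: n evenly spaced quantiles of P(X > x) = x**(-p), x >= 1."""
    return [(1.0 - (k + 0.5) / n) ** (-1.0 / p) for k in range(n)]

def tail_index_from_ccdf(data, tail_fraction=0.1):
    """Estimate p as minus the least-square slope of the log-log empirical
    complementary distribution function over the largest values."""
    xs = sorted(data)
    n = len(xs)
    start = int((1.0 - tail_fraction) * n)
    log_x = [math.log(xs[k]) for k in range(start, n)]
    # Empirical P(X >= xs[k]) is (n - k) / n, which is never zero
    log_ccdf = [math.log((n - k) / n) for k in range(start, n)]
    m = len(log_x)
    mx = sum(log_x) / m
    my = sum(log_ccdf) / m
    sxx = sum((u - mx) ** 2 for u in log_x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(log_x, log_ccdf))
    return -sxy / sxx

sample = pareto_quantile_sample(p=1.5, n=10000)
p_est = tail_index_from_ccdf(sample)
print(p_est)
```

On real data the estimate depends strongly on `tail_fraction` and needs a very large data set, as the slide warns.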