Posted on: 4/7/2013 · Public Domain
Lecture 2: Review of Probability and Distributions (Appendix A–C)
Eco420Z, Dr. S. Chen

Outline and keywords
● Relationship between two variables
● Random sampling
● Parameter, estimator, and estimate
● What does it take to be a good estimator? Small-sample and large-sample properties
● Normal distribution and standard normal distribution
● Confidence intervals and hypothesis testing
● Accuracy of an estimator: standard error and standard deviation

Relationship between two variables
● Example: a linear relationship between monthly housing expenditure and monthly income:
  housing = 164 + .27 income
● We can predict the change in housing expenditure from a change in income.
● Marginal effect of income: for each additional dollar of income, 27 cents are spent on housing. Equivalently, the marginal propensity to consume housing is .27.
● For rich people, an extra dollar may mean very little, so it helps to scale by income. Suppose income increases from 100 to 200:
  ● percentage point change in housing expenditure
  ● income elasticity of housing expenditure

Shortcomings of linear functions
● When income = 0, housing expenditure = 164.
● For low levels of income, linear functions often fail to capture housing expenditure correctly.

Nonlinear functions
● Example 1 (quadratic wage equation):
  wage = 5.3 + .10 educ + .5 exper − .01 exper²
  What is the marginal effect of one additional year of work experience on wages?
  What is the percentage point change in wages for one additional year of work experience?
  What is the elasticity of wage with respect to experience?
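The marginal-effect question above can be checked with a quick calculation. A minimal Python sketch (the function names are illustrative, not from the text) evaluates the quadratic wage equation from Example 1 and its derivative with respect to experience:

```python
# Quadratic wage equation from the slide:
# wage = 5.3 + .10*educ + .5*exper - .01*exper**2

def wage(educ, exper):
    """Predicted wage from the slide's quadratic equation."""
    return 5.3 + 0.10 * educ + 0.5 * exper - 0.01 * exper ** 2

def marginal_effect_exper(exper):
    """d(wage)/d(exper) = .5 - 2*(.01)*exper, evaluated at a given exper."""
    return 0.5 - 0.02 * exper

# Marginal effect of the next year of experience for someone with 10 years:
print(marginal_effect_exper(10))  # about .30: the effect shrinks as exper grows
```

Note how the marginal effect depends on the level of experience, which is exactly what the quadratic term buys us over a linear specification.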
● Example 2 (quadratic log wage equation):
  log(wage) = 2.7 + .10 educ + .2 exper − .01 exper²
  Marginal effect of one additional year of work experience =
  Elasticity =
● Two other examples:
  Labor supply function: hours = 33 + 45 log(wage)
  Demand function for beer: log(bottles) = 4.7 + 1.25 log(price)

Random sampling
● Example: a survey of UAlbany students about their drinking behavior.
  Possible locations for interviews?
  What is the ideal method of survey?
● Definition of random sampling: if {Y1, Y2, ..., Yn} are independent random variables drawn from a common distribution, then {Y1, Y2, ..., Yn} is called a random sample from that distribution. The variables are also called independent, identically distributed (i.i.d.) random variables from that distribution.

Parameter, estimator, and estimate
● Example (population mean and sample average): suppose we have a random sample from the previous survey, {y1, y2, y3, y4, y5} = {1, 0, 0, 1, 1}.
  The sample average is an estimator (i.e., a formula) for approximating the population mean.
  When we plug in the actual survey numbers, we obtain the value of the sample average, also called the estimate of the population mean.

What does it take to be a good estimator?
1. Unbiasedness
● An estimator is unbiased if its expectation equals the true parameter.
● Examples of unbiased estimators: the sample average; y1; the sample variance.
● The bias of an estimator is measured by: bias = E[estimator] − true parameter.
● Example of a biased estimator: the "natural" sample variance (which divides by n instead of n − 1).
2. Efficiency
● Comparing two unbiased estimators W1 and W2 of a parameter θ, we say W1 is more efficient than W2 when Var(W1) < Var(W2) for every value of θ.
● Example: the sample average is more efficient than the y1 estimator.
● Mean squared error: MSE = E[(W − θ)²] = Var(W) + [bias(W)]².
  The MSE takes account of both unbiasedness and efficiency.
● Exercise: calculate the mean and variance of each of the following estimators: the sample average; y1.

Sampling distribution
● Estimators are random variables too (because functions of random variables are random variables). Example: the sample average.
● Thus any estimator has a distribution, called the sampling distribution. (Figure C.2)

Large-sample properties of estimators
● Figure C.3: as the sample size increases, the sampling distribution becomes more and more concentrated around the true parameter.
● Example: the notoriously inefficient (though unbiased) estimator Y1 of the population mean has the widest sampling distribution. The sample average is much more narrowly distributed around the population mean; in fact, its variance decreases whenever n increases.
● Consistency: an estimator is consistent if the probability that it deviates from the true parameter by any fixed amount shrinks as the sample size grows, converging to zero in large samples.
● The sample average of a random sample is always a consistent estimator of the population mean (the Law of Large Numbers).
● Example: the sample variance is a consistent estimator of the population variance (and is also unbiased). The natural sample variance is also consistent (but biased).

Useful facts about consistent estimators
● Continuous functions of consistent estimators are also consistent.
● Examples: the sample variance is consistent, so its square root (the sample standard deviation) is a consistent estimator of the population standard deviation. The difference between two sample averages is a consistent estimator of the difference between the corresponding population means.
● Consistency tells us that the distribution of an estimator collapses around the true parameter as the sample size gets large.
● But consistency provides no information about the shape of that distribution.
● Asymptotic normality: if the distribution of an estimator looks more and more like a normal distribution as the sample size gets large, the estimator is said to be asymptotically normal.

Review of normal distributions
● Suppose a random variable X is normally distributed (i.e., has a bell-shaped density). We write X ~ Normal(μ, σ²) to indicate that it has mean μ and variance σ².
● The standard normal distribution is a normal distribution with zero mean and unit variance, i.e., Z ~ Normal(0, 1).
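The collapsing of the sampling distribution can be made concrete with a rough simulation (a sketch with illustrative numbers, not from the text): for each sample size n, draw many samples, compute the sample average each time, and measure the spread of those averages.

```python
# Consistency in action: the sampling distribution of the sample average
# collapses around the true mean mu = 2 as the sample size n grows.
import random

random.seed(1)
mu, reps = 2.0, 4000
spread = {}  # empirical standard deviation of ybar, keyed by n

for n in (5, 50, 500):
    means = []
    for _ in range(reps):
        y = [random.gauss(mu, 1) for _ in range(n)]
        means.append(sum(y) / n)
    grand = sum(means) / reps
    spread[n] = (sum((x - grand) ** 2 for x in means) / reps) ** 0.5
    print(n, round(spread[n], 3))  # shrinks roughly like 1/sqrt(n)
```

The printed spreads fall by about a factor of √10 each time n grows tenfold, matching sd(Ȳ) = σ/√n.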
● Standard normal table (p. 847; can you read it?):
  P{Z < number in the table margin} = number inside the table.
● Let Z be standard normal. Use the table to answer the following:
  P{Z < 1.96} =
  P{−1.96 < Z} =
  P{−1.96 < Z < 1.96} =
● Normalization: demean by the mean and rescale by the standard deviation. Any normal random variable Y ~ Normal(μ, σ²) can be normalized to be standard normal:
  Z = (Y − μ)/σ ~ Normal(0, 1)
● The sample average can be normalized the same way, which leads to the Central Limit Theorem.

Central Limit Theorem (CLT)
● The normalized sample average from any random sample is approximately standard normal in large samples.
● Formally, let {y1, ..., yn} be a random sample with mean μ and variance σ². Then
  (Ȳ − μ) / (σ/√n) → Normal(0, 1) as n grows.
● Furthermore, even if we replace the population standard deviation σ in the normalization with the sample standard deviation s, the CLT still holds.

Applications of the CLT
● Remember that the sample average is a random variable. After normalization, the CLT tells us that the normalized sample average is approximately standard normal.
● This is very useful for constructing confidence intervals. Recall that P{−1.96 < Z < 1.96} = .95 if Z is standard normal. Can you construct a 95% confidence interval for the population mean?

What if the sample is not large enough?
● Then the CLT does not apply, so the normalized sample average (using the sample standard deviation) is not approximately standard normal.
● Instead, it follows the Student-t distribution:
  (Ȳ − μ) / (s/√n) ~ t(n − 1)
● Student-t table (p. 849; can you read it?)
● Suppose you have a small sample (n = 20). Can you construct the 95% confidence interval for the population mean?
  the critical points (for 2 sides):
  the critical point (for 1 side):
● If the sample is large, use the standard normal table instead.
  the critical points (for 2 tails):
  the critical point (for 1 side):

Hypothesis testing
● Example: we want to test whether more than half of UAlbany students drink weekly.
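The small-sample versus large-sample confidence intervals can be computed side by side. In this sketch the data are hypothetical (made up for illustration), and the critical values are the ones read off the tables: 2.093 for t with 19 degrees of freedom, 1.96 for the standard normal.

```python
# 95% confidence intervals for the population mean, n = 20:
# t-based (small sample) versus z-based (large-sample approximation).
import math

# Hypothetical data for illustration only.
y = [3.1, 2.4, 2.9, 3.5, 2.2, 3.0, 2.8, 3.3, 2.6, 3.1,
     2.9, 3.2, 2.7, 3.4, 2.5, 3.0, 2.8, 3.1, 2.9, 3.3]
n = len(y)
ybar = sum(y) / n
s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))  # sample std dev
se = s / math.sqrt(n)                                        # standard error

t_crit = 2.093  # t(19), two-sided 95% critical value (t table, p. 849)
z_crit = 1.96   # standard normal, two-sided 95% critical value (p. 847)

ci_t = (ybar - t_crit * se, ybar + t_crit * se)
ci_z = (ybar - z_crit * se, ybar + z_crit * se)
print("t-based 95% CI:", ci_t)
print("z-based 95% CI:", ci_z)
```

The t-based interval is always a bit wider than the z-based one, reflecting the extra uncertainty from estimating σ with s in a small sample.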
Let θ denote the population proportion of UAlbany students who drink weekly.
Null hypothesis: H0: θ = 0.5
Alternative hypothesis: H1: θ > 0.5 (one-sided)
● Procedure (using confidence intervals):
  Survey and obtain a random sample {1, 0, 0, 1, 1}.
  Compute the sample average ȳ = 0.6.
  Construct the 95% one-sided confidence interval using the value under the null (0.5).

Example: race discrimination in hiring (p. 787)
● Five pairs of people interviewed for several jobs. In each pair, one person was black and the other was white, and their resumes showed virtually the same education and experience. We observe the outcomes of the 241 interviews. Let θ_B and θ_W denote the probability of receiving a job offer for black and white applicants, respectively.
  Construct the hypotheses: H0: θ_B − θ_W = 0; H1: θ_B − θ_W ≠ 0.
  Calculate the sample average of the difference: ȳ_B − ȳ_W = .224 − .357 = −.133.
  Calculate the sample standard deviation of the difference: s = .482.
  Construct the 95% confidence interval.
  Construct the 99% confidence interval.

Accuracy of the sample average
● Suppose we have a random sample with each y_i ~ (μ, σ²). Then the sample average satisfies
  Ȳ ~ (μ, σ²/n)
● Standard deviation of the sample average: sd(Ȳ) = σ/√n.
● To estimate σ², we use the sample variance:
  s² = Σ_{i=1}^{n} (y_i − Ȳ)² / (n − 1)
● We call s the sample standard deviation of y.
● Plugging s in for σ gives the standard error of the sample average:
  se(Ȳ) = s/√n

Example (Problem C.8)
● Larry Bird attempted FGA = 1206 field goals and made FGM = 455. The outcome of each shot (denoted Y_i) is a zero-one Bernoulli variable:
  Y_i = 1 with probability θ, and 0 with probability 1 − θ.
1. To estimate θ, we use the sample average Ȳ = FGM/FGA = 455/1206.
2. Find the standard deviation of the sample average. Since Y is Bernoulli with mean θ, its variance is
  Var(Y) = θ(1 − θ)
  Thus the variance of the sample average is
  Var(Ȳ) = θ(1 − θ)/n
  where n is the player's FGA. The standard deviation of the sample average is
  sd(Ȳ) = √(θ(1 − θ)/n)
  The sample counterpart of this standard deviation is the standard error:
  se(Ȳ) = √(Ȳ(1 − Ȳ)/n)
3.
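The Larry Bird numbers from Problem C.8 can be worked through directly. A minimal sketch (variable names are illustrative):

```python
# Larry Bird's field-goal data (Problem C.8): estimate theta = P(shot made)
# and its standard error, using the Bernoulli variance formula.
import math

fga, fgm = 1206, 455
ybar = fgm / fga                          # estimate of theta (FGM/FGA)
se = math.sqrt(ybar * (1 - ybar) / fga)   # se(Ybar) = sqrt(ybar*(1-ybar)/n)

print(round(ybar, 3))  # about .377
print(round(se, 4))    # a small standard error, since n = 1206 is large
```

With n this large, the estimate .377 is quite precise: the standard error is on the order of .014.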
By the Central Limit Theorem, the normalized sample average is approximately standard normal in large samples:
  (Ȳ − θ) / se(Ȳ) ~ N(0, 1) (approximately)
Hypothesis test for Larry Bird at the 1% significance level:
  H0: θ = .5
  H1: θ > .5
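The test above can be carried out numerically; a minimal sketch, using 2.326 as the 1% one-sided critical value from the standard normal table:

```python
# One-sided test of H0: theta = .5 against H1: theta > .5 for Larry Bird,
# at the 1% significance level.
import math

fga, fgm = 1206, 455
ybar = fgm / fga
se = math.sqrt(ybar * (1 - ybar) / fga)

z = (ybar - 0.5) / se  # normalized sample average, centered at the null value
print(round(z, 2))     # strongly negative (about -8.8)

reject = z > 2.326     # 1% one-sided critical value from the normal table
print(reject)          # False: no evidence that theta exceeds .5
```

Since Bird's shooting percentage is well below .5, the statistic is far in the wrong tail and H0 is not rejected against this one-sided alternative.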