"Hypothesis Testing - PowerPoint"
Hypothesis Testing

What is Hypothesis Testing?
• Sample information can be used to obtain point estimates or confidence intervals about population parameters
• Alternatively, sample information can be used to test the validity of conjectures about these parameters
  – Are private banks more profitable than state-owned banks in the EU countries?
  – Are returns on a stock different before and after a stock split?
  – Is there a larger variability in real estate prices in Champaign than in Urbana?
• A hypothesis is a statement about a parameter of one or more populations
• Statistically testable hypotheses are formulated based on theories that are used to make predictions
• A hypothesis test is a procedure that
  – States the hypothesis to be tested
  – Uses sample information and formulates a decision rule
  – Based on the outcome of the decision rule, statistically validates or rejects the hypothesis

Steps in Hypothesis Testing
• The following steps are followed in a hypothesis test
  – State the hypothesis
  – Identify the appropriate test statistic and its probability distribution
  – Specify the significance level
  – State the decision rule
  – Collect the data and calculate the test statistic
  – Make the statistical decision
  – Evaluate whether the statistical decision implies a corresponding financial decision

Stating the Testable Hypotheses
• A hypothesis test always includes two hypotheses
  – Null Hypothesis (H0): The null hypothesis is the hypothesis to be tested
    • E.g., The average debt-equity ratio for US industrial firms is 20%
  – Alternative Hypothesis (H1): The alternative hypothesis is the one accepted if the null hypothesis is rejected
    • E.g., The average debt-equity ratio for US industrial firms is different from 20%
• Note: The null hypothesis is a statement that is considered true unless the sample used in the hypothesis test provides evidence that it is false
• Hypothesis tests for a
population parameter $\theta$ in relation to a possible value $\theta_0$ can be formulated as follows
  – H0: $\theta = \theta_0$ vs. H1: $\theta \neq \theta_0$
  – H0: $\theta \le \theta_0$ vs. H1: $\theta > \theta_0$
  – H0: $\theta \ge \theta_0$ vs. H1: $\theta < \theta_0$
• The first formulation is a two-sided test while the other two are one-sided tests
• In each formulation the null and the alternative account for all possible values of the population parameter
• Regardless of the formulation, the test is always conducted at the point of equality, $\theta = \theta_0$
• How do we state the null and alternative hypotheses?
• Example: Suppose that theory tells us that growth funds outperform value funds
  – H0: Growth funds perform worse than or equal to value funds
  – H1: Growth funds perform better than value funds
• We formulate the alternative hypothesis as the statement that the condition is true and test the validity of the null that the statement is false

Identifying the Test Statistic and its Probability Distribution
• The decision rule for the hypothesis test is based on a test statistic
• The test statistic is a quantity calculated from sample information that typically has the following form
  (Sample Statistic - Value of Parameter under H0) / Standard Error of Sample Statistic
• Example: Suppose that we want to test the null hypothesis that the mean return on the S&P 500 index during the past five years is less than or equal to 10% vs.
the alternative that it is greater
• Drawing a sample and calculating the sample mean, we know that
  – If the population distribution is normal with known variance, the sample mean follows a normal distribution and we use the standardized variable Z as our test statistic
  – If, in the above case, the population variance is unknown but the sample is large, we again use Z as our test statistic
  – If the population variance is unknown and the sample size is small, we use the variable t as our test statistic
• If, for example, the variance of S&P 500 returns is unknown, we will use the variable t, known as the t-statistic
  $t_{n-1} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$

Specifying the Significance Level
• To decide whether or not to reject the null hypothesis, the test statistic is compared to a pre-specified value
• The selected value is based on a pre-determined level of significance
• Note that the null hypothesis can be either true or false, and it can be either rejected or not rejected
• Hence, there are four possible outcomes when a hypothesis is tested
  – A false null hypothesis is rejected, which is a correct decision
  – A true null hypothesis is rejected (this is called a Type I error)
  – A false null hypothesis is not rejected (this is called a Type II error)
  – A true null hypothesis is not rejected, which is again a correct decision
• The probability of a Type I error in a hypothesis test is called the level of significance of the test
• When conducting a hypothesis test, we want the chance of a Type I error to be as low as possible
• E.g., A level of significance of 5% implies a 5% chance of a Type I error
• Note: For a given sample size, as we decrease the chance of a Type I error, we increase the chance of a Type II error
• Lowering the chance of a Type I error implies that the null will be rejected less often, including when it is false (Type II error)
• To lower the probabilities of both errors we need to increase the sample size
• The power of a test is
the probability of correctly rejecting a false null hypothesis, that is, 1 - P(Type II error)
• Conventional significance levels when testing hypotheses are 10%, 5%, and 1%
• Example
  – If we reject the null hypothesis at the 10% significance level, we have some evidence that the alternative is true
  – If we reject the null hypothesis at the 5% significance level, we have strong evidence that the alternative is true
  – If we reject the null hypothesis at the 1% significance level, we have very strong evidence that the alternative is true

Stating the Decision Rule
• The decision rule compares the calculated test statistic with specific cutoffs from the tables of the statistic's distribution
• Example: Suppose that the test statistic that we use is the Z-statistic (Z variable) and that we use a 5% significance level
• If the hypothesis test is H0: $\theta = \theta_0$ vs. H1: $\theta \neq \theta_0$, then the two rejection values are $z_{0.025} = 1.96$ and $-z_{0.025} = -1.96$
• We would reject the null if Z < -1.96 or Z > 1.96

Collecting Data, Calculating Test Statistic and Making a Decision
• In collecting a sample, it is important to avoid problems of sample selection bias, such as survivorship bias
• Example: If we want to test a hypothesis regarding bank performance and our sample includes only the banks that exist in the last quarter, we leave out the banks that have failed
• Banks still in existence are likely to have performed better and, thus, there will be some bias in our sample

Hypothesis Tests and Financial Decisions
• Deciding whether or not to reject the null hypothesis means making a statistical decision
• Does this always translate into a corresponding financial decision?
• Example: Suppose we find support through a test for the hypothesis that on average stocks provide higher returns than bonds
• Does this statistical decision have a financial meaning, as well?
• From a financial or investment perspective we may also want to understand the risks of investing in these two types of assets
• Finally, we define the p-value as the smallest level of significance at which we can reject the null hypothesis

Hypothesis Test for a Single Mean (Normal Distribution, Variance Unknown)
Decision rules at significance level $\alpha$:
• H0: $\mu = \mu_0$ or $\mu \le \mu_0$ vs. H1: $\mu > \mu_0$: reject H0 if $\frac{\bar{X} - \mu_0}{s/\sqrt{n}} > t_{n-1,\alpha}$
• H0: $\mu = \mu_0$ or $\mu \ge \mu_0$ vs. H1: $\mu < \mu_0$: reject H0 if $\frac{\bar{X} - \mu_0}{s/\sqrt{n}} < -t_{n-1,\alpha}$
• H0: $\mu = \mu_0$ vs. H1: $\mu \neq \mu_0$: reject H0 if either of the above two decision rules holds with $\alpha$ replaced by $\alpha/2$

Example of Hypothesis Test for a Single Mean
• Suppose that the controller of a firm monitors the firm's payments from its customers through days receivables
• The firm has tried to maintain an average of 45 days in receivables
• A recent random sample of 50 accounts has shown a mean of 49 days and a standard deviation of 8 days
• Can we conclude that the average days in receivables for this firm has increased?
• The testable hypotheses are stated as follows
  H0: $\mu \le 45$ vs. H1: $\mu > 45$
• The test can be conducted at the 5% and 1% levels of significance
• Since the population variance is unknown, we use the t-statistic, which is
  $t_{49} = \frac{49 - 45}{8/\sqrt{50}} = 3.536$
• The cutoffs for the t-distribution with 49 degrees of freedom at the 5% and 1% levels of significance are 1.677 and 2.405, respectively
• Given that our t-statistic is greater than both cutoffs, the null hypothesis is rejected at both the 5% and 1% levels
• This implies that there has been a statistically significant increase in the days receivables for this firm

Hypothesis Test for Difference Between Population Means
• We often want to test the hypothesis that the population means differ between two groups
• Examples
  – Is the average debt-equity ratio higher for mature firms than for young firms?
  – Do average stock returns differ by decade?
  – Do community banks on average lend more to small businesses than larger banking institutions?
  – Do average corporate defaults differ by industry?
• Taking samples from the two populations, we can formulate the following hypotheses
  – H0: $\mu_1 = \mu_2$ vs. H1: $\mu_1 \neq \mu_2$
  – H0: $\mu_1 \le \mu_2$ vs. H1: $\mu_1 > \mu_2$
  – H0: $\mu_1 \ge \mu_2$ vs. H1: $\mu_1 < \mu_2$
• Two cases (assuming samples are independent):
  – Populations are assumed normally distributed, variances are unknown but equal
  – Populations are assumed normally distributed, variances are unknown and unequal
• When population variances are assumed to be equal, the t-statistic is as follows
  $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$
  where
  $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
  and the degrees of freedom are $n_1 + n_2 - 2$
• When population variances cannot be assumed to be equal, the t-statistic is as follows
  $t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
  and the degrees of freedom are
  $df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1} + \frac{(s_2^2/n_2)^2}{n_2}}$

Example of Hypothesis Test for Differences Between Population Means
• Suppose that we observe monthly returns on the S&P 500 from the 1970s and the 1980s (equal sample sizes of 120 observations each)
  – For the 1970s, the mean monthly return is 0.58 and the standard deviation is 4.598
  – For the 1980s, the mean monthly return is 1.47 and the standard deviation is 4.738
• We want to test whether the two population means are equal, assuming that both populations are normally distributed and that the variances are unknown but equal
• The hypothesis test is formulated as follows H0: $\mu_{70} = \mu_{80}$ vs.
H1: $\mu_{70} \neq \mu_{80}$
• Suppose we are interested in testing the above hypothesis at the 5% and 1% levels of significance
• Assuming the two samples are independent, the degrees of freedom are 120 + 120 - 2 = 238
• Plugging the relevant information into the formulas for the estimator of the common population variance, $s_p^2$, and the t-statistic, we find that t = -1.477
• The cutoffs of the t-distribution for this two-sided test are
  – At the 5% level, we reject the null if t < -1.972 or t > 1.972
  – At the 1% level, we reject the null if t < -2.601 or t > 2.601
• Given our t-statistic of -1.477, we cannot reject the null hypothesis at either the 5% or the 1% significance level
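
The arithmetic in the two worked examples can be checked with a short script. The sketch below is illustrative only and is not part of the original slides: the function names `one_sample_t` and `pooled_t` are hypothetical helpers, and the critical values are copied from the examples rather than computed from the t-distribution.

```python
import math


def one_sample_t(sample_mean, mu0, s, n):
    """t-statistic for a single mean with unknown variance (n - 1 df)."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


def pooled_t(mean1, s1, n1, mean2, s2, n2, diff0=0.0):
    """Equal-variance t-statistic for a difference between two means.

    Returns (t, degrees of freedom, pooled variance estimate).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return ((mean1 - mean2) - diff0) / se, df, sp2


# Days receivables example: H0: mu <= 45 vs. H1: mu > 45 (one-sided test).
t_recv = one_sample_t(49, 45, 8, 50)
# Cutoffs quoted in the example for 49 df at the 5% and 1% levels.
recv_cutoffs = {0.05: 1.677, 0.01: 2.405}
reject_recv = {a: t_recv > c for a, c in recv_cutoffs.items()}

# S&P 500 example: H0: mu_70 = mu_80 vs. H1: mu_70 != mu_80 (two-sided test).
t_sp, df_sp, sp2 = pooled_t(0.58, 4.598, 120, 1.47, 4.738, 120)
# Two-sided cutoffs quoted in the example at the 5% and 1% levels.
sp_cutoffs = {0.05: 1.972, 0.01: 2.601}
reject_sp = {a: abs(t_sp) > c for a, c in sp_cutoffs.items()}

print(f"receivables: t = {t_recv:.3f}, reject H0: {reject_recv}")
print(f"S&P 500: t = {t_sp:.3f}, df = {df_sp}, reject H0: {reject_sp}")
```

Run as written, the script gives t-statistics of 3.536 and -1.477 and the same reject and do-not-reject decisions as the examples. Exact critical values for a given number of degrees of freedom could instead be taken from a statistics library, if one is available.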