A review of key statistical concepts

An overview of the review
• Populations and parameters
• Samples and statistics
• Confidence intervals
• Hypothesis testing

Populations and Parameters … and Samples and Statistics

Populations and Parameters
• A population is any large collection of objects or individuals, such as Americans, students, or trees, about which information is desired.
• A parameter is any summary number, like an average or percentage, that describes the entire population.

Parameters
• Examples:
  – population mean µ = average temperature
  – population proportion p = proportion approving of president’s job performance
• 99.999999999999….% of the time, we don’t (...or can’t) know the real value of a population parameter.
• Best we can do is estimate the parameter!

Samples and Statistics
• A sample is a representative group drawn from the population.
• A statistic is any summary number, like an average or percentage, that describes the sample.

Statistics
• Examples:
  – sample mean (“x-bar”)
  – sample proportion (“p-hat”)
• Because samples are manageable in size, we can determine the value of statistics.
• We use the known statistic to learn about the unknown parameter.

Example: Smoking at PSU?
• Population of 42,000 PSU students: What proportion smoke regularly?
• Sample of 987 PSU students: 43% reported smoking regularly.

Example: Grade inflation?
• Population of 5 million college students: Is the average GPA 2.7?
• Sample of 100 college students: How likely is it that 100 students would have an average GPA as large as 2.9 if the population average was 2.7?

Example: A linear relationship?
[Figure: regression plot of birth weight (grams) against gestation (weeks), 34 to 42 weeks. Fitted line: Weight = -2037.00 + 130.817 Gestation; S = 167.327, R-Sq = 77.5%, R-Sq(adj) = 76.8%. Plot annotations: E(Y) = A + B X and Y-hat = a + b X.]

Two ways to learn about a population parameter
• Confidence intervals estimate parameters.
  – We can be 95% confident that the proportion of Penn State students who have a tattoo is between 5.1% and 15.3%.
• Hypothesis tests test the value of parameters.
  – There is enough statistical evidence to conclude that the mean normal body temperature of adults is lower than 98.6 degrees F.

Confidence intervals
A review of concepts

The situation
• Want to estimate the actual population mean µ.
• But can only get “x-bar,” the sample mean.
• Use “x-bar” to find a range of values, L < µ < U, that we can be really confident contains µ.
• The range of values is called a “confidence interval.”

Confidence intervals for proportions in newspapers
• “Sample estimate”: 69% of 1,027 U.S. adults think using a hand-held cell phone while driving a car should be illegal.
• The “margin of error” is 3%.
• The “confidence interval” is 69% ± 3%.
• We can be really confident that between 66% and 72% of all U.S. adults think using a hand-held cell phone while driving a car should be illegal.
Source: ABC News Poll, May 16-20, 2001

General form of most confidence intervals
• Sample estimate ± margin of error
• Lower limit L = estimate - margin of error
• Upper limit U = estimate + margin of error
• Then, we’re confident that the value of the population parameter is somewhere between L and U.

(1-α)100% t-interval for population mean
Formula in words: Sample mean ± (t-multiplier × standard error)
Formula in notation: $\bar{x} \pm t_{1-\alpha/2,\, n-1} \left( \dfrac{s}{\sqrt{n}} \right)$

Determining the t-multiplier
[Figure: density of the t(14) distribution, with central area 1 − α and area α/2 in each tail.]

Typical t-multipliers
Conf. coefficient (1 − α)   Conf. level (1 − α)100%   1 − α/2
0.90                        90%                       0.95
0.95                        95%                       0.975
0.99                        99%                       0.995

t-interval for mean in Minitab
One-Sample T: FVC
Variable   N   Mean     StDev    SE Mean   95.0% CI
FVC        8   3.5875   0.1458   0.0515    (3.4655, 3.7095)
We can be 95% confident that the mean forced vital capacity of all female college students is between 3.5 and 3.7 liters.

Length of confidence interval
• Want confidence interval to be as narrow as possible.
• Length = Upper Limit - Lower Limit

How is the length of the CI affected?
$\bar{x} \pm t_{1-\alpha/2,\, n-1} \left( \dfrac{s}{\sqrt{n}} \right)$
• As sample mean increases…
• As the standard deviation decreases…
• As we decrease the confidence level…
• As we increase sample size…

Hypothesis testing
A review of concepts

General idea of hypothesis testing
• Make an initial assumption.
• Collect evidence (data).
• Based on the available evidence (data), decide whether to reject or not reject the initial assumption.

Example: Normal body temperature
• Population of many, many adults: Is average adult body temperature 98.6 degrees? Or is it lower?
• Sample of 130 adults: Average body temperature of the 130 sampled adults is 98.25 degrees.

Making the decision
• It is either likely or unlikely that we would collect the evidence we did given the initial assumption.
• If it is likely, then we “do not reject” our initial assumption. There is not enough evidence to do otherwise.

Making the decision (cont’d)
• If it is unlikely, then:
  – either our initial assumption is correct and we experienced a very unusual event
  – or our initial assumption is incorrect
• In statistics, if it is unlikely, we “reject” our initial assumption.

Again, idea of hypothesis testing: criminal trial analogy
• First, state 2 hypotheses, the null hypothesis (“H0”) and the alternative hypothesis (“HA”)
  – H0: Defendant is not guilty (innocent).
  – HA: Defendant is guilty.

Criminal trial analogy (continued)
• Then, collect evidence, such as fingerprints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, handwriting samples, etc.
• In statistics, the data are the evidence.

Criminal trial analogy (continued)
• Then, make initial assumption.
  – Our criminal justice system is based on “defendant is innocent until proven guilty.”
  – So, assume defendant is innocent.
• In statistics, we always assume the null hypothesis is true.

Criminal trial analogy (continued)
• Then, make a decision based on the available evidence.
  – If there is sufficient evidence (“beyond a reasonable doubt”), reject the null hypothesis. (Behave as if defendant is guilty.)
  – If there is insufficient evidence, do not reject the null hypothesis. (Behave as if defendant is innocent.)

Very important point
• If we reject the null hypothesis, we do not prove the alternative hypothesis is true.
• If we do not reject the null hypothesis, we do not prove the null hypothesis is true.
• We merely state there is enough evidence to behave one way or the other.
• Always true in statistics! Whatever the decision, there is always a chance we made an error.

Errors in criminal trials
                 Truth
Jury decision    Not guilty   Guilty
Not guilty       OK           ERROR
Guilty           ERROR        OK

Errors in hypothesis testing
                     Truth
Decision             Null hypothesis   Alternative hypothesis
Do not reject null   OK                TYPE II ERROR
Reject null          TYPE I ERROR      OK

Definitions: Types of errors
• Type I error: The null hypothesis is rejected when it is true.
• Type II error: The null hypothesis is not rejected when it is false.
• There is always a chance of making one of these errors. But, a good scientific study will minimize the chance of doing so!

Making the decision
• “It is either likely or unlikely that we would collect the evidence we did given the initial assumption.”
• Two ways to determine likely or unlikely:
  – Critical value approach (many textbooks)
  – P-value approach (science, journals, software)

Possible hypotheses about mean µ
Type           Null        Alternative
Right-tailed   H0: µ = 3   HA: µ > 3
Left-tailed    H0: µ = 3   HA: µ < 3
Two-tailed     H0: µ = 3   HA: µ ≠ 3

Critical value approach
• Using sample data and assuming null hypothesis is true, calculate the value of the test statistic.
• Set the significance level, α, the probability of making a Type I error, to be small (0.05 or 0.01).
• Compare the value of the test statistic to the known distribution of the test statistic.
• If the test statistic is more extreme than expected, allowing for an α chance of error, reject the null hypothesis.
  Otherwise, don’t reject the null.

Right-tailed critical value
[Figure: t(14) density, with area 0.95 to the left of the critical value 1.7613 and area 0.05 to the right.]
Reject null hypothesis if test statistic is greater than 1.7613.

Left-tailed critical value
[Figure: t(14) density, with area 0.05 to the left of the critical value -1.7613 and area 0.95 to the right.]
Reject null hypothesis if test statistic is less than -1.7613.

Two-tailed critical value
[Figure: t(14) density, with area 0.95 between the critical values -2.1448 and 2.1448 and area 0.025 in each tail.]
Reject null hypothesis if test statistic is less than -2.1448 or greater than 2.1448.

P-value approach
• Using sample data and assuming null hypothesis is true, calculate the value of the test statistic.
• Using known distribution of the test statistic, calculate the P-value = “If the null hypothesis is true, what is the probability that we’d observe a more extreme test statistic than we did?”
• Set the significance level, α, the probability of making a Type I error, to be small (0.05 or 0.01).
• If the probability is small, i.e., smaller than α, reject the null hypothesis. Otherwise, don’t reject the null.

Right-tailed P-value
[Figure: t(14) density, with area 0.0127 to the right of t* = 2.5 and area 0.9873 to the left.]
If it’s unlikely to observe such a large test statistic, i.e., if the P-value (0.0127) is smaller than α, reject the null hypothesis.

Left-tailed P-value
[Figure: t(14) density, with area 0.0127 to the left of t* = -2.5 and area 0.9873 to the right.]
If it’s unlikely to observe such a small test statistic, i.e., if the P-value (0.0127) is smaller than α, reject the null hypothesis.

Two-tailed P-value
[Figure: t(14) density, with area 0.0127 to the left of t* = -2.5, area 0.0127 to the right of t* = 2.5, and area 0.9746 in between.]
If it’s unlikely to observe such an extreme test statistic, i.e., if the P-value (0.0254) is smaller than α, reject the null hypothesis.
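
The critical values and P-values quoted for the t(14) distribution can be checked numerically. The following is a minimal sketch, not the method used to produce the slides: it assumes α = 0.05 and the test statistic t* = 2.5 from the slides, and the helper names t_pdf, t_cdf, and t_quantile are hypothetical. It integrates the t density with Simpson's rule and inverts it by bisection, where a statistics package would normally be used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Value q with P(T <= q) = p, located by bisection on t_cdf."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = 0.05  # significance level (assumed; the slides allow 0.05 or 0.01)
df = 14       # degrees of freedom used in the slides' figures

# Critical value approach: cut-offs that leave alpha in the rejection region.
right_crit = t_quantile(1 - alpha, df)    # slides report 1.7613
left_crit = t_quantile(alpha, df)         # slides report -1.7613
two_crit = t_quantile(1 - alpha / 2, df)  # slides report 2.1448

# P-value approach: tail area beyond the observed test statistic.
t_star = 2.5
p_right = 1 - t_cdf(t_star, df)           # slides report 0.0127
p_left = t_cdf(-t_star, df)               # slides report 0.0127
p_two = 2 * (1 - t_cdf(abs(t_star), df))  # slides report 0.0254

print(f"critical values: {left_crit:.4f}, {right_crit:.4f}, +/-{two_crit:.4f}")
print(f"P-values: right {p_right:.4f}, left {p_left:.4f}, two-tailed {p_two:.4f}")
print("reject H0 (right-tailed)?", p_right < alpha)
```

Both approaches lead to the same decision: a test statistic beyond the critical value is exactly one whose P-value is smaller than α.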
Example: Right-tailed test
Brinell hardness measurement of ductile iron subcritically annealed:
170 167 174 179 179
156 163 156 187 156
183 179 174 179 170
156 187 179 183 174
187 167 159 170 179
H0: µ = 170
HA: µ > 170

One-Sample T: Brinell
Test of mu = 170 vs mu > 170
Variable   N    Mean     StDev   SE Mean   T      P
Brinell    25   172.52   10.31   2.06      1.22   0.117

Example: Right-tailed critical value
[Figure: t(24) density, with area 0.95 to the left of the critical value 1.7109 and area 0.05 to the right.]

Example: Right-tailed P-value
[Figure: t(24) density, with area 0.117 to the right of t* = 1.22 and area 0.883 to the left.]

Example: Left-tailed test
Height of sunflower seedlings:
11.5 11.8 15.7 16.1 14.1 10.5
15.2 19.0 12.8 12.4 19.2 13.5
16.5 13.5 14.4 16.7 10.9 13.0
15.1 17.1 13.3 12.4  8.5 14.3
12.9 11.1 15.0 13.3 15.8 13.5
 9.3 12.2 10.3
H0: µ = 15.7
HA: µ < 15.7

Test of mu = 15.7 vs mu < 15.7
Variable    N    Mean     StDev   SE Mean   T       P
Sunflower   33   13.664   2.544   0.443     -4.60   0.000

Example: Left-tailed critical value
[Figure: t(32) density, with area 0.05 to the left of the critical value -1.6939 and area 0.95 to the right.]

Example: Left-tailed P-value
[Figure: t(32) density, with area <0.0001 to the left of t* = -4.60 and area >0.9999 to the right.]

Example: Two-tailed test
Thickness of spearmint gum:
7.65 7.60 7.65 7.70 7.55
7.55 7.40 7.40 7.50 7.50
H0: µ = 7.5
HA: µ ≠ 7.5

Test of mu = 7.5 vs mu not = 7.5
Variable   N    Mean     StDev    SE Mean   T      P
Gum        10   7.5500   0.1027   0.0325    1.54   0.158

Example: Two-tailed critical value
[Figure: t(9) density, with area 0.95 between the critical values -2.2622 and 2.2622 and area 0.025 in each tail.]

Example: Two-tailed P-value
[Figure: t(9) density, with area 0.079 to the left of -1.54, area 0.079 to the right of 1.54, and area 0.842 in between.]
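
The three Minitab outputs above can be reproduced from the raw data. The sketch below is an illustration under stated assumptions rather than Minitab's own procedure: the function names t_upper_tail and one_sample_t and the tail labels "right", "left", and "two" are hypothetical, and the P-value is obtained by numerically integrating the t density with Simpson's rule.

```python
import math
import statistics


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, by Simpson's rule and symmetry about 0."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def one_sample_t(data, mu0, tail):
    """Return (N, mean, StDev, SE mean, T, P) for a one-sample t-test."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)
    se = sd / math.sqrt(n)        # standard error of the mean
    t = (mean - mu0) / se         # test statistic, computed assuming H0 is true
    if tail == "right":           # HA: mu > mu0
        p = t_upper_tail(t, n - 1)
    elif tail == "left":          # HA: mu < mu0, using symmetry of t
        p = t_upper_tail(-t, n - 1)
    else:                         # HA: mu != mu0, both tails
        p = 2 * t_upper_tail(abs(t), n - 1)
    return n, mean, sd, se, t, p


brinell = [170, 167, 174, 179, 179, 156, 163, 156, 187, 156, 183, 179, 174,
           179, 170, 156, 187, 179, 183, 174, 187, 167, 159, 170, 179]
sunflower = [11.5, 11.8, 15.7, 16.1, 14.1, 10.5, 15.2, 19.0, 12.8, 12.4, 19.2,
             13.5, 16.5, 13.5, 14.4, 16.7, 10.9, 13.0, 15.1, 17.1, 13.3, 12.4,
             8.5, 14.3, 12.9, 11.1, 15.0, 13.3, 15.8, 13.5, 9.3, 12.2, 10.3]
gum = [7.65, 7.60, 7.65, 7.70, 7.55, 7.55, 7.40, 7.40, 7.50, 7.50]

results = {
    "Brinell": one_sample_t(brinell, 170, "right"),
    "Sunflower": one_sample_t(sunflower, 15.7, "left"),
    "Gum": one_sample_t(gum, 7.5, "two"),
}

alpha = 0.05  # significance level (assumed)
for name, (n, mean, sd, se, t, p) in results.items():
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{name}: N={n} mean={mean:.4f} StDev={sd:.4f} "
          f"SE={se:.4f} T={t:.2f} P={p:.4f} -> {decision}")
```

At α = 0.05, only the sunflower example has a P-value small enough to reject the null hypothesis; the Brinell and gum examples do not.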

Document info: posted 11/25/2011; language: English; pages: 53
