
Comparing Datasets and Comparing a Dataset with a Standard
How different is enough?

Concepts:
• Independence of each data point
• Test statistics
• Central Limit Theorem
• Standard error of the mean
• Confidence interval for a mean
• Significance levels
• How to apply in Excel

module 7

Independent measurements:
• Each measurement must be independent (shake up the basket of tickets)
• Examples of non-independent measurements:
  – Public responses to questions (one result affects the next person's answer)
  – Samplers placed too close together, so air flows are affected

Test statistics:
• A number that is calculated from the data
• In Student's t test, for example, the test statistic is t
• If t >= 1.96 and the population is normally distributed (and the sample is large, so t is close to z), you are in the right-hand tail of the curve, outside the central portion that holds 95% of the data. That central portion lies symmetrically between t = -1.96 on the left and t = 1.96 on the right.

Test statistics correspond to significance levels
• "P" stands for percentile
• The Pth percentile is the value that a fraction p of the data falls below and a fraction 1-p falls above

Two major types of questions:
• Comparing the mean against a standard
  – Does the air quality here meet the NAAQS?
• Comparing two datasets
  – Is the air quality different in 2006 than in 2005?
  – Or, is the air quality better?
  – Or, is the air quality worse?

Comparing mean to a standard:
• Did the air quality meet the CARB annual standard of 12 µg/m3?

year   Ft Smith avg   Ft Smith Min   Ft Smith Max   N_Fort Smith
'05    14.78          0.1            37.9           77

Central Limit Theorem (magic!)
• Even if the underlying population is not normally distributed
• If we repeatedly take datasets
• These different datasets will have means that cluster around the true mean
• And the distribution of these means is normally distributed!

Magic concept #2: Standard error of the mean
• Represents uncertainty around the mean
• As sample size N gets bigger, your error gets smaller!
• The bigger the N, the more tightly you can estimate the mean
• LIKE the standard deviation for a population, but this is for YOUR sample: standard error = $s/\sqrt{N}$

For a "large" sample (N > 60), or when very close to a normal distribution:
A confidence interval for a population mean is:

$\bar{x} \pm Z \frac{s}{\sqrt{n}}$

The choice of Z determines the confidence level: 90%, 95%, etc.

For a "small" sample:
Replace the Z value with a t value to get:

$\bar{x} \pm t \frac{s}{\sqrt{n}}$

where "t" comes from Student's t distribution and depends on the sample size.

Student's t distribution versus Normal Z distribution
[Figure: "T-distribution and Standard Normal Z distribution": density versus value for the Z distribution and the t distribution with 5 d.f.]

Compare t and Z values:

Confidence level   t value with 5 d.f.   Z value
90%                2.015                 1.65
95%                2.571                 1.96
99%                4.032                 2.58

What happens as sample gets larger?
[Figure: "T-distribution and Standard Normal Z distribution": density versus value for the Z distribution and the t distribution with 60 d.f.]

What happens to CI as sample gets larger?
For large samples, Z and t values become almost identical, so the CIs $\bar{x} \pm Z \frac{s}{\sqrt{n}}$ and $\bar{x} \pm t \frac{s}{\sqrt{n}}$ are almost identical.

First, graph and review data:
• Use the box-plot add-in
• Evaluate the spread
• Evaluate how far apart the mean and median are
• (Assume the sampling design and the QC are good)

Excel summary stats:
1. Use the box-plot add-in
2. Calculate summary stats

[Figure: box plot for Ft Smith, vertical axis from 0 to 40]

N        77
Min      0.1
25th     7.5
Median   13.7
75th     18.1
Max      37.9
Mean     14.8
SD       8.7

Our question:
• How confident can we be (90%? 95%?) that this mean of 14.78 is really greater than the standard of 12?
• We saw that N = 77, and the mean and median are not too different
• So use z (normal) rather than t

The mean is 14.8 ± what?
• We know the equation for the CI is $\bar{x} \pm Z \frac{s}{\sqrt{n}}$
• The width of the confidence interval reflects how sure we want to be that the CI includes the true mean
• Now all we need to decide is how confident we want to be

CI calculation:
• For 95%, z = 1.96 (often rounded to 2)
• Standard error (sigma/√N) = 8.66/√77 = 0.987, or about 0.99
• Half-width of the CI around the mean = 2 x 0.99, or about 2
• We can be 95% sure that the true mean is included in (mean ± 2): from 14.8 - 2 = 12.8 at the low end to 14.8 + 2 = 16.8 at the high end
• This does NOT include 12!

Excel can also calculate a confidence interval around the mean:
• The mean plus and minus 1.93 (that is, 12.87 to 16.73) is a 95% confidence interval that does NOT include 12!

We know we are more than 95% confident, but how confident can we be that Ft Smith mean > 12?
• Calculate where on the curve our mean of 14.8 is, in terms of the z (normal) score
• Or, if N is small, use the t score

To find where we are on the curve, calculate the test statistic:
• Ft Smith mean = 14.8, sigma = 8.66, N = 77
• Calculate the test statistic, which in this case is the z factor (we decided we can use the z rather than the t distribution):

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{N}}$

  where $\bar{x}$ is the data's mean and $\mu$ is the standard of 12
• If N were < 60, the test statistic would be t, but it is calculated the same way

Calculate z easily:
• The numerator is our mean of 14.8 minus the standard of 12 (treat the standard as the real mean, mu) = 2.8
• The standard error is sigma/√N = 0.987 (same as for the CI)
• So z = 2.8/0.987 = 2.84
• So where is this z on the curve?
• Remember, at z = 3 we are to the right of ~99%

Where on the curve?
[Figure: normal curve with Z = 2 and Z = 3 marked]
• So it is between 95% and 99% probable that the true mean is above 12

Can calculate exactly where on the curve, using Excel:
• Use the NORMSDIST function with z
• If z = 2.84 (or t, for a large sample), NORMSDIST in Excel yields 0.998: a 99.8% probability that the true mean is above 12
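
The Ft Smith comparison can also be checked outside Excel. What follows is a minimal sketch in Python, assuming the summary values from the slides (mean 14.8, sigma 8.66, N = 77, standard 12). The function name `compare_mean_to_standard` and its dictionary keys are illustrative and not part of the original module; `statistics.NormalDist` from the Python standard library stands in here for Excel's NORMSDIST.

```python
from math import sqrt
from statistics import NormalDist


def compare_mean_to_standard(mean, sd, n, standard, confidence=0.95):
    """Large-sample (z) comparison of a sample mean against a fixed standard.

    Hypothetical helper for illustration; assumes N is large enough
    (the slides use N > 60) that z can be used instead of t.
    """
    std_normal = NormalDist()  # standard normal curve: mean 0, SD 1

    # Standard error of the mean: sigma / sqrt(N)
    se = sd / sqrt(n)

    # Two-sided critical z for the chosen confidence level (1.96 for 95%)
    z_crit = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    half_width = z_crit * se

    # Test statistic: (sample mean - standard) / standard error
    z = (mean - standard) / se

    # Area under the normal curve to the left of z (what NORMSDIST returns)
    area_left_of_z = std_normal.cdf(z)

    return {
        "se": se,
        "half_width": half_width,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
        "z": z,
        "area_left_of_z": area_left_of_z,
    }


# Ft Smith values taken from the slides
result = compare_mean_to_standard(mean=14.8, sd=8.66, n=77, standard=12)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the standard error comes out at about 0.99, the 95% half-width at about 1.93, z at about 2.84, and the area to the left of z at about 0.998, in line with the Excel results on the slides. For a small sample the same steps apply with a t critical value in place of z, which this sketch does not cover.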
