
Sampling Theory & Standard Errors
Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University

- Statistical Inference
- Sampling Techniques
- Sampling Distributions
- The Central Limit Theorem
- The Standard Error of Sample Statistics

Key Concepts
- The concept of statistical inference
  - Population
  - Parameter
  - Sample
  - Statistic
  - Sampling method
  - Inference or generalization
- The five critical questions in the inferential process
- The advantages of studying a sample vis-à-vis a population
- Sampling techniques
  - Probability techniques
    - Simple random sampling
    - Stratified sampling
    - Cluster sampling
    - Multistage sampling
    - Systematic sampling
  - Non-probability techniques
    - Quota sampling
    - Accidental sampling
    - Purposive sampling
    - Snowball sampling
- Table of random numbers
- Factors to consider in determining sample size
  - Variance of the trait in the population
  - The sampling method
  - The power of the statistics to be used ($1 - \beta$)
  - The required accuracy of the generalization
- The standard error of the mean
- Confidence intervals: 95% and 99%
- The concept of a sampling distribution of means
- The standard error of a sampling distribution of means
- Z scores of 1.96 and 2.58
- Provisions of the Central Limit Theorem
- Standard error of a sample mean and its relationship to the standard error of a sampling distribution of means
- Change in the sampling distribution of means when the sample size is small
- The t distribution
  - W. C. Gosset (1876-1937)
  - Degrees of freedom (df) in a t distribution
  - Differences between the t distribution and the normal distribution
- 95% and 99% confidence intervals using the t and normal distributions
- Sampling distribution of sample proportions
- Standard error of a sample proportion
- Confidence interval of a sample proportion
- The Central Limit Theorem and the sampling distribution of a proportion

Lecture Outline
- The concept of statistical inference
- Five critical questions about statistical inference
- Sampling methods
- How to determine the accuracy of a generalization; confidence intervals
- The Central Limit Theorem
- Error in generalizing a sample mean
- The t distribution v. the normal distribution
- Error in generalizing a sample proportion

Statistical Inference
Statistical analysis frequently involves making inferences from a sample to a population.
- The population: $\mu$ = parameter = the population mean
- The sample: $\bar{X}$ = statistic = the sample mean
- Figure: the sampling method leads from the population to the sample; inference or generalization leads from the sample back to the population.

Five Critical Questions in the Inferential Process
- Why study a sample? Why not the entire population?
- How should the sample be drawn from the population?
- How large a sample should be drawn?
- How does one know if the sample is representative of the population?
- How does one know if the generalization from the sample to the population is correct?

Why study a sample? Why not the entire population?
- Usually, the population is too large to study, i.e. hundreds of thousands or millions of cases
- Time, money, and patience limit studying an entire population
- Depending upon the time it takes to gather the data, the population may change by the time the generalization is made, making the generalization invalid
- With automation, however, entire populations may be conveniently studied, e.g.
  - All the criminal records in a state criminal history repository
  - All articles on murder in a metropolitan newspaper for the last 20 years

How should the sample be drawn from the population?
Sampling theory is the technology that addresses this question. Various sampling methods are listed below.
- Probability techniques
  - Simple random sampling
  - Stratified sampling
  - Cluster sampling
  - Multistage sampling
  - Systematic sampling
- Non-probability techniques
  - Quota sampling
  - Accidental sampling
  - Purposive sampling
  - Snowball sampling

How large a sample should be drawn?
The goal is to acquire a sample that is representative of the population, i.e. a proportional miniature of the population.
Determinants of an adequate sample size:
- The variability of the trait in the population
  - The more variable the trait, the larger the sample must be
  - If the variability is 0.0, N can be 1
- The sampling method used
  - Some methods are more efficient than others in securing a representative sample
- The power ($1 - \beta$) of the statistics used to analyze the data; some are more powerful than others
- The required level of accuracy of the generalization
  - Up to a certain limit, the larger the sample, the more accurate the generalization

How does one know if the sample is representative of the population?
- You never know for sure unless the sample is the population.
- Other things being equal
  - The larger the sample, the more likely it will be representative
  - The smaller the sample, the greater the under-representation of the population variance
- One way to check for representativeness
  - Draw a 2nd sample of the same size, by the same method
  - Compare the statistics derived from the 1st and 2nd samples
- Confidence that the 1st sample is representative increases if the two sets of sample statistics are comparable.
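
The representativeness check just described (draw a second sample of the same size by the same method and compare the statistics) can be sketched in Python. This is only an illustration under stated assumptions: the population is simulated as 10,000 hypothetical case processing times, and the function name `draw_sample_mean`, the seed, and the sample size of 300 are choices made for the example, not values from the lecture.

```python
import random
import statistics

def draw_sample_mean(population, n, rng):
    """Mean of a simple random sample of size n, drawn without replacement."""
    return statistics.fmean(rng.sample(population, n))

rng = random.Random(7)  # arbitrary seed so the run is repeatable

# Hypothetical population (an assumption): 10,000 processing times in days.
population = [rng.gauss(72, 3) for _ in range(10_000)]

first = draw_sample_mean(population, 300, rng)
second = draw_sample_mean(population, 300, rng)

# Comparable means raise confidence that the first sample is representative.
print(round(first, 2), round(second, 2), round(abs(first - second), 2))
```

The smaller the printed difference between the two means, the more comparable the samples; how small is small enough is what the standard error, introduced next, is meant to quantify.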

How does one know if the generalization from the sample to the population is correct?
- In statistical inference, the question is put the other way around: How large a margin of error should be attached to a statistic generalized from a sample to a population?
- Example
  - A sample of 300 cases is drawn from the files of a criminal court
  - The mean time from filing to disposition in felony cases is 72 days
  - The best guess as to the population parameter is 72 days, plus or minus some degree of error
- In statistics, the margin of error associated with a sample statistic is called the standard error of that statistic
- In this example, we are concerned with the standard error of the mean ($S_{\bar{X}}$)

An Example of the Standard Error of the Mean ($S_{\bar{X}}$) and the 95% Confidence Interval
- Normally, a statistical generalization of a sample mean is made within a 95% confidence interval, which is estimated from $S_{\bar{X}}$
- Example
  - The mean case processing time in felony cases is 72 days
  - Our best guess as to the population parameter is also 72 days
  - And we are 95% confident that the population parameter lies within the interval 69.3 days to 74.7 days
- Figure: the unknown population mean ($\mu$ = ?) bracketed by 69.3 and 74.7, with the sample mean $\bar{X}$ = 72 at the center.

Sampling Techniques
Probability sampling techniques
- Simple random sample: any technique that assures that every member of the population has an equal probability of being selected into the sample
- Stratified sampling: the population is first divided into subgroups using various strata (e.g. gender, race, etc.) and proportionate numbers of cases are randomly selected from within each stratum
- Cluster sampling: a segmented approach to sampling; e.g. randomly selecting 10% of the counties in a state, then 20% of the census tracts within the counties selected, then 5% of the blocks within those census tracts, etc.
- Multistage sampling: sampling through a series of stages that may combine stratification, cluster, and/or simple random sampling at various stages
- Systematic sampling: selecting every kth case from a list of cases, i.e. a sampling frame

Non-probability techniques
- Quota sampling: the same idea as stratified sampling, but proportionate numbers of cases are selected from within each stratum by some method other than simple random sampling
- Accidental sampling: also called sampling by convenience. Using subjects who are convenient, e.g. students in a class, or the man-on-the-street interviews used by TV reporters
- Purposive sampling: using a group, community, or ballot box known from prior research to be representative, e.g. a focus group
- Snowball sampling: used when little is known about the population. Acquiring one subject, then asking them to refer you to another subject, and so forth

Random Sampling & Tables of Random Numbers
Steps in drawing a simple random sample
1. Secure a sampling frame of the population
2. Assign each member of the population a unique identifying number, if they do not already have one
3. Determine the sample size
4. Pick a random starting place in a table of random numbers and match these numbers with the sampling frame until the sample is filled
5. If the same number comes up twice, disregard it

Creating Random Numbers
- A sequence of random numbers has no identifiable statistical pattern.
- Any number is equally likely to follow any other number.
- Many algorithms have been developed to generate random numbers. For example, the mid-square technique:
  - Pick a 4-digit number other than 0000, say 3829
  - Square this number: $(3829)^2 = 14{,}661{,}241$
  - Select the 3rd, 4th, 5th, and 6th digits of this number and square them: $(6612)^2 = 43{,}718{,}544$
  - Repeat the process: $(7185)^2 = 51{,}624{,}225$
  - This process will produce a series of four-digit numbers with no apparent pattern until it reaches a number it has already produced (such as the starting number 3829), after which the sequence repeats

How to Determine if a Random Sample Is Representative
- Select another sample of the same size, by the same method, and compare the means derived from the two samples.
- The comparability of these means is your index of confidence that the 1st sample is representative
- If you want to be very sure, take many samples
  - Compare the variability among the resulting sample means
  - The mean of all these means ($\bar{\bar{X}}$) is the best estimate of the population parameter ($\mu$)
  - The resulting histogram of all these sample means is called a sampling distribution of means
- The interesting discovery is that the sampling distribution of sample means approximates a normal distribution

Sampling Distribution of Sample Means
- Figure: the population ($\mu$ = ?) and a histogram showing the frequency of the sample means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots, \bar{X}_k$.
- $\sigma_{\bar{X}}$ = standard deviation of the sampling distribution of means, i.e. the standard error of the sampling distribution
- Interpretation: we are 68.26% confident that the population mean lies within $\bar{X} \pm 1\,\sigma_{\bar{X}}$
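
Returning to the mid-square technique from the "Creating Random Numbers" slide, the steps can be sketched in a few lines of Python. The function name `mid_square` and its parameters are hypothetical, introduced only for this illustration; the sketch assumes the square is padded with leading zeros to eight digits so that "the 3rd through 6th digits" is always well defined.

```python
def mid_square(seed: int, count: int) -> list[int]:
    """Generate `count` four-digit values with the mid-square technique."""
    values = []
    current = seed
    for _ in range(count):
        squared = str(current * current).zfill(8)  # pad the square to 8 digits
        current = int(squared[2:6])                # keep the 3rd-6th digits
        values.append(current)
    return values

# Reproduces the worked example: 3829 -> 6612 -> 7185 -> ...
print(mid_square(3829, 3))  # [6612, 7185, 6242]
```

Sequences produced this way eventually repeat or collapse to zero, so the technique is useful for illustration rather than for serious sampling work.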

95% and 99% Confidence Intervals
- 95% confidence interval
  - The table for the normal distribution indicates that 95% of the area under the curve lies between Z scores of $\pm 1.96$
  - Therefore, we are 95% confident that the population mean lies within $\bar{X} \pm 1.96\,\sigma_{\bar{X}}$
- 99% confidence interval
  - The table for the normal distribution indicates that 99% of the area under the curve lies between Z scores of $\pm 2.58$
  - Therefore, we are 99% confident that the population mean lies within $\bar{X} \pm 2.58\,\sigma_{\bar{X}}$

Graphic Interpretation of Confidence Intervals
- Figure: a normal curve centered on $\bar{X}$; the 95% confidence interval spans Z = -1.96 to +1.96 and the 99% confidence interval spans Z = -2.58 to +2.58.

The Central Limit Theorem
As the sample size (n) becomes larger:
- The sampling distribution of means becomes approximately normal, regardless of the shape of the variable's distribution in the population
- The sampling distribution will be centered around the population mean $\mu$, such that $\bar{\bar{X}} = \mu$
- The standard deviation of the sampling distribution ($\sigma_{\bar{X}}$), which is called its standard error, will approach the standard deviation of the population ($\sigma$) divided by $n^{1/2}$: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
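
The provisions of the Central Limit Theorem can be checked with a small simulation. The sketch below is an illustration only: the skewed (exponential) population, the seed, the sample sizes, and the name `sampling_distribution` are all assumptions made for the example. It draws many samples, then compares the standard deviation of the resulting sample means with the predicted $\sigma / \sqrt{n}$.

```python
import random
import statistics

def sampling_distribution(population, n, k, rng):
    """Draw k random samples of size n (with replacement); return their means."""
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(k)]

rng = random.Random(42)  # arbitrary seed so the run is repeatable

# Hypothetical, deliberately non-normal population (an assumption).
population = [rng.expovariate(1.0) for _ in range(50_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

for n in (5, 30, 100):
    means = sampling_distribution(population, n, 2_000, rng)
    observed = statistics.pstdev(means)  # spread of the sample means
    predicted = sigma / n ** 0.5         # sigma / sqrt(n)
    print(n, round(statistics.fmean(means), 3), round(mu, 3),
          round(observed, 3), round(predicted, 3))
```

In each printed row the mean of the sample means should sit close to the population mean, and the observed spread should track the predicted standard error more and more tightly as n grows.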

Standard Error of a Sample Mean
- Rarely would one construct a sampling distribution of means and derive the standard error of this distribution in order to determine the error in generalizing to the population
- Instead, the standard error of a sampling distribution of means ($\sigma_{\bar{X}}$) can be estimated from the standard error of the mean ($S_{\bar{X}}$) of a single sample: $S_{\bar{X}} = S / \sqrt{N - 1}$
- Example: case processing time
  - $\bar{X} = 72$ days, $S = 3$ days, $N = 80$
  - $S_{\bar{X}} = 3 / \sqrt{80 - 1} = 0.3375$
  - We are 68.26% confident that the population mean lies within $72 \pm 0.3375$ days (71.66 to 72.34)

Sampling Distribution of the Mean When the Sample is Small ($N < 30$)
- The smaller the sample, the more the sample variance ($S^2$) will underestimate the population variance ($\sigma^2$)
- The sampling distribution of means will have more sample-to-sample variation, and the resulting distribution will deviate from normal, i.e. it will be higher in the tails than a normal distribution
- If the Z scores of 1.96 and 2.58 are used to determine the 95% and 99% confidence intervals, the resulting intervals will be too small to account for the increased area in the tails of the sampling distribution
- To compensate for this problem, a t distribution is used to estimate 95% and 99% confidence intervals instead of a normal distribution.

The t distribution
- The mathematics of the t distribution were developed by W. C. Gosset (1876-1937) and were published in 1908
- With k degrees of freedom, the density is

  $$f(t) = \frac{\Gamma\left[\frac{1}{2} + \frac{k}{2}\right]}{\sqrt{k}\;\Gamma\left(\frac{1}{2}\right)\,\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{t^2}{k}\right)^{-(k+1)/2}$$

- Figure: family of t distributions for various degrees of freedom (df = N - 1).
- As N approaches $\infty$, the t distribution approaches a normal distribution
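
To see the heavier tails numerically, the t density can be evaluated directly and set beside the normal density. The sketch below is a minimal illustration: the function names `t_density` and `normal_density` and the evaluation point t = 2.5 are choices made for the example. It uses the identity $\Gamma(1/2) = \sqrt{\pi}$, so the denominator is written as $\sqrt{k\pi}\,\Gamma(k/2)$.

```python
import math

def t_density(t: float, k: int) -> float:
    """Density of the t distribution with k degrees of freedom."""
    coeff = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return coeff * (1 + t * t / k) ** (-(k + 1) / 2)

def normal_density(z: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Height of each curve in the tail (t = 2.5) for increasing degrees of freedom.
for k in (2, 8, 30, 120):
    print(k, round(t_density(2.5, k), 4), round(normal_density(2.5), 4))
```

The t column should start well above the normal value for small df and shrink toward it as df grows, which is the convergence the slide describes.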

Comparison of the Normal and t Distributions

Critical values of t and Z for two-tailed $\alpha = 0.05$

N         3      5      9      31     61     121    ∞
df        2      4      8      30     60     120    ∞
t value   4.303  2.776  2.306  2.042  2.000  1.980  1.960
Z value   1.96   1.96   1.96   1.96   1.96   1.96   1.96

Critical values of t and Z for two-tailed $\alpha = 0.01$

N         3      5      9      31     61     121    ∞
df        2      4      8      30     60     120    ∞
t value   9.925  4.604  3.355  2.750  2.660  2.617  2.576
Z value   2.58   2.58   2.58   2.58   2.58   2.58   2.58

Confidence Intervals Using the t Distribution
- Example: case processing time
  - $\bar{X} = 72$ days, $S = 3$ days, $N = 12$
  - $S_{\bar{X}} = 3 / \sqrt{12 - 1} = 0.905$
- The 95% confidence interval
  - Find the t value for df = (12 - 1) = 11, for a two-tailed test at $\alpha = 0.05$: t = 2.201
  - 95% interval = $\bar{X} \pm t\,(S_{\bar{X}}) = 72 \pm 2.201\,(0.905)$
  - 95% interval: 70.01 days to 73.99 days
- The 99% confidence interval
  - Find the t value for df = (12 - 1) = 11, for a two-tailed test at $\alpha = 0.01$: t = 3.106
  - 99% interval = $\bar{X} \pm t\,(S_{\bar{X}}) = 72 \pm 3.106\,(0.905)$
  - 99% interval: 69.19 days to 74.81 days

Confidence Intervals Using the Normal Distribution
- Example: case processing time
  - $\bar{X} = 72$ days, $S = 3$ days, $N = 12$
  - $S_{\bar{X}} = 3 / \sqrt{12 - 1} = 0.905$
- The 95% confidence interval (Z = 1.96)
  - 95% interval = $\bar{X} \pm 1.96\,(S_{\bar{X}}) = 72 \pm 1.96\,(0.905)$
  - 95% interval: 70.23 days to 73.77 days
- The 99% confidence interval (Z = 2.58)
  - 99% interval = $\bar{X} \pm 2.58\,(S_{\bar{X}}) = 72 \pm 2.58\,(0.905)$
  - 99% interval: 69.67 days to 74.33 days

Comparison of Results

Distribution   95% confidence interval   99% confidence interval
t              70.01 to 73.99            69.19 to 74.81
Normal         70.23 to 73.77            69.67 to 74.33

Notice
- The confidence intervals of the t distribution are wider than those of the normal distribution.
- This difference adjusts for the fact that as the sample size becomes smaller, there is more sample-to-sample variation in sample means, with the result that the tails of the sampling distribution contain more area, i.e. are heavier.

Another Example
The following statistics were derived on the length of sentence and ages of a sample of 70 prisoners.

Variable   N    Mean      Standard Deviation
Sentence   70   5.9571    4.9532
Age        70   22.9429   4.0428

Q: What is the standard error of each mean? $S_{\bar{X}} = S / \sqrt{N - 1}$
- For sentence: $S_{\bar{X}} = 4.9532 / \sqrt{69} = 0.5963$
- For age: $S_{\bar{X}} = 4.0428 / \sqrt{69} = 0.4869$

Another Example (cont.)
Q: What is the 95% confidence interval for the mean of each variable? 95% interval = $\bar{X} \pm t_{.05}\,(S_{\bar{X}})$
- Look up the two-tailed t value for N - 1 degrees of freedom (i.e. 69) at the 0.05 level of significance.
- Unfortunately, the t-table in the text does not report a t value for 69 df, so we will take the value for 60 df, i.e. t = 2.0
- For sentence: 95% interval = $5.9571 \pm (2.0)(0.5963)$, i.e. 4.7645 to 7.1497
- For age: 95% interval = $22.9429 \pm (2.0)(0.4869)$, i.e. 21.969 to 23.917

SPSS and the Standard Error of the Mean
- The equation for the standard error of the mean is as follows: $S_{\bar{X}} = S / \sqrt{N - 1}$
- The denominator $\sqrt{N - 1}$ is used to adjust for the fact that the smaller the sample, the more likely the standard error of the mean will be underestimated.
- In SPSS, as the sample becomes larger, say N greater than 30 cases, the denominator changes from $\sqrt{N - 1}$ to $\sqrt{N}$.
- For the previous examples, this difference results in the following standard errors
  - For sentence: $S_{\bar{X}} = 4.9532 / \sqrt{70} = 0.5920$
  - For age: $S_{\bar{X}} = 4.0428 / \sqrt{70} = 0.4832$
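
The two forms of the standard error can be compared with a short Python sketch. It is an illustration under stated assumptions: the function names `standard_error` and `confidence_interval` and the `small_sample` flag are hypothetical, and the critical value t = 2.0 is the 60 df table value the lecture adopts. The inputs are the sentence statistics from the prisoner example.

```python
import math

def standard_error(s: float, n: int, small_sample: bool = True) -> float:
    """S / sqrt(N - 1) for the small-sample form, S / sqrt(N) otherwise."""
    return s / math.sqrt(n - 1 if small_sample else n)

def confidence_interval(mean: float, se: float, critical: float):
    """Return (lower, upper) bounds: mean plus or minus critical * se."""
    margin = critical * se
    return mean - margin, mean + margin

# Sentence length: mean 5.9571, S = 4.9532, N = 70.
se_small = standard_error(4.9532, 70)                      # N - 1 denominator
se_large = standard_error(4.9532, 70, small_sample=False)  # N denominator
low, high = confidence_interval(5.9571, se_small, 2.0)

print(round(se_small, 4), round(se_large, 4))  # 0.5963 0.592
print(round(low, 4), round(high, 4))           # 4.7645 7.1497
```

The same two functions reproduce the earlier case processing example: `standard_error(3, 80)` gives about 0.3375.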

SPSS and the Standard Error of the Mean (cont.)
Standard errors of the mean with small and large sample equations.

Variable   Small Sample   Large Sample
Sentence   0.5963         0.5920
Age        0.4869         0.4832

Notice that the large sample equation yields slightly smaller standard errors and will therefore yield narrower 95% confidence intervals.

95% confidence interval with small and large sample estimates of the standard error of the mean.

Variable   Small Sample        Large Sample
Sentence   4.7645 to 7.1497    4.7761 to 7.1382
Age        21.969 to 23.917    21.9789 to 23.9068

Inference About a Sample Proportion
- In a survey of 800 people, the proportion favoring the death penalty was 0.6 (60%).
- What is the 95% confidence interval in generalizing this proportion to the population?
- The population: the population proportion = ?
- The sample: P = 0.6, where P = sample proportion

Standard Error of a Proportion
- The Central Limit Theorem applies to the sampling distribution of a proportion
- The standard error of the sampling distribution of a proportion can be estimated from a single sample, in a manner similar to that used with the mean: $S_P = \sqrt{P(1 - P) / N}$
- Example: survey of attitudes towards the death penalty (N = 800)
  - P = proportion favorable = 0.60
  - Q = proportion unfavorable = 0.40
  - Standard error of P: $S_P = \sqrt{0.60\,(1 - 0.60) / 800} = 0.017$
  - 95% confidence interval: $P \pm 1.96\,(S_P) = 0.60 \pm 1.96\,(0.017)$
  - 95% interval: 0.5667 to 0.6333
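
The proportion example can be reproduced with a few lines of Python. This is a sketch only; the function names `proportion_standard_error` and `proportion_interval` are hypothetical. Note that the code carries the unrounded standard error (about 0.0173) through the calculation, so its interval is slightly wider than the 0.5667 to 0.6333 obtained above with the standard error rounded to 0.017.

```python
import math

def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(P(1 - P) / N)."""
    return math.sqrt(p * (1 - p) / n)

def proportion_interval(p: float, n: int, z: float = 1.96):
    """Return (lower, upper) bounds of the interval P plus or minus z * S_P."""
    margin = z * proportion_standard_error(p, n)
    return p - margin, p + margin

# Survey of 800 people, 60% favoring the death penalty.
se = proportion_standard_error(0.60, 800)
low, high = proportion_interval(0.60, 800)

print(round(se, 4))                   # 0.0173
print(round(low, 4), round(high, 4))  # 0.5661 0.6339
```

Passing `z=2.58` to `proportion_interval` would give the corresponding 99% interval.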
