Sampling Theory & Standard Errors

- Statistical Inference
- Sampling Techniques
- Sampling Distributions
- The Central Limit Theorem
- The Standard Error of Sample Statistics

Key Concepts
- The concept of statistical inference
  - Population
  - Parameter
  - Sample
  - Statistic
  - Sampling method
  - Inference or generalization
- The five critical questions in the inferential process
- The advantages of studying a sample vis-à-vis a population
- Sampling techniques
  - Probability techniques: simple random sampling, stratified sampling, cluster sampling, multistage sampling, systematic sampling
  - Non-probability techniques: quota sampling, accidental sampling, purposive sampling, snowball sampling
- Table of random numbers
- Factors to consider in determining sample size
  - Variance of the trait in the population
  - The sampling method
  - The power of the statistics to be used ($1 - \beta$)
  - The required accuracy of the generalization
- The standard error of the mean
- Confidence intervals: 95% and 99%
- The concept of a sampling distribution of means
- The standard error of a sampling distribution of means
- Z scores of 1.96 and 2.58
- Provisions of the Central Limit Theorem
- Standard error of a sample mean and its relationship to the standard error of a sampling distribution of means
- Change in the sampling distribution of means when the sample size is small
- The t distribution
  - W. C. Gosset (1876-1937)
  - Degrees of freedom (df) in a t distribution
  - Differences between the t distribution and the normal distribution

Sampling Theory & Standard Errors: Charles M. Friel Ph.D., Criminal Justice Center, Sam Houston State University

Key Concepts (cont'd)
- 95% and 99% confidence intervals using the t and normal distributions
- Sampling distribution of sample proportions
- Standard error of a sample proportion
- Confidence interval of a sample proportion
- The Central Limit Theorem and the sampling distribution of a proportion

Lecture Outline
- The concept of statistical inference
- Five critical questions about statistical inference
- Sampling methods
- How to determine the accuracy of a generalization; confidence intervals
- The Central Limit Theorem
- Error in generalizing a sample mean
- The t distribution vs. the normal distribution
- Error in generalizing a sample proportion

Statistical Inference
Statistical analysis frequently involves making inferences from a sample to a population.

[Diagram: The Population ($\mu$ = parameter = the population mean) is linked to The Sample ($\bar{X}$ = statistic = sample mean) by the sampling method (population to sample) and by inference or generalization (sample to population).]

Five Critical Questions in the Inferential Process
- Why study a sample? Why not the entire population?
- How should the sample be drawn from the population?
- How large a sample should be drawn?
- How does one know if the sample is representative of the population?
- How does one know if the generalization from the sample to the population is correct?

Why study a sample? Why not the entire population?
- Usually, the population is too large to study, i.e. hundreds of thousands or millions of cases
- Time, money, and patience limit studying an entire population
- Depending upon the time it takes to gather the data, the population may change by the time the generalization is made, making the generalization invalid
- With automation, however, entire populations may be conveniently studied, e.g.
  - All the criminal records in a state criminal history repository
  - All articles on murder in a metropolitan newspaper for the last 20 years

How should the sample be drawn from the population?
Sampling theory is the technology that addresses this question. Various sampling methods are listed below.

Probability Techniques
- Simple random sampling
- Stratified sampling
- Cluster sampling
- Multistage sampling
- Systematic sampling

Non-Probability Techniques
- Quota sampling
- Accidental sampling
- Purposive sampling
- Snowball sampling

How large a sample should be drawn?
The goal is to acquire a sample that is representative of the population, i.e. a proportional miniature of the population.

Determinants of an adequate sample size
- The variability of the trait in the population
  - The more variable, the larger the sample must be
  - If the variability is 0.0, N can be 1
- The sampling method used
  - Some are more efficient than others in securing a sample which is representative
- The power of the statistics ($1 - \beta$) used to analyze the data; some are more powerful than others
- The required level of accuracy of the generalization
  - Up to a certain limit, the larger the sample, the more accurate the generalization

How does one know if the sample is representative of the population?
You never know for sure unless the sample is the population.

Other things being equal
- The larger the sample, the more likely it will be representative
- The smaller the sample, the greater the under-representation of the population variance

One way to check for representativeness
- Draw a 2nd sample of the same size, by the same method
- Compare the statistics derived from the 1st and 2nd samples
- Confidence that the 1st sample is representative increases if the two sets of sample statistics are comparable

How does one know if the generalization from the sample to the population is correct?
In statistical inference, the question is put the other way around: How large a margin of error should be attached to a statistic generalized from a sample to a population?

Example
- A sample of 300 cases is drawn from the files of a criminal court
- The mean time from filing to disposition in felony cases is 72 days
- The best guess as to the population parameter is 72 days, $\pm$ some degree of error

In statistics, the margin of error associated with a sample statistic is called the standard error of that statistic. In this example, we are concerned with the standard error of the mean ($S_{\bar{X}}$).

An Example of the Standard Error of the Mean ($S_{\bar{X}}$) and the 95% Confidence Interval
Normally, a statistical generalization of a sample mean is made within a 95% confidence interval, which is estimated from $S_{\bar{X}}$.

Example
- The mean case processing time in felony cases is 72 days
- Our best guess as to the population parameter is also 72 days
- And we are 95% confident that the population parameter lies within the interval 69.3 days to 74.7 days

[Figure: the population mean ($\mu$ = ?) is estimated from the sample mean ($\bar{X}$ = 72), with the 95% confidence interval running from 69.3 to 74.7.]

Sampling Techniques
Probability Sampling Techniques
- Simple random sample: Any technique that assures that everyone in the population has an equal probability of being selected into the sample
- Stratified sampling: The population is first divided into subgroups using various strata (e.g. gender, race, etc.) and proportionate numbers of cases are randomly selected from within each stratum
- Cluster sampling: A segmented approach to sampling, e.g. randomly selecting 10% of the counties in a state, then 20% of the census tracts within the counties selected, then 5% of the blocks within the census tracts, etc.
- Multistage sampling: Sampling through a series of stages that may combine stratification, cluster, and/or simple random sampling at various stages

Sampling Techniques (cont'd)

- Systematic sampling: Selecting every kth case from a list of cases, i.e. a sampling frame
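Two of the probability techniques above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame, the "M"/"F" stratum labels, the function names, and the random starting offset for the systematic sample are all hypothetical and do not come from the slides.

```python
import random

def systematic_sample(frame, k, rng):
    """Select every kth case from the sampling frame."""
    start = rng.randrange(k)      # assumption: begin at a random point in the first k cases
    return frame[start::k]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Randomly select the same fraction of cases from within each stratum."""
    strata = {}
    for case in frame:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)     # proportionate allocation
        sample.extend(rng.sample(members, n))  # simple random sample within the stratum
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical frame: 1,000 case IDs, 60% labelled "M" and 40% labelled "F"
frame = [(i, "M" if i < 600 else "F") for i in range(1000)]

every_10th = systematic_sample(frame, 10, rng)
ten_percent = stratified_sample(frame, lambda case: case[1], 0.10, rng)
print(len(every_10th), len(ten_percent))
```

Because the stratified sample draws 10% from each stratum, it keeps the 60/40 split of the frame, which is the "proportional miniature" idea described earlier.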

Non-probability Techniques
- Quota sampling: The same idea as stratified sampling, but proportionate numbers of cases are selected from within each stratum by some method other than simple random sampling
- Accidental sampling: Also called sampling by convenience. Using subjects who are convenient, e.g. students in a class, or the man-on-the-street interviews used by TV reporters
- Purposive sampling: Using a group, community, or ballot box known from prior research to be representative, e.g. a focus group
- Snowball sampling: Used when little is known about the population. Acquiring one subject, then asking them to refer you to another subject, and so forth

Random Sampling & Tables of Random Numbers
Steps in drawing a simple random sample
- Secure a sampling frame of the population
- Assign each member of the population a unique identifying number, if they do not already have one
- Determine the sample size
- Pick a random starting place in a table of random numbers and match these numbers with the sampling frame until the sample is filled
- If the same number comes up twice, disregard it
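The steps above can be mimicked in code, with a pseudo-random generator standing in for the printed table of random numbers. The sketch below is an assumption-laden illustration: the frame of 500 case IDs, the sample size of 25, and the function name are hypothetical.

```python
import random

def simple_random_sample(frame, sample_size, rng):
    """Draw positions at random, skipping repeats, until the sample is filled."""
    if sample_size > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    chosen = []
    seen = set()
    while len(chosen) < sample_size:
        position = rng.randrange(len(frame))  # stands in for the next entry in the table
        if position in seen:                  # same number came up twice: disregard it
            continue
        seen.add(position)
        chosen.append(frame[position])
    return chosen

rng = random.Random(42)                            # fixed seed so the sketch is repeatable
frame = [f"case-{i:04d}" for i in range(1, 501)]   # hypothetical frame of 500 numbered cases
sample = simple_random_sample(frame, 25, rng)
print(len(sample), len(set(sample)))
```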

Creating Random Numbers
A sequence of random numbers has no identifiable statistical pattern: any number is equally likely to follow any other number. Many algorithms have been developed to generate random numbers. For example, the mid-square technique:
- Pick a 4-digit number other than 0000, say 3829
- Square this number: $(3829)^2$ = 14,661,241
- Select the 3rd, 4th, 5th, and 6th digits of this number and square the result: $(6612)^2$ = 43,718,544
- Repeat the process: $(7185)^2$ = 51,624,225
- This process will produce a series of four-digit numbers with no apparent pattern until it returns to the number 3829
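The mid-square steps above translate directly into a short function. This is a minimal sketch for illustration; the function name is hypothetical, and padding the square to eight digits is an assumption made here so that "3rd through 6th digit" stays well defined when a square has fewer than eight digits. Mid-square sequences can collapse into short cycles or zero for some seeds, so the method is shown for teaching purposes only.

```python
def mid_square(seed, count):
    """Generate `count` four-digit numbers with the mid-square technique."""
    values = []
    n = seed
    for _ in range(count):
        squared = f"{n * n:08d}"   # pad to 8 digits (assumption for short squares)
        n = int(squared[2:6])      # 3rd, 4th, 5th, and 6th digits
        values.append(n)
    return values

print(mid_square(3829, 3))  # [6612, 7185, 6242]
```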

How to Determine if a Random Sample Is Representative
Select another sample of the same size, by the same method, and compare the means derived from the two samples. The comparability of these means is your index of confidence that the 1st sample is representative.

If you want to be very sure, take many samples
- Compare the variability among the resulting sample means
- The mean of all these means ($\bar{\bar{X}}$) is the best estimate of the population parameter ($\mu$)
- The resulting histogram of all these sample means is called a sampling distribution of means
- The interesting discovery is that the sampling distribution of sample means approximates a normal distribution

Sampling Distribution of Sample Means
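[Figure: samples drawn from a population with unknown mean ($\mu$ = ?) yield sample means $\bar{X}_1, \bar{X}_2, \bar{X}_3, \bar{X}_4, \ldots, \bar{X}_k$; the histogram plots the frequency of each $\bar{X}_k$.]

$\sigma_{\bar{X}}$ = standard deviation of the sampling distribution of means, i.e. the standard error of the sampling distribution

Interpretation: We are 68.26% confident that the population mean is somewhere between $\bar{X} \pm 1\,\sigma_{\bar{X}}$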

95% and 99% Confidence Intervals
95% Confidence Interval
- The table for the normal distribution indicates that 95% of the area under the curve lies between Z scores of $\pm 1.96$
- Therefore, we are 95% confident that the population mean $\mu$ lies between $\bar{X} \pm 1.96\,\sigma_{\bar{X}}$

99% Confidence Interval
- The table for the normal distribution indicates that 99% of the area under the curve lies between Z scores of $\pm 2.58$
- Therefore, we are 99% confident that the population mean $\mu$ lies between $\bar{X} \pm 2.58\,\sigma_{\bar{X}}$
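The two Z scores can be recovered without a printed table. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Z score that leaves (1 - confidence) / 2 of the area in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

print(round(z_critical(0.95), 2), round(z_critical(0.99), 2))  # 1.96 2.58
```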

Graphic Interpretation of Confidence Intervals

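[Figure: normal curve centered on $\bar{X}$; Z scores of -1.96 and +1.96 bound the 95% confidence interval, and Z scores of -2.58 and +2.58 bound the 99% confidence interval.]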

The Central Limit Theorem
As the sample size (n) becomes larger:
- The sampling distribution of means becomes approximately normal, regardless of the shape of the variable in the population
- The sampling distribution will be centered around the population mean $\mu$, such that the mean of the sample means approximates it: $\bar{\bar{X}} \approx \mu$
- The standard deviation of the sampling distribution ($\sigma_{\bar{X}}$), which is called its standard error, will approach the standard deviation of the population ($\sigma$) divided by $n^{1/2}$:

$\sigma_{\bar{X}} = \sigma / \sqrt{n}$
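A small simulation can make these provisions concrete. The sketch below is an illustration only: the choice of an exponential population (mean 1, standard deviation 1), the sample size of 50, the 2,000 repetitions, and the seed are all assumptions made for the example.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the sketch is repeatable
n = 50                   # size of each sample
num_samples = 2000       # how many samples are drawn

# Strongly skewed population: exponential with mean 1 and standard deviation 1
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # should sit close to the population mean
observed_se = statistics.pstdev(sample_means)    # spread of the sampling distribution
predicted_se = 1.0 / n ** 0.5                    # sigma / sqrt(n)
print(round(mean_of_means, 3), round(observed_se, 3), round(predicted_se, 3))
```

Even though the population is far from normal, the mean of the sample means should land near 1 and their standard deviation near $1/\sqrt{50}$.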

Standard Error of a Sample Mean
Rarely would one construct a sampling distribution of means and derive the standard error of this distribution in order to determine the error in generalizing to the population. Instead, the standard error of a sampling distribution of means ($\sigma_{\bar{X}}$) can be estimated from the standard error of the mean ($S_{\bar{X}}$) of a single sample:

$S_{\bar{X}} = S / \sqrt{N - 1}$

Example: Case processing time
- $\bar{X}$ = 72 days, S = 3 days, N = 80
- $S_{\bar{X}} = 3 / \sqrt{80 - 1} = 0.3375$
- We are 68.26% confident that the population mean lies between 72 $\pm$ 0.3375 (71.66 to 72.34)
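The arithmetic in this example can be checked with a few lines of Python. The function name is hypothetical; the formula is the $N - 1$ version given on the slide.

```python
import math

def standard_error_of_mean(s, n):
    """Standard error of the mean using the slide's N - 1 denominator."""
    return s / math.sqrt(n - 1)

se = standard_error_of_mean(3, 80)
low, high = 72 - se, 72 + se
print(round(se, 4), round(low, 2), round(high, 2))  # 0.3375 71.66 72.34
```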

Sampling Distribution of the Mean When the Sample is Small ($N \le 30$)
- The smaller the sample, the more the sample variance ($S^2$) will underestimate the population variance ($\sigma^2$)
- The sampling distribution of means will have more sample-to-sample variation, and the resulting distribution will deviate from normal, i.e. it will be higher in the tails of the distribution than a normal distribution
- If the Z scores of 1.96 and 2.58 are used to determine the 95% and 99% confidence intervals, the resulting intervals will be too small to account for the increased area in the tails of the sampling distribution
- To compensate for this problem, a t distribution is used to estimate 95% and 99% confidence intervals instead of a normal distribution

Family of t distributions
The t distribution
- The mathematics of the t distribution were developed by W. C. Gosset (1876-1937) and were published in 1908

$$f(t) = \frac{\Gamma\left[\frac{1}{2} + \frac{k}{2}\right]}{\sqrt{k}\;\Gamma\left(\frac{1}{2}\right)\,\Gamma\left(\frac{k}{2}\right)} \left(1 + \frac{t^2}{k}\right)^{-(k+1)/2}$$

where k is the degrees of freedom.

[Figure: t Distributions for Various Degrees of Freedom (df = N-1)]

As N approaches $\infty$, the t distribution approaches a normal distribution.
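To see the heavier tails numerically, the density above can be evaluated at a point in the tail and compared with the normal density. This is a sketch under stated assumptions: the function name and the evaluation point t = 2.5 are chosen for illustration, and the gamma functions are computed on the log scale (`math.lgamma`) only to avoid overflow for large k.

```python
import math
from statistics import NormalDist

def t_density(t, k):
    """Density of the t distribution with k degrees of freedom."""
    log_coeff = (math.lgamma(0.5 + k / 2) - 0.5 * math.log(k)
                 - math.lgamma(0.5) - math.lgamma(k / 2))
    return math.exp(log_coeff - (k + 1) / 2 * math.log1p(t * t / k))

# Height of each curve at t = 2.5, out in the tail
for k in (2, 30, 1000):
    print(k, round(t_density(2.5, k), 4))
print("normal", round(NormalDist().pdf(2.5), 4))
```

The tail height shrinks toward the normal value as the degrees of freedom grow.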

Comparison of the Normal and t Distributions
Critical Values of t and Z for two-tailed $\alpha$ = 0.05

N         3      5      9      31     61     121    $\infty$
df        2      4      8      30     60     120    $\infty$
t value   4.303  2.776  2.306  2.042  2.000  1.980  1.960
Z value   1.96   1.96   1.96   1.96   1.96   1.96   1.96

Critical Values of t and Z for two-tailed $\alpha$ = 0.01

N         3      5      9      31     61     121    $\infty$
df        2      4      8      30     60     120    $\infty$
t value   9.925  4.604  3.355  2.750  2.660  2.617  2.576
Z value   2.58   2.58   2.58   2.58   2.58   2.58   2.58
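Tabled t values like these can be approximated with the standard library alone by integrating the t density numerically and searching for the point that leaves the required area in the tails. The sketch below is one possible approach, not the method used to build the slide's table: the function names, Simpson's rule with 1,000 steps, and the bisection search range of 0 to 100 are all assumptions chosen for the table's cases.

```python
import math

def t_density(t, k):
    """Density of the t distribution with k degrees of freedom."""
    log_coeff = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
                 - 0.5 * math.log(k * math.pi))
    return math.exp(log_coeff - (k + 1) / 2 * math.log1p(t * t / k))

def central_area(t, k, steps=1000):
    """Area under the density between -t and +t, by Simpson's rule."""
    h = t / steps
    total = t_density(0.0, k) + t_density(t, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, k)
    return 2 * total * h / 3      # doubled because the density is symmetric

def t_critical(alpha, k):
    """Two-tailed critical value: the t whose central area is 1 - alpha."""
    low, high = 0.0, 100.0        # assumed search range
    for _ in range(40):           # bisection
        mid = (low + high) / 2
        if central_area(mid, k) < 1 - alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2

for df in (2, 4, 8, 30, 60, 120):
    print(df, round(t_critical(0.05, df), 3), round(t_critical(0.01, df), 3))
```

The printed values should fall close to the tabled ones, and they shrink toward 1.96 and 2.58 as df increases.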

Confidence Intervals Using the t Distribution
Example: Case processing time
- $\bar{X}$ = 72 days, S = 3 days, N = 12
- $S_{\bar{X}} = 3 / \sqrt{12 - 1} = 0.905$

The 95% confidence interval
- Find the t value for df = (12 - 1), for a two-tailed test at $\alpha$ = 0.05: t = 2.201
- 95% interval = $\bar{X} \pm t\,(S_{\bar{X}})$ = 72 $\pm$ 2.201 (0.905)
- 95% interval: 70.01 days to 73.99 days

The 99% confidence interval
- Find the t value for df = (12 - 1), for a two-tailed test at $\alpha$ = 0.01: t = 3.106
- 99% interval = $\bar{X} \pm t\,(S_{\bar{X}})$ = 72 $\pm$ 3.106 (0.905)
- 99% interval: 69.19 days to 74.81 days
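The two intervals can be reproduced in code. In this sketch the t values (2.201 and 3.106) are taken from the slide as inputs rather than computed, and the function name is hypothetical.

```python
import math

def t_confidence_interval(mean, s, n, t_value):
    """Confidence interval: mean +/- t * S / sqrt(N - 1)."""
    se = s / math.sqrt(n - 1)
    margin = t_value * se
    return mean - margin, mean + margin

low95, high95 = t_confidence_interval(72, 3, 12, 2.201)
low99, high99 = t_confidence_interval(72, 3, 12, 3.106)
print(round(low95, 2), round(high95, 2))  # 70.01 73.99
print(round(low99, 2), round(high99, 2))  # 69.19 74.81
```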

Confidence Intervals Using the Normal Distribution
Example: Case processing time
- $\bar{X}$ = 72 days, S = 3 days, N = 12
- $S_{\bar{X}} = 3 / \sqrt{12 - 1} = 0.905$

The 95% confidence interval (Z = 1.96)
- 95% interval = $\bar{X} \pm 1.96\,(S_{\bar{X}})$ = 72 $\pm$ 1.96 (0.905)
- 95% interval: 70.23 days to 73.77 days

The 99% confidence interval (Z = 2.58)
- 99% interval = $\bar{X} \pm 2.58\,(S_{\bar{X}})$ = 72 $\pm$ 2.58 (0.905)
- 99% interval: 69.67 days to 74.34 days

Comparison of Results

Distribution   95% Confidence Interval   99% Confidence Interval
t              70.01 to 73.99            69.19 to 74.81
Normal         70.23 to 73.77            69.67 to 74.34

Notice: The confidence intervals of the t distribution are wider than those of the normal distribution. This difference adjusts for the fact that as the sample size becomes smaller, there is more sample-to-sample variation in sample means, with the result that the tails of the sampling distribution contain more area, i.e. are heavier.

Another Example
The following statistics were derived on the length of sentence and ages of a sample of 70 prisoners.
Variable   N    Mean      Standard Deviation
Sentence   70   5.9571    4.9532
Age        70   22.9429   4.0428

Q: What is the standard error of each mean?

$S_{\bar{X}} = S / \sqrt{N - 1}$
- For sentence: $S_{\bar{X}} = 4.9532 / \sqrt{69} = 0.5963$
- For age: $S_{\bar{X}} = 4.0428 / \sqrt{69} = 0.4869$

Another Example (cont.)

Q: What is the 95% confidence interval for the mean of each variable?

95% interval = $\bar{X} \pm t_{.05}\,(S_{\bar{X}})$

Look up the 2-tailed t value for N-1 degrees of freedom (i.e. 69) for the 0.05 level of significance. Unfortunately, the t-table in the text does not report a t-value for 69 df, so we will take the value for 60 df, i.e. t = 2.0
- For sentence: 95% interval = 5.9571 $\pm$ (2.0)(0.5963), i.e. 4.7645 to 7.1497
- For age: 95% interval = 22.9429 $\pm$ (2.0)(0.4869), i.e. 21.969 to 23.917
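The interval arithmetic can be checked directly. The sketch below takes the means, the standard errors (0.5963 and 0.4869), and the t value of 2.0 from the slides as inputs; the function and constant names are hypothetical.

```python
def interval(mean, se, t_value):
    """Confidence interval: mean +/- t * standard error."""
    margin = t_value * se
    return mean - margin, mean + margin

T_60_DF = 2.0  # tabled value for 60 df, used in place of 69 df as on the slide

sentence = interval(5.9571, 0.5963, T_60_DF)
age = interval(22.9429, 0.4869, T_60_DF)
print([round(x, 4) for x in sentence])  # [4.7645, 7.1497]
print([round(x, 3) for x in age])       # [21.969, 23.917]
```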

SPSS and the Standard Error of the Mean
The equation for the standard error of the mean is as follows:

$S_{\bar{X}} = S / \sqrt{N - 1}$

The denominator $\sqrt{N - 1}$ is used to adjust for the fact that the smaller the sample, the more likely the standard error of the mean will be underestimated. In SPSS, as the sample becomes larger, say N greater than 30 cases, the denominator changes from $\sqrt{N - 1}$ to $\sqrt{N}$. For the previous examples, this difference results in the following standard errors:
- For sentence: $S_{\bar{X}} = 4.9532 / \sqrt{70} = 0.5920$
- For age: $S_{\bar{X}} = 4.0428 / \sqrt{70} = 0.4832$
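The effect of the two denominators is easy to compare in code. This sketch uses the sentence-length figures from the example (S = 4.9532, N = 70); the function names are hypothetical labels for the two equations.

```python
import math

def se_small_sample(s, n):
    """Standard error with the N - 1 denominator."""
    return s / math.sqrt(n - 1)

def se_large_sample(s, n):
    """Standard error with the N denominator."""
    return s / math.sqrt(n)

s, n = 4.9532, 70   # sentence length from the example
small = se_small_sample(s, n)
large = se_large_sample(s, n)
print(round(small, 4), round(large, 4))  # 0.5963 0.592
```

The two results differ by a factor of $\sqrt{(N-1)/N}$, which moves toward 1 as N grows, so the choice of equation matters less for larger samples.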

SPSS and the Standard Error of the Mean (cont.)

Standard errors of the mean with small and large sample equations.

Variable   Small Sample Equation   Large Sample Equation
Sentence   0.5963                  0.5920
Age        0.4869                  0.4832

Notice that the large sample equation yields slightly smaller standard errors and will therefore yield narrower 95% confidence intervals.

95% confidence interval with small and large sample estimates of the standard error of the mean.

Variable   Small Sample Equation   Large Sample Equation
Sentence   4.7645 to 7.1497        4.7761 to 7.1382
Age        21.969 to 23.917        21.9789 to 23.9068

Inference About a Sample Proportion
In a survey of 800 people, the proportion favoring the death penalty was 0.6 (60%). What is the 95% confidence interval in generalizing this proportion to the population?

[Diagram: The Population, whose proportion is unknown (?), and The Sample, with P = sample proportion = 0.6.]

Standard Error of a Proportion
The Central Limit Theorem applies to the sampling distribution of a proportion. The standard error of the sampling distribution of a proportion can be estimated from a single sample, in a manner similar to that used with the mean:

$S_P = \sqrt{P(1 - P) / N}$

Example: Survey of attitudes towards the death penalty (N = 800)
- P = proportion favorable = 0.60
- Q = proportion unfavorable = 0.40
- Standard error of P: $S_P = \sqrt{0.60\,(1 - 0.60) / 800} = 0.017$
- 95% confidence interval: $P \pm 1.96\,(S_P)$ = 0.60 $\pm$ 1.96 (0.017)
- 95% interval: 0.5667 to 0.6333
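The same calculation can be sketched in Python. The function name is hypothetical. Note that the code carries the unrounded standard error (about 0.0173), so its bounds can differ from the slide's in the third decimal place, because the slide rounds $S_P$ to 0.017 before multiplying.

```python
import math

def proportion_interval(p, n, z=1.96):
    """Standard error of a proportion and the interval p +/- z * S_P."""
    se = math.sqrt(p * (1 - p) / n)
    return se, p - z * se, p + z * se

se, low, high = proportion_interval(0.60, 800)
print(round(se, 3), round(low, 3), round(high, 3))  # 0.017 0.566 0.634
```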
