Chapter 12
Sample Size Determination
The appropriate sample size for any experiment is a function of many factors which describe
your experiment. For example, the appropriate sample size is affected by the number of levels
of the independent variable in your experiment, as well as the level of confidence you employ,
the tolerance set for determination of a significant effect, and the type of data (dependent
variable) in your experiment. To help standardize the material which follows, I will employ a
constant level of confidence of .05. In short, the probability level needed for rejection of
the null hypothesis will be held constant. The material has been arranged into three sections
to correspond to the dependent variable types of nominal, ordinal, and interval data,
respectively.
Sample Sizes for Nominal Data
From the equation for the Chi-square we have:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
In this equation O represents the observed presence of a characteristic out of N subjects, or
kN, where k indicates the proportion of presence and N is the sample size. Assume that we want
a tolerance (T), the difference we wish to detect, so that E = kN + TN. Further, suppose that
we want to achieve this difference for each cell of the Chi-square observation matrix.
Therefore, df = 1, Chi-square = 3.84, and our equation becomes:
$$3.84 = \frac{(kN - (kN + TN))^2}{kN + TN}$$
or
$$3.84 = \frac{T^2 N}{k + T}$$
which gives through algebraic manipulation:
$$\frac{3.84 (k + T)}{T^2} = N$$
Under the assumptions of the null hypothesis k = 0 and our equation becomes:
$$\frac{3.84}{T} = N$$
Therefore, if you wanted to find a 30% difference as statistically significant then the
recommended sample size would be (after rounding up to the nearest whole number):
$$\frac{3.84}{.3} = 12.8 \approx 13$$
Remember, this number represents the sample size for each cell of the observation matrix. For
example, suppose that we had a study of three companies to see if the ethnic distributions
(defined as whites, blacks, Hispanics, and others) differed significantly. Can you see that we
have 12 groups in the study? At a 30% tolerance we would need 156 subjects (3 companies by 4
ethnic classifications = 12 groups, with a minimum of 13 subjects in each group, or
(12)(13) = 156 subjects).
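The nominal-data rule above can be checked with a short calculation. The following is a minimal Python sketch, assuming the formula N = 3.84 / T derived in this section; the function names nominal_cell_size and nominal_total_size are hypothetical and are not part of the original text.

```python
import math
from fractions import Fraction

# Chi-square critical value for df = 1 at the .05 level, as used in the text.
CHI_SQUARE_CRITICAL = Fraction("3.84")


def nominal_cell_size(tolerance):
    """Per-cell sample size N = 3.84 / T, rounded up to a whole subject.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Converting through str() keeps the arithmetic exact (0.3 becomes 3/10),
    # so float round-off cannot push the rounded-up result off by one.
    t = Fraction(str(tolerance))
    if t <= 0:
        raise ValueError("tolerance must be positive")
    return math.ceil(CHI_SQUARE_CRITICAL / t)


def nominal_total_size(tolerance, rows, columns):
    """Total subjects when every cell of a rows x columns matrix gets N subjects."""
    return rows * columns * nominal_cell_size(tolerance)


# The chapter's example: 3 companies by 4 ethnic classifications, 30% tolerance.
per_cell = nominal_cell_size(0.30)      # 3.84 / .3 = 12.8, rounded up
total = nominal_total_size(0.30, 3, 4)  # 12 cells times the per-cell size
print(per_cell, total)
```

Exact fractions are used here only so that the rounding-up step behaves predictably; ordinary floating-point division would give the same answer for this example.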
Sample Sizes for Ordinal Data
The formula for determining the difference between two proportions is as follows (you may want
to see Chapter 10 of Volume I of this series):
$$Z = \frac{p_1 - p_2}{\sqrt{\frac{pq}{N}}}$$
In this formula, p1 and p2 indicate the proportions for the two groups, q = 1 - p, and p is
given by the following formula:
$$p = \frac{f_1 + f_2}{N_1 + N_2}$$
If we assume that our groups will have equal sample sizes, square both sides of the Z equation
above, and perform some algebraic manipulation, we can derive:
$$\frac{pq Z^2}{(p_1 - p_2)^2} = N$$
At this point we can make a conservative assumption, which has the effect of inflating our
sample size, that p = .5 and q = .5. Any other value of p and q will produce a smaller
estimate of N. It is best to err in the direction of increasing rather than decreasing our
estimated sample size! Z is a constant of 1.96 and the difference between p1 and p2 can be
seen as a tolerance (T). Since (.5)(.5)(1.96)^2 is approximately 0.96, we have:
$$N = \frac{0.96}{T^2}$$
As an example of the application of the formula, suppose that we had three groups in a study
and we wanted to find a 10% difference as statistically significant. Then we would need an
individual group size of:
$$N = \frac{0.96}{0.10^2} = \frac{0.96}{0.01} = 96$$
or a total sample size of 288 subjects (3 groups by 96 subjects = 288 subjects).
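The ordinal-data formula can be sketched in Python in the same way. This is only an illustration under the section's assumptions (equal group sizes, Z = 1.96, and the constant 0.96 obtained from p = q = .5); the names pq_z_squared and ordinal_group_size are hypothetical.

```python
import math
from fractions import Fraction

Z_CRITICAL = Fraction("1.96")  # Z value at the .05 level, as used in the text


def pq_z_squared(p):
    """Numerator p * q * Z^2 of the sample size formula, with q = 1 - p."""
    p = Fraction(str(p))
    return p * (1 - p) * Z_CRITICAL ** 2


def ordinal_group_size(tolerance, numerator=Fraction("0.96")):
    """Per-group sample size N = numerator / T^2, rounded up.

    The default numerator is the text's rounded constant 0.96.
    Hypothetical helper: the name and signature are illustrative only.
    """
    t = Fraction(str(tolerance))
    if t <= 0:
        raise ValueError("tolerance must be positive")
    return math.ceil(numerator / t ** 2)


# p = .5 gives the largest numerator, so it is the conservative choice.
print(float(pq_z_squared(0.5)), float(pq_z_squared(0.3)))

# The chapter's example: 10% tolerance, three groups.
group_n = ordinal_group_size(0.10)
print(group_n, 3 * group_n)
```

Note that the text rounds the constant 0.9604 down to 0.96 before dividing. Passing the unrounded value, ordinal_group_size(0.10, numerator=pq_z_squared(0.5)), rounds up to one subject more per group, so the choice of constant matters only at the margin.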
Sample Sizes for Interval Data
Remember, one formula for the t-test for a difference with interval data was as follows:
$$t = \frac{\overline{X} - \mu}{\sqrt{\frac{S^2}{N}}}$$
If we modify the equation algebraically, we can derive the following:
$$\frac{t \sqrt{\frac{S^2}{N}}}{\overline{X} - \mu} = 1$$
If we let the difference between means define a tolerance “T” (or a difference needed for
significant results) then we have, after squaring both sides of the equation and a little
algebraic manipulation:
$$\frac{S^2 t^2}{T^2} = N$$
Now, let us assume that the score for each individual was tabulated as a Z-score, or
(Score - mean)/S. Then the standard deviation in our equation becomes one (1) and the formula
reads as follows:
$$\frac{t^2}{T^2} = N$$
If you examine the t-test table at the .05 level of confidence with any reasonable estimated
sample size, say more than 20, you will see that the t-values stabilize at about 2.00.
Therefore, let us assume a t of 2.00 as our estimate; our formula then reads as follows:
$$\frac{4}{T^2} = N$$
Tolerance, considering that our observations were recorded as Z-scores, can now be seen as the
proportion of a standard deviation needed to obtain a significant effect (if a significant
effect is indeed present). For example, suppose that we wanted to estimate the needed sample
size for an experiment in which we would like to find a 40% difference between our means (that
is, 40% of a standard deviation) as statistically significant. Then we would need a sample of:
$$\frac{4}{.4^2} = \frac{4}{.16} = 25$$
This is the recommended sample size for each of the groups in the experiment. Suppose that in
our experiment there were three groups; then we would need 75 subjects, 25 for each group. You
do have some flexibility in the exact sample sizes: you might have 20 subjects in one group
and 30 in another. At the end of this chapter is a table which summarizes the recommended
sample sizes for various tolerance levels.
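The interval-data formula lends itself to the same kind of check. The sketch below assumes the section's formula N = S^2 t^2 / T^2 with t fixed at 2.00; the function name interval_group_size and the raw-score figures in the second call (a standard deviation of 15 and a difference of 6 points) are hypothetical values chosen for illustration.

```python
import math
from fractions import Fraction


def interval_group_size(tolerance, t_value=2, std_dev=1):
    """Per-group sample size N = S^2 * t^2 / T^2, rounded up.

    With std_dev = 1 the scores are treated as Z-scores and the formula
    reduces to 4 / T^2 when t = 2. Hypothetical helper for illustration.
    """
    t = Fraction(str(tolerance))
    if t <= 0:
        raise ValueError("tolerance must be positive")
    s = Fraction(str(std_dev))
    critical_t = Fraction(str(t_value))
    return math.ceil(s ** 2 * critical_t ** 2 / t ** 2)


# The chapter's example: tolerance of 40% of a standard deviation, three groups.
group_n = interval_group_size(0.4)
print(group_n, 3 * group_n)

# The same tolerance expressed in raw-score units (assumed numbers):
# a 6-point difference with a standard deviation of 15 is also 40% of S.
print(interval_group_size(6, std_dev=15))
```

The second call shows why standardizing to Z-scores loses nothing: a tolerance stated in raw units together with its standard deviation gives the same group size as the equivalent proportion of a standard deviation.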
Sample Size Chart
To save you some time, many of the recommended sample sizes have been prepared in the table
below. These numbers represent the recommended sample sizes to determine significance (at the
.05 level of confidence) for experiments with varying tolerances. Tolerance, again, means the
size of the difference you would like to find as statistically significant (I recommend at
least 30%).
               Dependent Variable Data Types
Tolerance      Nominal      Ordinal      Interval
Small 50%      -            -            11
For example, suppose that you intend to have three groups in your experiment, the dependent
variable is interval data, and you would like to find a 40% difference as statistically
significant. You would need 75 subjects in your study (25 for each of the three groups).
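A chart of this kind can also be recomputed directly from the chapter's three formulas. The following Python sketch is an assumption-laden illustration, not a transcription of the printed chart, so individual entries may not match it; the names FORMULAS, group_size, study_size, and chart are hypothetical.

```python
import math
from fractions import Fraction

# Numerator of each per-group formula and the power of T in its denominator.
FORMULAS = {
    "nominal": (Fraction("3.84"), 1),   # N = 3.84 / T
    "ordinal": (Fraction("0.96"), 2),   # N = 0.96 / T^2
    "interval": (Fraction(4), 2),       # N = 4 / T^2
}


def group_size(data_type, tolerance):
    """Recommended subjects per group (or per cell), rounded up."""
    numerator, power = FORMULAS[data_type]
    t = Fraction(str(tolerance))
    if t <= 0:
        raise ValueError("tolerance must be positive")
    return math.ceil(numerator / t ** power)


def study_size(data_type, tolerance, groups):
    """Total subjects for a study with the given number of groups."""
    return groups * group_size(data_type, tolerance)


def chart(tolerances):
    """One row per tolerance: (tolerance, nominal, ordinal, interval)."""
    return [
        (t, group_size("nominal", t), group_size("ordinal", t),
         group_size("interval", t))
        for t in tolerances
    ]


# The example above: three groups, interval data, 40% tolerance.
print(study_size("interval", 0.40, 3))

for row in chart([0.10, 0.20, 0.30, 0.40, 0.50]):
    print("{:>4.0%} {:>8} {:>8} {:>8}".format(*row))
```

Keeping the three formulas in one lookup table makes it easy to see how quickly the required group size grows as the tolerance shrinks, especially for the two formulas that divide by T squared.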
Final Comments On Sample Size
This presentation avoided many issues which relate to sample size, like the chance of
type II error and the power of the statistical method. Perhaps the largest issue not considered
in this presentation is the politics revolving around your advisory committee (if you are
writing a dissertation/thesis). Often you will find that the needed sample size, which was
determined from some rational process (like the one described in this chapter), is not
sufficient in your advisor's opinion. In this situation you can cast rational explanations of
sample size "to the wind." If you encounter this situation then try doubling the sample size.
If that fails to work, you may consider calling for some emotional advice; my phone number is
(619) 744-1774 and my E-mail address on the INTERNET is Weedman@Elfin.con.