Journal of Dental Research
Significance Level and Confidence Interval
Rosario H. Potter. J Dent Res 1994; 73: 494. DOI: 10.1177/00220345940730020101

Published on behalf of the International and American Associations for Dental Research.



Guest Editorial

Significance Level and Confidence Interval
Rosario H. Potter, DMD, MSD, MS, Professor Emerita
Graduate students in my class in biostatistics are required to demonstrate that they fully understand the meaning and implications of the much-used phrase "results are significant, P < 0.05." To this end, I am continually dismayed when virtually every new class comes in with similar preconceived but erroneous notions about significance level and confidence interval. The more recent the class and presumably the more exposure to increased usage of statistics in the literature, the greater is the difficulty to correct their set concepts of these two terms, which are related yet have different meanings and different
uses. We often see the following in print:
* The confidence level was set at P < 0.05.
* Statistical significance was set at the 95% confidence level (P < 0.05).
* Significance between groups was determined by using P < 0.05 confidence limits.
* The difference shows statistical significance at probability of 95%.
* The difference was not significant at the 95% confidence interval.
* The groups show similar mean values at 95% confidence.
These expressions are not unique to dental publications, but are also routinely found in the medical literature. It is not surprising that new and non-statistician researchers are easily misled toward at least three erroneous concepts: first, that significance level and confidence level are synonymous and interchangeable terms, one being the "other side" of the other; second, that the term "not significant" shows that the mean values are the same; and third, that if our results are significant at P < 0.05, we may be confident at 95% probability that this difference or effect we found is real.

I believe we have a responsibility to guide our future scientists in concepts fundamental to research. Therefore, this editorial takes advantage of the excellent forum offered by the Journal to clarify such commonly misused and misleading terms in the literature. In the attempt, plain words will be used as much as possible without invoking statistical jargon.

A priority goal in scientific research, simply stated, is to show a real difference or effect in whatever we are testing. Statistics is the tool we use to demonstrate that we have found such a difference. To do that, we have to "disprove" (or challenge or contradict) the so-called null hypothesis of no difference. In other words, we have to show evidence that our results are not compatible with the null hypothesis, which assumes no difference. The test statistic (t, F, χ², etc.) calculated from our data will lead us to two, and only two, possible conclusions: that our data either (1) significantly deviate from zero or no difference, i.e., we reject the null hypothesis and conclude a difference; or (2) do not deviate significantly from the null hypothesis of no difference, i.e., no difference can be found. In this manner, we have performed a statistical test of hypothesis, and our decision, differ or do not differ, is based on pre-determined cut-off points in the percentage or probability distribution of the test statistic that we use. Notice that this is a yes or no decision, either significant or not significant. Cut-off points are referred to as the critical values of the test statistics.

Significance at the 0.05 level means that: (1) our data are sufficiently far from the cut-offs for no difference to lead us to a conclusion of difference; and (2) in this decision, we are aware that we incur a probability (P) or chance of 5% of being wrong, because we know that 5% of similarly conducted experiments will show a statistical significance just by chance alone, even if no real difference or effect exists. The smaller the P level (0.01, 0.001, 0.0001) and the farther our data from the cut-off, the less chance we have for a wrong conclusion; and thus the more certain we can be of the difference that we find from our data. This percentage (0.05, 0.01, etc.) is the level of significance, or the P level, or α, which is the probability of making an error in concluding a difference when none really exists. By this definition, a significance level of 0.95 or 95% is meaningless.

In our test of hypothesis, notice that (1) we, the experimenters, determine the significance level or cut-off (0.05, etc.); (2) the cut-off is arbitrary; (3) the P level for concluding a difference may be very small but never absolute zero because, in scientific research, certainty is never absolute, only relative; and (4) the significance statement is a rather strong statement, because we are saying in essence that we know our magnitude of error when we conclude a difference.
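For readers who like to see the arithmetic, the yes-or-no decision of a test of hypothesis can be sketched in a few lines of Python. The measurements below are hypothetical, invented for the example, and the critical value is the familiar two-tailed 5% entry from a t table; this is an illustrative sketch, not a prescription.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical measurements for a treatment group and a control group.
treatment = [14.1, 15.3, 13.8, 16.0, 15.2, 14.7, 15.9, 14.4]
control = [12.9, 13.5, 12.2, 13.1, 14.0, 12.6, 13.3, 12.8]

t, df = two_sample_t(treatment, control)
t_crit = 2.145  # two-tailed 5% critical value of t for df = 14, from tables

# The decision is yes or no: either |t| exceeds the cut-off and we conclude
# a difference (knowingly incurring a 5% chance of error), or it does not
# and no difference can be claimed.
significant = abs(t) > t_crit
```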


J Dent Res 73(2) 1994



Graduate students have no problem with the concept that P < 0.05 measures the chance of error in concluding a difference. The problem comes when they mistakenly stretch this to include the notion that if 0.05 measures error, the other side, the 95%, must be the confidence level and must mean 95% probability of a real difference. This is not the case. These two probabilities do not add up to unity. If they did, we could theoretically set a significance level of zero and be 100% certain that this difference is real. The use of the word "confidence" is unfortunate. It is more easily understood per se than the word "significance", so much so that authors without statistical background seem to be more comfortable using it than the latter, even to the point of misuse.

What, then, is the confidence interval? All of us have used the mean in our data as a one-value estimate of an unknown real mean value, a parameter which we seek. In statistics, this is the point estimate. Likewise, mean difference, correlation coefficient r, regression coefficient b, odds ratio, percent reduction, etc., are point estimates. Not all researchers, however, are aware that there is another estimate via a range of values, i.e., the confidence interval estimate. Statistics textbooks tell us that this range has a designated likelihood (usually 95% or 99%) to include the real but unknown parameter that we seek. In other words, we may be 95% (or 99%) confident that the values between the two confidence limits calculated from our sample data may include the unknown real mean value, mean difference, r, b, etc.

Let us now examine how this definition of the confidence interval relates to, and differs from, the level of significance. Primarily, confidence interval is a method of estimation, while significance level is used with the statistical test of hypothesis to arrive at a conclusion.
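The contrast between the point estimate and the interval estimate can be made concrete with a short sketch. The probing-depth readings below are hypothetical, and the t critical value is the table entry for nine degrees of freedom; the function name is ours, chosen for illustration.

```python
import math
import statistics

def mean_ci95(data, t_crit):
    """Point estimate (the sample mean) and the 95% confidence interval
    around it; t_crit is the two-tailed 5% critical value of t for
    df = len(data) - 1, looked up from tables."""
    m = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))  # standard error of the mean
    return m, (m - t_crit * se, m + t_crit * se)

# Hypothetical probing-depth readings (mm); t critical value for df = 9 is 2.262.
depths = [3.1, 2.8, 3.4, 3.0, 2.9, 3.3, 3.2, 2.7, 3.5, 3.1]
point, (low, high) = mean_ci95(depths, t_crit=2.262)
# point is the single-value estimate; (low, high) is the interval estimate,
# a range we are 95% confident includes the unknown real mean.
```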
The two procedures, however, can be used together very effectively to give more informative results than either alone. Consider the case where we conclude a difference at P < 0.05. The chance of this conclusion being wrong is less than 5%. Here, our observed mean difference is large enough to be beyond the cut-off from zero or no mean difference, in the reference distribution upon which the null hypothesis is based. On the other hand, the definition of the 95% confidence interval tells us that clustering around the mean difference in our data is a range of values that has a 95% chance of including the real mean difference. These values will not include zero, which is consistent with the significant results in the test of hypothesis. It is important to note that the 95% confidence interval is calculated around the sample mean difference in our data, and not around the zero mean difference in the reference distribution in our test of hypothesis. This is probably the underlying notion that leads to misconstruing 95% as the probability of correctly concluding a real difference. In this case, where our data show a significant difference, we may choose to publish confidence intervals for the mean treatment effect and for the control mean. The more highly significant our result is (P < 0.01, P < 0.001, etc.), the farther these two intervals will be from each other, and the values will not overlap. Note that these two intervals are calculated around the
treatment mean and the control mean in our data, while the interval for mean difference mentioned above is calculated around the difference between treatment and control means.

The confidence interval for mean difference gives the most helpful information in the alternative case where P > 0.05. Here, the chance of error in claiming a difference is large and greater than 5%, thus leading us to a conclusion of no significance. The 95% confidence interval calculated around the mean difference in our data will include zero and a range of values that will lead us to conclude no significance. Why is this interval informative? Because the confidence limits give a whole range of differences that may be clinically or scientifically meaningful although not statistically significant. In other words, even if analysis of our data does not result in finding a statistically significant difference or effect, there may exist a real difference or effect that may not be large but is meaningful to the researcher. This point must be emphasized particularly when our data are of near-borderline significance, say somewhere at P = 0.06. Such results occur because: (1) sample size is small due to expense and time constraints; and/or (2) variation (standard deviation) is large due to difficulties in obtaining quantitative, reliable, and repeatable data, to instrumentation and technique errors, to inherently large variation between subjects, etc. In this case, the two confidence intervals, for treatment effect and for control mean, can be expected to overlap. The reasoning then follows that, in publications, the statement "P > 0.05, not significant" is not as informative as specifying the actual P levels obtained and showing the confidence limits for the mean difference. Levels at 0.06 and 0.50 are both greater than 0.05, but have very different implications. At the 0.50 level, a conclusion of no difference is obvious.
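A borderline case of this kind can be sketched numerically. The two small samples below are hypothetical and were chosen so that the 95% interval for the mean difference just straddles zero: the test of hypothesis says "not significant", yet most of the interval's range represents differences a researcher might find meaningful.

```python
import math
import statistics

def diff_ci95(a, b, t_crit):
    """95% confidence interval around the mean difference of two groups,
    using a pooled variance; t_crit comes from a t table."""
    na, nb = len(a), len(b)
    d = statistics.mean(a) - statistics.mean(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return d - t_crit * se, d + t_crit * se

# Hypothetical small samples with a modest effect.
treat = [5.2, 4.8, 6.1, 5.5, 4.9, 5.6]
ctrl = [4.9, 4.6, 5.0, 5.3, 4.4, 4.8]
low, high = diff_ci95(treat, ctrl, t_crit=2.228)  # t for df = 10, alpha = 0.05
# The interval barely includes zero, so the difference is "not significant",
# but the limits reveal a range of plausibly real, non-trivial differences.
```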
But if the P level is of near-borderline significance, we can say that: (1) our observed difference or effect, although not significant, is clinically or scientifically meaningful; (2) within the confidence limits is a range of differences of which we can be 95% confident that these values may include the real difference; and (3) if sample size were increased and/or variation decreased in follow-up studies, there is a strong likelihood that the difference would reach statistical significance. In fact, such preliminary data are often required by funding agencies like NIH to estimate the needed sample size to ensure statistical significance in future experiments.

It is hoped that this discourse clarifies the following: First, significance level and confidence level are not synonymous or interchangeable terms. The amount 0.05 is used with the significance level and 95% with the confidence level. This is because 0.05 refers to a probability given our particular sample data, whereas 95% is not probability in this sense but refers to the percentage of intervals that are expected to include the real difference, among which our interval may or may not even be one. A statement of significance is associated with the test of hypothesis against a reference distribution with a hypothesized mean, as, for example, against the null hypothesis where the hypothesized mean is zero difference. A statement of confidence refers to an interval of values calculated from sample
data, as, for example, the 95% confidence interval around the mean difference in our data, and no hypothesis testing is involved. The two terms thus have different, albeit related, meanings and uses. We can make a statement of significance at the 0.05 level, and we can make a statement of confidence at the 95% level. We are, however, making two different statements and not two versions of one and the same statement.

Second, "no statistical significance" does not necessarily imply that the group means are the same. What can be inferred is that either the difference is so small that there is no practical difference between the group means, or the observed difference is not large enough to reach the 0.05 cut-off for significance due to small sample size or large variance but is nonetheless meaningful to the researcher.

Third, concluding a significant difference at P < 0.05 does not mean that we can be confident at 95% probability that this difference or effect we found is real. The test of hypothesis yields a yes or no conclusion. In concluding a difference, we can make a definite statement of significance that P < 0.05, which says that we know we incur a 5% chance of error (or 1% if P < 0.01, etc.). If our observed difference falls on the other side, i.e., in the 95% region of the distribution, our conclusion is necessarily that of no difference. Therefore, 95% or 0.95 probability means that, if there indeed is no real difference or effect, we will not get a mean difference in our data large enough for significance, 95 times out of 100. Note that this statement is entirely different from the incorrect statement that 95% measures the probability of a real difference or effect. The reasoning then follows that 95% has meaning not in the "yes" conclusion but in the "no" conclusion, where no significant difference is found. Again, this 95% or 0.95 (1 - 0.05) probability is not a confidence interval.
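The "95 times out of 100 under a true null" interpretation lends itself to a simple simulation. The sketch below (hypothetical population values, large-sample z approximation, seeded for reproducibility) runs many experiments in which no real difference exists and counts how often significance is declared by chance alone; the rate hovers near 5%, not 95%.

```python
import random
import statistics

random.seed(1)  # reproducible simulation

def significant_at_05(a, b):
    """Large-sample z-test of the mean difference at the 0.05 level."""
    d = statistics.mean(a) - statistics.mean(b)
    se = ((statistics.variance(a) + statistics.variance(b)) / len(a)) ** 0.5
    return abs(d / se) > 1.96  # 1.96: two-tailed 5% cut-off of the normal

# 2000 simulated experiments in which the null hypothesis is TRUE:
# both groups come from the same population (mean 10, sd 2).
false_alarms = sum(
    significant_at_05([random.gauss(10, 2) for _ in range(50)],
                      [random.gauss(10, 2) for _ in range(50)])
    for _ in range(2000)
)
rate = false_alarms / 2000  # expected to hover near 0.05
```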
The final question is thus: If significance level tells us our chance of a wrong conclusion, what measures the probability that we are correct when we conclude a difference, that this is a real difference or effect? The answer is the power of the statistical test. Power of the test depends not only on the designated significance level (0.05, 0.01, etc.) but also, more importantly, on sample size, amount of variation, and the magnitude of mean difference. Therefore, it is calculable only when these values are available from preliminary data. Power is a required statistic for grant proposals because it shows the likelihood that any detected difference or effect is real and not a chance finding. Power, although not usually shown in published papers, can be expected to be high if sample size is sufficient, if the difference between group means is large, and if data measurement is valid and repeatable using reliable technique and instrumentation.

The readers should now be ready to edit the inappropriate statements cited previously, as follows:
* The significance level was set at P < 0.05.
* Statistical significance was set at the 0.05 probability level.
* Significance between groups was determined by using P < 0.05.
* The difference shows statistical significance at probability of 0.05.
* The difference was not significant at the 0.05 probability level.
* The groups do not show a difference at the 0.05 significance level.
All are statements of significance in conclusions drawn from tests of hypothesis. They are not confidence statements.

Our responsibility to the next generation of dental researchers dictates a more thoughtful approach to our written words, the effects of which are frequently underestimated in the rush to publish. No citations are referenced, since it is not the intent of this editorial to finger-point, but rather to alert authors, editors, and reviewers to avoid misleading statistical statements in published papers.
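As a closing sketch, the dependence of power on sample size, variance, and effect magnitude described above can be illustrated with a standard normal-approximation power calculation for a two-sample test; the effect size and variance below are hypothetical numbers chosen for the example.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_sample(delta, sigma, n, z_crit=1.96):
    """Approximate power of a two-sided, two-sample z-test (alpha = 0.05)
    to detect a true mean difference delta, given a common standard
    deviation sigma and n subjects per group."""
    z = delta / (sigma * math.sqrt(2.0 / n))
    return normal_cdf(z - z_crit) + normal_cdf(-z - z_crit)

# Power grows with sample size (and with effect size; it shrinks with
# variance): the same effect that is likely to be missed with 10 subjects
# per group is likely to be detected with 50.
p_small = power_two_sample(delta=1.0, sigma=2.0, n=10)
p_large = power_two_sample(delta=1.0, sigma=2.0, n=50)
```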

-Rosario H. Potter
Department of Oral Facial Development
Indiana University School of Dentistry
Indianapolis, IN 46202-5186

