VIEWS: 7 PAGES: 14 POSTED ON: 7/12/2011
A SPREADSHEET APPROACH TO TEACHING RESAMPLING (Journal of Education for Business July/August, 1998) (Does not include figures) Richard L. Morris Winthrop University Rock Hill, SC 29733 803-323-2186 Barbara A. Price School of Business Georgia Southern University Statesboro, GA A SPREADSHEET APPROACH TO TEACHING RESAMPLING ABSTRACT This article presents the use of a spreadsheet-based package to teach resampling. A brief description of resampling is given, followed by a discussion of the package. Then two problems are used to illustrate resampling. These problems are initially solved using classical parametric methods, and some of the advantages and disadvantages of this approach are offered. This is followed by a discussion of how resampling could be applied to the problems. Finally, the resampling method is expanded by incorporating Lotus into the process to simulate enough replications to provide for reasonable conclusions. INTRODUCTION Anyone who has ever taught statistics has observed that most undergraduate students can learn the steps involved in statistical inference but few of them really understand what they do. For that matter, how many of us really understood what we were studying when we first took statistics? The difficulty is that in presenting what essentially are fairly simple concepts, a complicated set of theory and mathematical calculations has to be learned also. One solution to this problem is the use of resampling techniques to assist in explaining many of the concepts of hypothesis testing. Resampling is simple to understand, it makes no assumptions on the underlying distribution, and involves no complicated mathematical formulas. The basic concept is straightforward. Rather than assuming an underlying distribution and making inferences based on formulas, a sample space is constructed from the original sample (or samples) and then samples are drawn repeatedly from this "population." Inferences are drawn based on the resulting samples. It is computationally very expensive, of course, but most practical problems are well within the capabilities of modern PCS. The authors have begun to write a series of macros to perform resampling in Lotus. It is hoped that the availability of such a package will encourage more teachers to incorporate resampling in their courses. Most students are already familiar with spreadsheets and might find such an approach easier than learning a specially written resampling package. In addition, students can examine the macros and associated formulas to gain more of an understanding of what occurs. If they want to modify output, such as resizing graphs or adding explanations, it is easily done. Furthermore, the macros themselves may be modified by skilled Lotus users to treat new situations. RESAMPLING Julian L. Simon, Professor of Business Administration at the University of Maryland, can be considered the father of the use of resampling as a method to teach statistics. His definition of resampling is as follows: Let's define "resampling." A statistical procedure models a physical process. A resampling method simulates the model with easy-to-manipulate symbols. Either the observed data (in the case of a problem in statistical inference), or a data- generating mechanism such as a die (if it is a problem in probability), is used to produce new hypothetical samples, the properties of which can then be examined. The resampler postulates a universe and then examines how it behavesin the case of statistics, comparing the outcomes to the criterion we choose. (Simon, 1994) One of the virtues of resampling is that we do not need to assume that the populations are normally distributed. 2 WHY USE A SPREADSHEET FOR RESAMPLING? Before describing the spreadsheet package, it might be worthwhile to discuss some of the pros and cons of resampling with a spreadsheet. Speaking of the cons first, there is at least one resampling package already developed (Rosa-Hatko, 1995). The authors have not used it, but it is almost certainly much faster and more powerful than any spreadsheet-based package could be. However, most students are familiar with at least one of the commonly used spreadsheets, so that having access to a spreadsheet-based package might be the key to initiating them to resampling. There are other advantages also. With the macros to be discussed here, there is no need to type in a series of commands. Everything is windows-based and actuated by macro buttons. If the user wishes to modify some of the output for presentation purposes, it is very easy to do so. Since the macros are in the spreadsheet itself, the user can examine them to see what they do. Lastly, and this might be the most important consideration, users competent with macro commands may easily modify the macros to fit their individual applications. For example, if one wishes to resample using the median rather than the mean as a measure of central tendency, all he or she has to do is modify one of the formulas in the macros. ORGANIZATION OF THE RESAMPLING SPREADSHEET 3 The resampling package relies on a series of macros in Lotus 4+. The name of the spreadsheet containing the macros is RESAMPLE.WK4. When the user opens the spreadsheet, an autoexec macro automatically goes to a control section, from which the user can go to any of the available applications. Each application is in a separate worksheet within the spreadsheet. In addition to the application worksheets, there is a separate worksheet for the results, another containing the macros themselves, and a README worksheet with instructions on using the macros. The macros are designed to be as user-friendly as possible and to that end utilize many Lotus features in the macros. For example, macro buttons are provided to navigate to the various application worksheets and to invoke macros. When the spreadsheet opens up and the user is in the control section, shown in Figure 1, he or she uses one of the buttons to go to whichever application is desired. At this stage of development, the spreadsheet contains applications for one mean, two means, paired differences, one variance and one proportion. Each of these has an associated macro button in the control section. A README section, shown in Figure 2, lists the instructions necessary to operate the spreadsheet. In the instructional package developed, only the bootstrap procedure has been incorporated. The sampling environment is assumed to be an infinite population, and sampling is with replacement. Also, the available applications are limited to inferences for a single mean, a single proportion, a single variance, two means from independent populations, and the mean for paired differences. In the following sections two examples are used to explain spreadsheet-based resampling. EXAMPLE 1 4 Consider the following data from the article "Pick a Sample," by Peterson (1991). Battery Lifetime (weeks) Mean Std. Dev. With additive: 33 32 31 28 31 29 29 24 30 28 29.5 2.550 Without additive: 28 28 32 31 24 23 31 27 27 31 28.2 3.084 The first row of observations lists the lives of 10 automobile batteries with an additive designed to lengthen their useful life. The second row shows the lives of 10 batteries that did not receive the additive. To verify the effectiveness of the additive, we may conduct a pooled t-test as follows: H 0 : 1 2 H 1 : 1 > 2 where: 1 = the mean lifetime of the batteries with the additive 2 = the mean lifetime of the batteries without the additive In order to conduct the pooled t-test, we need to make the following assumptions: The samples are independently drawn. The variances of the two populations are equal, i.e. 21 = 2 2 , even though the value is unknown. The populations are normally distributed. Furthermore, when H 0 is true the means of the two populations are equal. TEST STATISTIC: 5 ( X1- X 2 )-0 t n1+ n2- 2 = 2 2 S p S p + n1 n2 DECISION RULE: If α = .05, reject H 0 if t n1+n2- 2 > 1.734 CALCULATIONS: ( 29.5 - 28.2 ) - 0 t 18 = = 1.027 8.003 8.003 + 10 10 CONCLUSION: Fail to reject H0; cannot conclude that the additive lengthened the useful life of the batteries. (p-value .159) Before actually performing the resampling experiment, it might be informative to describe how resampling would be done without a computer, perhaps as a way of introducing it in the classroom. First, treat the two sets of observations as if they comprise one population in which there is no difference between the batteries with the additive and those without it. Then take repeated random samples, two at a time, of 10 observations each. This could be accomplished by putting the 20 observations into a hat and sampling. Since our samples can be considered as coming from infinite-sized populations, we should sample with replacement, by replacing each observation as drawn and re-shuffling. Calculate the mean of each sample and compute the difference. Repeat this process a sufficient number of times to arrive at a meaningful sampling distribution. This distribution reflects the behavior of the sample means if there is no significant difference between the two original sets of batteries. By examining the sampling distribution, it can then be determined 6 whether the difference between the actual sample means is great enough to be significant. This is a good exercise to conduct as an introduction to resampling. It involves learning by doing, it is an exact simulation of how resampling is actually done, and it soon becomes clear why a computer is necessary to perform the calculations. RESAMPLING USING THE SPREADSHEET PACKAGE To conduct the test with resampling, the user should navigate to the data entry section for two means and enter the data as shown in Figure 3. The number of resamples is specified at 1,000, the null hypothesis at 0, and the confidence level at .95. Since this is an upper-tailed example, U is entered for the type of test. The resampling is done using the Lotus @VLOOKUP command, which in effect samples with replacement, or equivalently, from an infinite or very large population. This is appropriate for this example because the population can be considered as the very large number of batteries that might or not be treated with the additive. After the input is complete, the user clicks the RESAMPLE macro button, invoking the macros necessary for the two means test. When resampling is finished, the spreadsheet ends up in the RESULTS worksheet, illustrated in Figure 4. Shown in the initial screen of this section are the p-values and confidence intervals, both from resampling and from parametric methods. As can be seen, there is very close agreement between the results. It has been the authors' experience that the results are usually very close for large samples, but not as close for smaller samples. This is consistent with the literature to date on resampling. See, for example, Seppala, Moskowitz, Plante, and Tang (1995), in which the results of resampling and standard methods were compared for the calculation of control charts. One of the conclusions of this reference is 7 that resampling and classical methods are in very good agreement when the underlying distribution is normal. However, when sampling from a skewed population (they used the exponential as an example) the resulting bias (the difference between the resampling or parametric results and "true" results) is less with resampling. There are two more sections of interest in the RESULTS worksheet. The first is the graph of the resamples, shown in Figure 5, obtained by scrolling down one screen. As expected, since the samples are combined before resampling, the mean of this histogram is approximately zero. The sample mean difference of 1.3 lies in the upper tail of the distribution consistent with its p-value. Scrolling to the right from the original RESULTS screen, the user may view the resamples, some of which are shown in Figure 6. From a pedagogical standpoint, the most important points to dwell on are the resamples themselves and the graphs. The instructor should point out the formulas used in the resampled observations to show the equivalence between them and resampling by hand. The graphs show intuitively how likely it is to obtain the actual sample difference from the population constructed from the original sample. The preceding problem required a very modest amount of computer time. It took 6 seconds to run the complete resampling procedure for 1,000 resamples. The REPEAT procedure took only 3 seconds to perform 1,000 resamples. REPEAT takes less time because it can take advantage of a certain amount of setup work already done by RESAMPLE. For 200 resamples, RESAMPLE took 4 seconds and REPEAT required 2 seconds. All of these times were recorded on a Pentium 166. EXAMPLE 2 8 The second example makes use of data collected as part of a project in one of the authors' graduate statistics classes. The project involved a water processing plant located in Bucksport, South Carolina, and operated by the Grand Strand Water & Sewer Authority (Burgoyne, Eliott, King, and Carroll, 1995). At the time, the treatment plant was measuring pH levels of processed water both manually, using pH indicator paper, and automatically with a computerized pH meter. This meter operated by means of an electrode suspended in the water and automatically took readings every five minutes. Operating personnel at the plant used both methods because they were not as confident of the accuracy of the automated procedure as they were of the manual method. However, they were considering eliminating the manual method if they could be assured of the automated method's accuracy. Accordingly, the students doing the project took a series of 177 simultaneous pH readings using both the manual and the automated methods. Assuming that the manual method is reliable, its readings should provide a good benchmark for the automated process so that if there is no difference, on average, between the two, it can be concluded that the automated method is as reliable as the manual one. This is a classic paired Ho : d = 0 H1: d 0 difference experiment in which the following set of hypotheses hold: where d is the mean paired difference between the manual and automated readings. If the assumption of no difference is rejected, then we can conclude that the readings are different, on average, between the manual and automated methods. If the manual method is considered 9 accurate, a significant difference between the two could be blamed on the unreliability of the automated method, and they should keep both. To analyze this problem, the paired difference worksheet was used. Again, it is appropriate to think of this as sampling from an infinite population, since the readings will continue virtually forever. The input appears in Figure 7, although only some of the 177 observations are shown. The results of 1,000 resamples achieved a resampled p-value of .30364. This agrees closely with the theoretical value of .31262. Since neither one would cause us to reject the null for any reasonable significance level, we may conclude that the automated pH method is consistent enough with the manual one to justify using the automated method exclusively. This second example, was of course, much larger than the first one and consequently took longer to run. It took 48 seconds to run the RESAMPLE macro for 1,000 resamples. REPEAT took 38 seconds for 1,000 resamples. When the number of resamples was decreased to 200, RESAMPLE took 24 seconds while REPEAT required 15 seconds. Times on a 486 computer were much slower, but manageable for smaller problems. PERFORMANCE In the authors' experience so far, there is very close agreement between resample and theoretical results for large samples. This is reasonable, since, according to the Central Limit Theorem, the sampling distribution of the mean approaches normality with larger sample sizes. For smaller problems however, there is less agreement. The authors suspect that this disagreement is caused by skewness, as suggested by Seppala et al. (1995) 10 As a test, the authors simulated 1,000 resamples for 1,000 trials (a total of 1,000,000 resamples) using the data from Example 1. The mean p-value for the 1,000 trials was .1548, compared to the theoretical p-value of .1589, so agreement is very close on average. The mean absolute deviation for the 1,000 trials was .0093, or somewhat less than 1%. The mean lower confidence limit for the 1,000 trials was 1.110, compared to the theoretical value of 1.358. The mean upper confidence limit was 3.712 compared to a theoretical value of 3.958. The mean absolute deviations of the LCL and UCL, respectively, were .0842 and .0839. CONCLUSIONS AND PLANS FOR FUTURE DEVELOPMENT At this time, the package includes macros that perform the same operations presented in this paper for cases involving one mean, two means, paired differences, one variance and one proportion. We plan to extend the macros in the future to handle perhaps two or three additional common situations, such as regression and correlation analysis. Consideration is also being given to offering the finite population, sampling without replacement or randomization sampling environment alternative. The authors would appreciate very much any suggestions as to what features to add to the package. A copy of the package is available on request from the first author. 11 REFERENCES Burgoyne, S., Eliott, B., King, R., and Carroll, M. (1995). pH analysis for surface water treatment plant, Conway & South Pump Stations. Unpublished class project. Peterson, I. (1991). Pick a sample. Science News, 140, 56-58. Ricketts, C. and Berry, J. (1994). Teaching statistics through resampling. Teaching Statistics, 16 (2), 41-44. Rosa-Hatko, W. (1995). Resampling stats. ORMS Today, 22 (2), 72-74. Seppala, T., Moskowitz, H., Plante, R., & Tang, J. (1995). Statistical process control via the subgroup bootstrap. Journal of Quality Technology, 27, 139-153. Simon, J. L. (1993). Resampling: The New Statistics. Belmont, CA: Duxbury. Simon, J. L. (1994). The resampling method for statistical inference. Basics chapter of Philosophy document of Internet manuscript. Things to do: Either get last reference or delete it. Is first Simon reference to book or internet document? Look at one of the others. Would look like this: Simon, J. L. (1993). Resampling: The New Statistics (pp. 345-346). Duxbury Check to see if publications cited need issue number (they don't if pages run throughout the volume.) Where is Duxbury located? Is # of words OK? Remember figures 13