ST2238 Introductory Biostatistics
Created by Alex Cook, National University of Singapore

Lab 1: Introduction to Minitab, paired & two-sample t-tests

Q1: Effect of hypnosis in reducing pain

Hypnosis is sometimes used as a non-medicinal form of pain relief. Price and Barbour [1] recruited students to a study in which they were exposed to painfully high temperatures before and after hypnosis in order to assess whether hypnosis has any effect on their ability to withstand pain. Participants indicated their (subjective) amount of pain on a 15cm strip. These pre- and post-hypnosis measurements for eight participants are reproduced below.

Subject   1     2     3     4     5     6     7     8
Before    6.6   6.5   9.0   10.3  11.3  8.1   6.3   11.6
After     6.8   2.4   7.4   8.5   8.1   6.1   3.4   2.0

(a) Why is a paired t-test more appropriate than a two-sample t-test?
(b) Do the data look normally distributed?
(c) Perform a paired t-test to assess the hypothesis that hypnosis has no effect. State your conclusions carefully.
(d) What is a 95% confidence interval for the true difference in mean subjective pain score before and after hypnosis? [2]
(e) What population does this sample represent?
(f) Why are the conclusions of this experiment dubious?

Q2: Ability of insects to detect floral iridescence

Iridescence is the change in colour of a surface when viewed from different angles. Whitney et al. [3] performed a laboratory experiment to assess whether floral iridescence is used to attract pollinating animals, by making artificial flowers out of CDs that were either iridescent or not, placing sweet substances on the iridescent tulip-castings and bitter on the non-iridescent ones, and recording how frequently laboratory bees identified the iridescent cast. Whitney et al. report that as the ten bees gained experience, they were more likely to visit the iridescent casting. Specifically, they state that of the first 10 visits per bee, the mean and standard deviation of the number of visits to the iridescent casting were 4.7 and 1.6, respectively; these were 8.1 and 1.3 on the last 10 (of 80) visits. Assume the number of visits in each of these groups of ten is approximately normally distributed.

(a) Assess the hypothesis that the standard deviations are the same in the “population” of naïve and experienced bees.
(b) Perform a two-sample t-test of the hypothesis that bees do not learn with experience to identify iridescence. (Note, the authors took pains to ensure it was colour alone that was guiding the bees.)
(c) Can you think of a better test for analysing the results of the experiment than a two-sample t-test?

Q3: Do food resources decrease with distance from barrier reefs?

Shima [4] was interested in the habitat of the six bar wrasse (Thalassoma hardwicke Bennett 1830), a marine reef fish that inhabits parts of French Polynesia. He measured the settler density, defined as the number of juvenile fish per “unit” of settlement habitat, at a series of sites at two distances (250m, 800m) from the reef crests, in order to assess whether the resources upon which the wrasse depends decrease with distance from the reef crest. The data can be downloaded from
http://www.stat.nus.edu.sg/staff/alexcook/teaching/ 2009 2010/st2238/handouts/thardwicke.txt

(a) Do the data look normal?
(b) Are the variances sufficiently similar that a two-sample t-test can be performed?
(c) If so, perform a two-sample t-test of the hypothesis that the mean settler density is the same at the two distances.
(d) Again, if the variances are similar enough, construct a confidence interval for the true difference between mean settler densities.

Hints:

Minitab is fairly intuitive, especially if you’ve ever used a spreadsheet in the past. Functions can be called by going to the menu along the top. It is also possible to use a command line to call functions. This is very useful when you are doing some repetitive analyses, as you can copy and paste commands in without having to go through each menu. To activate the command line, go to Editor → Enable commands. Note that this also allows you to see the commands equivalent to the functions you call using the menus.

Minitab has a few sub-windows. One is the Worksheet. Your data go here, with one datum per cell. Ensure the first element of the data is in row 1 (unlike Excel, Minitab is fussy about this). Columns can be referred to by their column number (e.g. C1) or by a name if you assign one (this goes between the “C1” row and row 1). Another sub-window is the Session window. This will contain the output of some functions you run. It will also be where the command prompt will appear if you use commands. There is also a Project Manager Window (which I don’t know how to use) and graphs will appear in their own windows. A picture of these windows can be seen overleaf.

• To start Minitab, Windows icon → All programs → Minitab solutions → Minitab 15 Statistical Software English
• To import data from a text file, File → Other files → Import Special Text. Choose the columns to store the data in (this requires knowing how many columns are in the data), click ok and then browse for the file. If the data have headings, Format allows lines to be skipped (i.e. you can skip the top line containing the heading).
• To evaluate normality graphically, you can do a histogram or what Minitab calls a probability plot.
• To do a histogram, Graph → Histogram, click simple, and in the next pop-up, enter the columns to plot histograms of in Graph variables. You might like to display multiple histograms on the same graph in different panels with the same scale (on the x-axis) to get a feel for how the distributions vary. This can be effected by playing with the options when you specify the columns for the plot.
• To do a probability plot, Graph → Probability Plots, then choose single. The plot it makes will have a straight line superimposed upon it. The points should lie close to the line if the data are normally distributed. Minitab has also superimposed some 95% confidence intervals (really, it’s a [pointwise] confidence region): 95% of the points should fall within this region. If too many fall outwith the region, this is suggestive of non-normality.
• Graphs can be saved to be imported into reports. To do this, File → Save graph as. There is a Minitab format, which no other program is likely to be able to open. Save it in this format if you wish to edit it in the future inside Minitab. Otherwise, save as a jpeg or png. Unfortunately, Minitab, like much other Windows software, does not give the option of saving as lossless vector graphics, such as postscript or pdf.
• Graphs can be edited by double clicking on various parts of them. Personally, I hate the cream background Minitab creates and the bold fonts for axes labels. (If you want really good graphs, I recommend mastering R, but this involves a considerable investment in time.)
• A paired t-test can be done via Stat → Basic Statistics → Paired t.
• A two-sample t-test can be done via Stat → Basic Statistics → 2-sample t. Note: if you have raw data in two columns, use the “Samples in different columns” option. There is also an option to perform the test based on summary statistics, i.e. the means, standard deviations and sample sizes. Note: I recommend ticking the “Assume equal variances” box. If you do not, Minitab will do the version of the test that does not assume equal variances, and the interpretation will not be clear cut.
• An F-test of the equality of variances can be done via Stat → Basic Statistics → 2 Variances
• You can create summary statistics of your data using Stat → Basic Statistics → Display Descriptive Statistics. Ask me if the meaning of any of these isn’t clear. Often students feel very attached to these and lovingly reproduce them in reports. I advise you to be very selective in reporting summary statistics.
• You can save either the whole project or just an individual worksheet. To save the project, File → Save Project. You might like to save each week’s work as a separate project, perhaps with a new worksheet per question.
• A very useful tool in Minitab is its calculator. To access this, Calc → Calculator. This allows you to manipulate the entries of one or more columns and store them in another column or constant. [Constants are called “K1”, etc., and may be viewed using the command line via print k1.] This is a bit like Excel’s functions, except you don’t have to mess about with dragging and clicking.

An alternative software package you might consider is R. It is installed on the computers in the lab. It is free and thus you can download it at home without worrying about legal issues or cost. However, it is primarily based upon command lines. This makes it a bit hard to get to grips with initially, but it is simply the most powerful statistics package out there, so if you end up doing lots of analyses as part of your research, it might be worth spending time mastering it.

[1] Price and Barbour, 1987, J. Abnormal Psychol. 96:46–51.
[2] This is actually a stupid question included just for practice.
[3] Whitney, Kolle, Andrew, Chittka, Steiner and Glover, 2009, Science 323:130–3.
[4] Shima (2001), Ecology 82:2190–9.

Lab 2: Non-parametrics, ANOVA

Q4: Sign and signed-rank tests

Catnip is commonly believed to be a drug that influences the behaviour of cats. A volunteer [5] at an animal shelter performed a study on the effect of the plant on 15 cats, by measuring the number of negative interactions each cat made during two 15 minute windows: before and after administering the drug.
The data can be downloaded from
http://www.stat.nus.edu.sg/staff/alexcook/teaching/ 2009 2010/st2238/handouts/catnip.txt

(a) Do the data look normal?
(b) Use the sign test to test the hypothesis that catnip has no effect on the behaviour of the cats.
(c) Try also the Wilcoxon signed-rank test to test this hypothesis. Are your results consistent?
(d) You may also wish to perform a paired t-test. What does the pattern of p-values tell you about the tests?

Q5: Rank-sum test

A liana is a kind of woody vine that grows in the tropics. Ecologists [6] measured abundance of liana (in stems/ha) at randomly chosen plots in the Amazon. Plots were randomised such that 34 were near the edge of the forest (within 100m) and 34 were far from the edge. The question that arises is: is the density of lianas near the edge of the forest the same as far from it? The data can be downloaded from
http://www.stat.nus.edu.sg/staff/alexcook/teaching/ 2009 2010/st2238/handouts/liana.txt

(a) Do the data look normal?
(b) Perform a Wilcoxon rank-sum test of the hypothesis that liana density near the forest edge is the same as the density far from the forest edge.
(c) You may also wish to try a two-sample t-test of the same hypothesis. Given your answer to the first part of this question, which test would you prefer?

Q6: One-way ANOVA

Attempt to answer questions 1 and 3 from lab 1 within an ANOVA framework. How do your answers differ from what you found using paired and two-sample t-tests?

Q7: One-way ANOVA

López-Castillo and colleagues [7] were interested in the effect of stressful types of medicine on the burnout levels of physicians. They selected 25 medics from infectious disease (working mostly with AIDS patients), hæmophilia (also working with many AIDS patients), oncology and internal medicine wards in Spain, and measured their “burnout” level using the Maslach Burnout Inventory questionnaire. A summary of the data can be downloaded from
http://www.stat.nus.edu.sg/staff/alexcook/teaching/ 2009 2010/st2238/handouts/burnout.txt

(a) Do the standard deviations look similar enough to do an ANOVA?
(b) Perform an ANOVA. Is there any evidence of a difference in burnout levels among the four groups of physicians?

Q8: Two-way ANOVA (A question for all the NSmen!)

Levels of testosterone decrease during times of stress. This is particularly the case for soldiers. Morgan and colleagues [8] measured testosterone in 12 soldiers during a military exercise in which the (American) soldiers were “captured” and “interrogated”. The data can be downloaded from
http://www.stat.nus.edu.sg/staff/alexcook/teaching/ 2009 2010/st2238/handouts/soldiers.txt

(a) Look at the data. Which appears most prominent: variability in adrenaline between soldiers or variability between times?
(b) Fit a two-way ANOVA using soldiers as blocks. Does adrenaline change over the course of the exercise? Does adrenaline differ between soldiers?
(c) Perform the following inappropriate analysis: a one-way ANOVA without blocking on soldiers. How do your conclusions change about the changing adrenaline levels over the course of the experiment?

Hints:

• The sign, Wilcoxon signed-rank and Wilcoxon rank-sum tests can be found under Stat → Nonparametrics.
• Minitab calls the Wilcoxon signed-rank test the “1 sample Wilcoxon”.
• Minitab calls the Wilcoxon rank-sum test the “Mann–Whitney”.
• ANOVA can be found under Stat → ANOVA.
• If you wish to do comparisons between pairs of treatments, there is a Comparisons option in the ANOVA window. I think “Fisher’s” corresponds most closely to what we spoke about during class.
• You may be interested in exploring the power calculations Minitab allows. These allow you to determine the minimum sample size needed to achieve a specified power for particular parameter values. Check it out!

[5] Jovan (2000). Stats 27:25–7
[6] Laurance et al. (2001). Ecology 82:32–9
[7] López-Castillo et al. (1999). Psychother. Psychosom. 68:348–56
[8] Morgan et al. (2000). Biol Psychiatry 47:1889–901

Lab 3: Regression

Q9: Linear regression

Elephants are one of the most important mammals, in part because they are the mammals least related to man [9]. They are therefore relatively well studied. Being able to predict the gestation period of pregnant elephant cows is important in preserving their numbers. A team of zoologists [10] investigated the relationship between the cranial-rump length and gestational age. Although they fitted a more complex model, we shall assume a linear relationship is appropriate. The data can be downloaded from
http://www.stat.nus.edu.sg/staff/alexcook/teaching/ 2009 2010/st2238/handouts/elephant.txt

(a) Plot the data. Does it appear there is a linear relationship between cranial-rump length and gestational age?
(b) Perform a linear regression. What are your best estimates of the slope and intercept?
(c) If an elephant cow presented to you with an embryo of cranial-rump length 140cm, what would be your best estimate of the age of her embryo?

Q10: Multiple regression

Public health decision makers are often interested in spatial heterogeneities in disease patterns, as these patterns may indicate ways to improve care. A groundbreaking study was carried out on factors influencing the propensity to lunacy in Massachusetts in the mid-19th century, led by Edward Jarvis (who was not entirely coincidentally president of the American Statistical Association) [11]. Recorded are multivariate data from 14 counties, each of which has:

• Number of lunatics
• Distance to nearest mental health centre
• County population (thousands, actually from 1950)
• County population density per square mile
• Percent of lunatics cared for at home

The data can be downloaded from
http://www.stat.nus.edu.sg/staff/alexcook/teaching/ 2009 2010/st2238/handouts/lunatics.txt

Using a multiple linear regression, determine what the factors are that influence lunacy occurrence. Based on your conclusions, what would you advise to a policy maker?

Q11: Multiple regression

Chlorofluorocarbon (CFC) is an ozone depleting substance. In the 1970s and 1980s, levels of CFCs in the atmosphere rose dramatically, prompting international action to cut the levels of ozone depleting substances in the Earth’s atmosphere, leading to the signing of the Montreal Protocol in 1987. Manufacturers creating CFCs met the challenge of the protocol and developed new, less polluting compounds over the following years. Monthly measurements of CFCs (parts per trillion) taken at Mauna Loa, Hawai’i, USA, are available from
http://www.stat.nus.edu.sg/staff/alexcook/teaching/ 2009 2010/st2238/handouts/cfc.txt
(They have been derived from Combined Chlorofluorocarbon-11 data from the NOAA/ESRL halocarbons program.)

(a) Plot the data. What would be the effect of fitting a simple linear regression of CFC versus time?
(b) Split the data into two parts (manually). Fit a linear regression to each part separately. Construct confidence intervals for the slope in each part. Does the obvious trend in the data appear to be a statistically significant change?
(c) Create a dummy variable equal to 0 before 1993 and 1 after. Fit a multiple regression using time and the dummy variable to predict CFC levels. From the output of the model, is there a statistically significant change in the gradient of the line of best fit?

Hints:

• Regression, including multiple regression, can be found under Stat → Regression → Regression.
• For the lunatics question, you may wish to start by normalising the number of lunatics by dividing by the county size. You may also wish to consider transforming distance to the nearest centre, perhaps by taking the reciprocal. I would try a series of models, discarding least significant terms, until all remaining terms in the model are significant.

[9] Dawkins (2004) The Ancestor’s Tale. I strongly recommend this book!
[10] Hildebrandt et al. (2007). Proc R Soc Lond B 274:323–31
[11] For a more recent discussion, see Hunter (1987), Geograph Rev 77:139–56

Lab 4: Discrete statistics I

Q12: Binomial test

Phillips and Smith [12] wished to investigate whether it was possible for terminally ill patients to postpone death until after some significant date. They studied a group of ethnic Chinese women living in California who died within one week of the autumn moon viewing festival. If they were unable to postpone their death until after the festival, we would expect the same number of deaths before and after the festival date. Of 103 women in the study, 70 died after and 33 died before the festival. Does this provide evidence that death can be postponed?

Q13: Fisher’s exact test

Souttou and colleagues [13] were interested in whether the growth factor pleiotrophin was raised in individuals with pancreatic cancer. They measured this in 69 individuals (41 with the cancer and 28 controls) and noted whether the level was more than two standard deviations above the mean of the controls. This happened for 2 controls and 20 cancer patients. Use Fisher’s exact test to assess whether pleiotrophin is higher in cancer patients.

Q14: Contingency tables

Phenotypic traits are often correlated in spatially distinct subpopulations.
The Badeners are an ethnic group in south-west Germany, in a region that is close to the border of the northern and southern European areas. Ammon collected data on hair and eye colour among Badeners [14] in the late 19th century, when the German population was less homogenised than now. These are presented below:

                 Eye:
Hair:     Brown   Grey/Green   Blue
Brown     438     1387         807
Black     288     746          189
Fair      115     946          1768
Red       16      53           47

Is there an association between hair and eye colour in this ethnic group? Does this give any biological insight?

Q15: Goodness-of-fit

An oil from the seed of flax (Linum usitatissimum L) has been developed which has almost no linolenic acid and high levels of palmitic acid. Saeidi and Rowland [15] wished to determine the nature of inheritance of the mutation responsible for variegated seed coat colour. They crossed two lines (P1, with variegated colour and high palmitic acid concentration; and P2, with brown colour and low palmitic acid concentration) and recorded the offsprings’ seed colour and palmitic acid levels. Under the hypothesis of “independent segregation of two alleles in one locus with additive gene effect and two alleles in another locus with complete dominance” the ratio of 3:6:3:1:2:1 is expected for the following phenotypic traits, respectively:

Seed colour   Palmitic acid level   Observed
Brown         Low                   17
Brown         Intermediate          28
Brown         High                  22
Variegated    Low                   4
Variegated    Intermediate          8
Variegated    High                  2
              total:                81

with observed numbers for the P2×P1 cross in the third column. Perform a chi-squared goodness-of-fit test on the hypothesis that the ratio really is 3:6:3:1:2:1.

Hints:

• Binomial and Fisher’s exact test can be carried out using Stat → Basic Statistics → 1 Proportion or 2 Proportions, respectively
• A contingency table can be analysed by entering the entries in a few columns and going to Stat → Tables → Chi-Square Test (Two Way Table in Worksheet)
• The simplest way to do the goodness-of-fit test might be to calculate the expected numbers manually, then use Calc → Calculator to derive the subsequent calculations.
• Note: for the goodness-of-fit test there is no need to estimate any parameters as they are given by the theory.

[12] Phillips & Smith, 1990, J Am Med Assoc 263:1947–51
[13] Souttou et al, 1998, J Natl Cancer Inst 90:1468–73
[14] Ammon, 1899, Zur Anthropologie der Badener, Jena: Fischer
[15] Saeidi & Rowland, 1997, J Heredity 88:466–8

Lab 5: Discrete statistics II

Q16: McNemar’s test

The p7 antigen is thought to be expressed in cell lines derived from breast cancerous tissues [16]. A sample of 28 women with breast cancer had tissue samples taken before and after being treated with chemotherapy. Each sample was tested for the presence of p7. Fourteen never had p7 present, four had p7 present both before and after, none had it present after but not before, and twelve had p7 before but not after. Perform McNemar’s test of the hypothesis that chemotherapy has no impact on p7 antigen presence.

Q17: Mantel–Haenszel test

A common problem in epidemiological studies is the difficulty in recruiting individuals from all segments of society. One way the potential for this affecting results can be assessed is by collecting data from the population as a whole on the more easily obtained covariates and comparing the representativeness of the sample to these. A study on factors relating to cardiovascular disease in children in urban South Africa was carried out during the 1990s in Soweto [17]. To assess whether the group that were able to be contacted after 5 years of the study were typical of all individuals that started the study, the proportion of children whose mother had medical aid (akin to health insurance) at both time points were compared. Of 416 who were traceable after 5 years, 46 had medical aid, while of 1174 who were not traceable, 195 had medical aid. Test whether there is a difference between these two proportions using a contingency table.

As pointed out by Morrell [18], this might be explainable by race. Below are the results, limited to “blacks” and “whites”:

           “Black”                   “White”
Aid   Not traced   Traced      Not traced   Traced
Y     91           36          104          10
N     957          368         22           2

Do your conclusions change and if so, what is the reason?

Q18: Logistic regression

Medical donors are often scarce relative to demand, and hence it is desirable to keep their donations in good condition for as long as possible. A study was carried out [19] on spermatozoa survival in a sperm bank. Fifty sperm samples were placed in sodium citrate and glycerol, each at two unspecified levels, for an unspecified period of time. The number that survived this gruelling ordeal were recorded below:

Sodium citrate   Glycerol   Survivors   Total
0                0          34          50
1                0          20          50
0                1          8           50
1                1          21          50

where the levels of sodium citrate & glycerol are arbitrary labels assigned to them in the absence of proper information. Use a logistic regression to see if one, both or none of these treatments affects sperm survival.

Hints:

• It appears Minitab does not have the capability to do Poisson regression. A free statistics package that does (and can do almost anything statistical you will ever need) is R. If you have Poisson response y and predictors x1 and x2, the command to run a Poisson regression of y on (x1, x2) is
> summary(glm(y ~ x1 + x2, family = poisson))
The glm comes from the fact that Poisson regression is a member of the set of generalised linear models (as is logistic regression). R is well worth investing time learning if you intend to do much statistics in the future.

[16] Yang et al (2003) Clin Cancer Res 9:201–6
[17] See also Yach et al (1991), Paed Perinatal Epidemiol 5:211–33; Levitt et al (1999), J Epidemiol Comm Health 53:264–8
[18] Morrell (1999) J Stat Ed 7:n3
[19] By some unnamed people, at an unknown time and place, reported in Myers et al (2002), Generalized linear models, Wiley, with a terse description.
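
Since the Lab 5 hints note that logistic regression is also a generalised linear model, the same glm call should carry over to Q18 by swapping the family. The following is only a rough sketch under that assumption, not a model answer: the object names sperm and fit are made up for illustration, the counts are typed in from the Q18 table, and only main effects are fitted.

```r
# Counts transcribed from the Q18 table; the 0/1 levels are the
# arbitrary labels used in the handout.
sperm <- data.frame(
  citrate   = c(0, 1, 0, 1),
  glycerol  = c(0, 0, 1, 1),
  survivors = c(34, 20, 8, 21),
  total     = c(50, 50, 50, 50)
)

# For grouped binary data, glm accepts a two-column response:
# cbind(number of successes, number of failures).
# Assumption: a main-effects model is what the question intends.
fit <- glm(cbind(survivors, total - survivors) ~ citrate + glycerol,
           family = binomial, data = sperm)

# Coefficients are on the log-odds scale; the table of z-tests shows
# which treatments appear to matter.
summary(fit)
```

The cbind(successes, failures) form is used here because the data arrive as counts per treatment combination rather than one row per sample; an equivalent fit could be obtained from 200 individual 0/1 rows.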
