BE540 Topic 1. Summarizing Data Computer Illustration: Epi Info

BE540 - Introduction to Biostatistics
Computer Illustration Topic 1 – Summarizing Data
Software: Epi Info 2002

A Visit to Yellowstone National Park, USA

Source: Chatterjee S, Handcock MS, Simonoff JS. A Casebook for a First Course in Statistics and Data Analysis. New York: John Wiley, 1995.

Setting: Upon completion of BE540, you decide to take a vacation to the United States. Of particular interest is seeing an eruption of the famous "Old Faithful" geyser at Yellowstone National Park. Unfortunately, your time is limited and you do not wish to miss seeing an eruption. This worked example illustrates a descriptive analysis of a data set of 222 interval times between eruptions of the Old Faithful geyser, measured during August 1978 and 1979.

Data File: GEYSER1.xls – Data set in Microsoft Excel Worksheet format.

Description of Data: There are three variables, in the following order:
INDEX - An index of the date of the eruption. We will not be using this variable.
DURATION - The duration of the eruption, in minutes.
INTERVAL - The length of the interval, in minutes, between the current eruption and the next eruption.

Objective: Describe the pattern of eruptions and predict the interval of time to the next eruption.

1. Read (Import) the Excel worksheet into the Analyze Data module.

Open EpiInfo2002 (Epiinfo.exe) > Analyze Data > Read (Import) > Ready to read data. Under ‘Data Formats’, select Excel; under ‘Data Source’, choose the file (D:\ …\geyser1.xls); under ‘Worksheets’, choose geyser; then click OK. A “FILESPEC” window pops up. If the variable names (Index, Duration, Interval) are already in the first row of the worksheet, check “First row contains field names” so that the data columns are named automatically. Click OK. The geyser data have now been imported.

2. Obtain a Histogram of Interval Times.
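As a point of reference outside Epi Info, a histogram simply counts how many values fall into each equal-width bin and draws one bar per bin. The Python sketch below illustrates that idea only; it is not Epi Info's own algorithm, and the function names, the 5-minute bin width, the 40 to 100 minute range, and the sample values are all hypothetical choices, not the GEYSER1 data.

```python
def histogram_counts(values, start, stop, width):
    """Count values in half-open bins [edge, edge + width) for edge = start, start + width, ... < stop.

    Values below start or beyond the last bin are ignored.
    """
    edges = list(range(start, stop, width))
    counts = {edge: 0 for edge in edges}  # keep empty bins so gaps stay visible
    for v in values:
        k = int((v - start) // width)  # index of the bin that contains v
        if 0 <= k < len(edges):
            counts[edges[k]] += 1
    return counts


def text_histogram(counts):
    """Render one row of '*' characters per bin, labelled by the bin's lower edge."""
    return "\n".join(f"{edge:>5} | {'*' * n}" for edge, n in counts.items())


# Hypothetical interval times in minutes, NOT the GEYSER1.xls data.
toy_intervals = [51, 53, 54, 56, 58, 74, 76, 77, 78, 79, 80, 83]
counts = histogram_counts(toy_intervals, start=40, stop=100, width=5)
print(text_histogram(counts))
```

Keeping the empty bins is a deliberate choice: with two clusters of values, the empty bins between them are exactly the "gap in the middle" that a histogram makes visible.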
From the analysis commands list, click ‘GRAPH’, configure the settings (dialog screenshot not reproduced here), and click OK. The following histogram of interval times is generated:

[Figure: Histogram of INTERVAL]

To revise the figure settings, right-click on the graph and choose the desired options.

Remarks: The interval times range from approximately 40 to 100 minutes. There appear to be two groupings of interval times, centered at approximately 53 and 78 minutes, with a gap between them.

Save the histogram as a picture that you can print directly or insert into a document such as this one. Locate the “Epi Graph” window where the interval histogram was created. Go to “File” > Export… > Check ‘JPG’ > Check ‘File’ > Type in picture name > Click ‘Save’.

3. In this example, a Box and Whisker plot is not very informative. Let's see why.

Click “GRAPH” command > Graph Type “Box-Whisker” > Variable “Interval” > Box-Whisker Type “Median25%-5%” > Type plot title > OK. We see the following plot:

[Figure: Box-and-whisker plot of INTERVAL]

Remarks: The histogram suggested that there are two groups of interval times. This cannot be seen in a Box and Whisker plot. Box and Whisker plots are excellent for summarizing the distribution of ONE population. They are not informative when the sample being summarized actually represents MORE THAN ONE population.

4. We also have information on the duration of each eruption. One possibility is that the duration of the current eruption is a predictor of the interval time to the next eruption. To investigate this possibility, construct a scatter plot of interval time versus duration. Plot the predictor DURATION on the horizontal axis (X) and the outcome INTERVAL (time to the next eruption) on the vertical axis (Y).
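The scatter plot is the graphical check of this idea. A common numerical companion, which is not part of this Epi Info illustration, is Pearson's correlation coefficient r: it is positive when larger X values (DURATION) tend to go with larger Y values (INTERVAL). A minimal Python sketch, assuming made-up (duration, interval) pairs rather than the GEYSER1 data; the function name pearson_r is hypothetical:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pairs: DURATION (X, minutes) is the predictor, INTERVAL (Y, minutes) the outcome.
duration = [1.8, 2.0, 2.3, 3.9, 4.2, 4.5]
interval = [50, 55, 54, 76, 80, 82]
print(round(pearson_r(duration, interval), 3))
```

Like the box plot, a single r value says nothing about whether the points fall into two clusters; the scatter plot is still needed to see the subgroups.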
Click “GRAPH” command > Graph Type “Scatter XY” > Variables: X-axis variable (Duration) and Y-axis variable (Interval), in that order > Type plot title > OK. We get the following scatter plot:

[Figure: Scatter plot of INTERVAL versus DURATION]

Remarks: The scatter plot confirms the suspected positive association: longer eruption durations appear to predict longer intervals to the next eruption. Interestingly, the scatter plot still suggests two distinct subgroups, distinguished by durations of less than versus greater than three minutes.

5. Create a grouped measure of duration and construct separate box and whisker plots of interval times: one for the intervals that follow eruptions of 3 minutes or less, and one for the intervals that follow eruptions longer than 3 minutes.

We need to define a 0/1 indicator variable, DURGRP. Click “DEFINE” command > In the “Define” window, input the new variable name “Durgrp” > OK. Click “IF” command > Type the IF…THEN…ELSE conditions > OK.

Note: You have just created what is called an indicator variable, which indicates a duration time greater than 3 minutes. It is equal to 0 for durations of 3 minutes or less and equal to 1 for durations greater than 3 minutes (in the DURATION summary below, the largest duration in group 0 is exactly 3 minutes). Indicator variables are also called dummy variables or design variables.

Click “GRAPH” command > Graph Type “Box-Whisker” > Variable “Interval” > Type plot titles > Box-Whisker Type “Median-25%-5%” > “Bar for Each Value of” “Durgrp” > OK.

Note: If you choose the grouping variable under “Bar for Each Value of”, both box-and-whisker plots appear in one graph. If you instead select “Durgrp” under “One Graph for Each Value of”, two separate box-and-whisker plots are generated. The combined plot looks like:

[Figure: Box-and-whisker plots of INTERVAL by DURGRP]

6. Finally, let's look at some numerical summaries, classified by the two groups.
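The DURGRP indicator from step 5 and the grouped summaries requested in this step can be sketched in Python as follows. This is an illustration under stated assumptions, not Epi Info's implementation: the cutoff rule (1 when DURATION is greater than 3 minutes, otherwise 0) follows the note in step 5, the variance uses the n − 1 (sample) denominator, which is an assumption about what Epi Info reports, and the function names and data values are hypothetical.

```python
import statistics


def durgrp(duration):
    """Hypothetical helper: 0/1 indicator, 1 if the eruption lasted more than 3 minutes, else 0."""
    return 1 if duration > 3 else 0


def summarize_by_group(durations, intervals):
    """Group INTERVAL values by DURGRP and compute N, Mean, Maximum, Minimum, Sum, Variance, StdDev."""
    groups = {}
    for d, i in zip(durations, intervals):
        groups.setdefault(durgrp(d), []).append(i)
    table = {}
    for g, values in sorted(groups.items()):
        table[g] = {
            "N": len(values),
            "Mean": statistics.mean(values),
            "Maximum": max(values),
            "Minimum": min(values),
            "Sum": sum(values),
            # Sample variance (n - 1 denominator, assumed); needs at least 2 values per group.
            "Variance": statistics.variance(values),
            "StdDev": statistics.stdev(values),
        }
    return table


# Hypothetical data, NOT the GEYSER1.xls values.
durations = [1.8, 2.0, 3.0, 3.9, 4.2, 4.5]
intervals = [50, 55, 54, 76, 80, 82]
for g, row in summarize_by_group(durations, intervals).items():
    print(g, row)
```

Note that a duration of exactly 3.0 falls in group 0 under this rule, since 3.0 is not greater than 3.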
Click “Summarize” in the commands list. In the “Aggregate” box, select Count, Average, Max, Min, etc.; in the “Variable” box, select the appropriate variable; in “Into Variable”, input a new variable name (N, Mean, Maximum, Minimum, etc.). Click “Apply”; each command will appear in the blank window below. Select DURGRP under “Group By”. Input a table name, “Intervals”, to store these statistics, then click OK.

To view the results, go to the commands list, click “Read (Import)” > Check “ALL” under “SHOW” > Choose “Intervals” > OK.

The resulting summary statistics for INTERVAL (minutes). Summary is the sum of the values; for example, 3721 / 68 = 54.72.

DurGrp  N    Mean              Maximum  Minimum  Summary  Variance          StdDev
0       68   54.7205882352941  78       42       3721     43.6073309920983  6.60358470772491
1       154  78.2012987012987  95       53       12043    47.5474492827434  6.8954658495814

With the same procedure, the summary table for DURATION (minutes):

DurGrp  N    Mean              Maximum  Minimum  Summary  Variance           StdDev
0       68   2.06617647058824  3        1.7      140.5    0.112420983318703  0.335292384820626
1       154  4.24285714285715  5.2      3.1      653.4    0.187170868347302  0.432632486467789

Note: Interval times and eruption durations both differ between the two groups. The mean interval is about 54.7 minutes after eruptions of 3 minutes or less (DurGrp 0) versus about 78.2 minutes after longer eruptions (DurGrp 1), and the mean durations are about 2.07 and 4.24 minutes. The variances of the two groups are not very different (for INTERVAL, 43.6 versus 47.5).
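The printed tables can be cross-checked by hand: Mean should equal Summary / N, StdDev should equal the square root of Variance, and the two group sizes should add up to the 222 intervals described in the Setting. A short Python sketch of that check, using the figures reported in the two tables above (the dictionary layout and function name are hypothetical):

```python
import math

# (N, Summary, Mean, Variance, StdDev) copied from the INTERVAL and DURATION tables.
reported = {
    ("INTERVAL", 0): (68, 3721, 54.7205882352941, 43.6073309920983, 6.60358470772491),
    ("INTERVAL", 1): (154, 12043, 78.2012987012987, 47.5474492827434, 6.8954658495814),
    ("DURATION", 0): (68, 140.5, 2.06617647058824, 0.112420983318703, 0.335292384820626),
    ("DURATION", 1): (154, 653.4, 4.24285714285715, 0.187170868347302, 0.432632486467789),
}


def row_is_consistent(n, total, mean, variance, sd, tol=1e-9):
    """True if Mean == Summary / N and StdDev == sqrt(Variance), up to an absolute tolerance."""
    return (math.isclose(total / n, mean, rel_tol=0.0, abs_tol=tol)
            and math.isclose(math.sqrt(variance), sd, rel_tol=0.0, abs_tol=tol))


for (variable, group), row in reported.items():
    print(variable, group, row_is_consistent(*row))

# The INTERVAL group sizes should add up to the 222 intervals in the data set.
total_n = sum(row[0] for (variable, _), row in reported.items() if variable == "INTERVAL")
print("total N:", total_n)
```

This check confirms internal consistency only; it cannot tell whether Epi Info divides by n or n − 1 when computing the variance.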
