Abbreviated Summary of Biostatistics

Shared by: xarrnet
Posted: 10/31/2008
Review of Biostatistics for Introduction to Epidemiology and Biostatistics Course

1. Risk Factor

Definition 1: (Risk Factor) A risk factor is a characteristic that, if present and active, clearly increases the probability of a particular disease in a group of persons who have the factor compared with an otherwise similar group of persons who do not.

Concept 1: A risk factor is neither a necessary cause nor a sufficient cause of the disease. Even though many studies have associated smoking with an increased risk of lung cancer in smokers compared with non-smokers (almost 20 times higher in some studies), smoking is neither a necessary nor a sufficient cause of lung cancer.

2. Confounding

Definition 2: (Confounding) If a third factor (or variable) can explain, at least partially, the relationship between the risk factor or variable of interest and the disease (or outcome), then confounding is present. The confounder is related to both the risk factor of interest and the outcome variable, but is not in the causal pathway itself.

[Diagram: Exposure (risk factor), Disease (outcome), Confounder]

Example 1: The relationship between parity (number of children) and breast cancer risk was examined. The independent variable is parity and the outcome is breast cancer. The investigators found that childbearing was protective against breast cancer. However, when they included a third factor in the analysis (age at first birth, which was related to parity), the relationship between parity and breast cancer risk disappeared. Therefore, age at first birth was confounding the relationship between parity and breast cancer risk. Other examples are in the book.

Concept 2: Any factor related to the disease (or outcome) could also be related to the exposure or risk factor of interest, and could therefore introduce confounding and influence the relationship between the risk factor of interest and the disease. Age, sex, ethnicity and other sociodemographic factors are related to many diseases and exposures, and are usually considered potential confounders.

Concept 3: There are several strategies to deal with confounding:
A) Restriction: restrict the study population to a certain group, such as only males, in order to avoid confounding by sex.
B) Matching: match unexposed subjects to exposed subjects on the variable that might be a confounder. For example, if you are concerned about sex as a confounder, you can match each exposed subject to an unexposed subject of the same sex (1 exposed male : 1 unexposed male).
C) Data analysis: a confounding variable can be controlled for in the data analysis step. For example, it can be included as an independent variable in a multiple regression model computed by a statistical program, or you can stratify the analysis by the confounding variable (conduct two separate analyses, one for males and one for females, if sex is the confounding variable). Dr. Ziogas also presented more on this in class.

Concept 4: Another way of assessing whether confounding is present is to examine the odds ratio (OR) or relative risk (RR), both discussed below, stratified by the factor thought to be the confounder, and to compare the stratum-specific values with the overall (unstratified) OR or RR. For example, suppose we are examining the relationship between alcohol intake and myocardial infarction (MI). We calculate the OR and find it to be 1.4, but we want to check whether smoking is a confounder. We repeat the analysis of alcohol intake and MI, stratified by smoking status, and find that the OR is 1 in smokers and also 1 in non-smokers. Because the OR in both strata differs from the overall OR, smoking is a confounder of the relationship between alcohol and MI. Dr. Ziogas presented an example of this in class.
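
The stratified comparison described for confounding (crude OR versus stratum-specific ORs) can be sketched in Python. This is only an illustration under stated assumptions: the notes give the resulting odds ratios but not the underlying data, so the cell counts below are hypothetical, chosen so that the crude OR is above 1 while both stratum-specific ORs equal 1. The function name `odds_ratio` and the tuple layout are also assumptions made for this example.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a 2 x 2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


# Hypothetical counts (NOT from the course notes).
# Exposure = alcohol intake, outcome = MI, strata = smoking status.
smokers = (80, 40, 20, 10)
non_smokers = (10, 20, 40, 80)

# Adding the strata cell by cell gives the crude (unstratified) table.
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

crude_or = odds_ratio(*crude)
smoker_or = odds_ratio(*smokers)
non_smoker_or = odds_ratio(*non_smokers)

print(f"crude OR       = {crude_or:.2f}")
print(f"smokers OR     = {smoker_or:.2f}")
print(f"non-smokers OR = {non_smoker_or:.2f}")
```

With these made-up counts the crude OR is 2.25 while both stratum-specific ORs are 1.00, which is the pattern the text describes as confounding by smoking.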
3. Interaction

Definition 3: (Interaction / Effect Modification) If a third factor modifies the relationship between the risk factor of interest and the disease (or outcome), then interaction is present. An interaction occurs between two risk factors when the effect of one risk factor on the disease differs at different levels of the second risk factor. Sometimes the direction or strength of the association between the risk factor of interest and the disease (or outcome) will differ depending on the value or level of the third factor.

Example 2:

Figure 1: Blood pressure (BP) versus body mass index (BMI) for men and women. These two lines are parallel; they do not exhibit interaction.

Figure 2: BP versus BMI for older and younger people. These two lines are not parallel; they exhibit interaction.

If Figure 1 represents the relationship between BP and BMI in men and women, then the graph shows that the association between BMI and blood pressure is equally strong in both sexes: a one-unit increase in BMI is associated with the same increase in blood pressure in men as in women. Therefore no statistical interaction is present. In contrast, if Figure 2 represents the relationship between BP and BMI in older and younger people, then the graph indicates an interaction between BMI and age: a one-unit increase in BMI in older people is associated with a larger increase in blood pressure than a one-unit increase in younger people. Therefore, age modifies the relationship between BMI and BP.

Concept 4: Another way of assessing whether an interaction is present is to examine the odds ratio or relative risk (discussed below) stratified by the factor thought to be the effect modifier. For example, in Figure 2, two separate odds ratios (for the relationship between BP and BMI) can be calculated, one for the younger age group and the other for the older age group. If the odds ratios differ between the younger and older age groups, then age is modifying the effect of BMI on BP. Dr. Ziogas presented an example of this in class.

4. Odds Ratios

Definition: (Odds Ratio) The odds that an exposed person has the disease divided by the odds that an unexposed person has the disease.

Concept 5: This is the estimate of risk (of disease or outcome) in the exposed compared with the unexposed in a case-control study.

Example 1: A 2 x 2 table and calculation of odds ratios and relative risks

                           Disease (CHD cases)    No Disease (controls)    Total
Exposed (Smokers)          a (112)                b (176)                  288 smokers
Unexposed (Non-Smokers)    c (88)                 d (224)                  312 non-smokers
Total                      200 cases              400 controls             600 total

Odds ratio = (a/b) / (c/d) = ad / bc

From the numbers above, OR = (112 x 224) / (176 x 88) = 1.62

Interpretation: The odds of having CHD are 62% higher in an exposed person (smoker) than in an unexposed person (non-smoker).

5. Relative Risks

Definition: (Relative Risk) The incidence of disease in exposed persons divided by the incidence of disease in unexposed persons.

Concept 6: Because incidence can only be estimated from a cohort (prospective) study, relative risk is the estimate of risk in a cohort study.

                           Disease develops (CHD cases)    No Disease develops    Total
Exposed (Smokers)          a (112)                         b (176)                288 smokers
Unexposed (Non-Smokers)    c (88)                          d (224)                312 non-smokers
Total                      200 cases                       400 controls           600 total

Relative Risk (RR) = Incidence in the exposed / Incidence in the unexposed = [a / (a + b)] / [c / (c + d)] = (112/288) / (88/312) = 1.38

The risk of developing the disease is 1.38 times as high (38% higher) in the exposed compared with the unexposed.

Interpretation:
If RR = 1: Risk in exposed is equal to risk in unexposed (no association).
If RR > 1: Risk in exposed is greater than risk in unexposed (positive association; possibly causal).
If RR < 1: Risk in exposed is less than risk in unexposed (negative association; possibly protective effect).
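
The odds ratio and relative risk calculations for the smoking and CHD table can be checked with a short Python sketch. The cell counts (a = 112, b = 176, c = 88, d = 224) come from the notes; the function names `odds_ratio` and `relative_risk` are assumptions introduced only for this illustration.

```python
def odds_ratio(a, b, c, d):
    """OR = (a/b) / (c/d) = ad / bc."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """RR = [a / (a + b)] / [c / (c + d)]."""
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed


# Cell counts from the 2 x 2 table in the notes:
# a = exposed with disease, b = exposed without disease,
# c = unexposed with disease, d = unexposed without disease.
a, b, c, d = 112, 176, 88, 224

or_value = odds_ratio(a, b, c, d)
rr_value = relative_risk(a, b, c, d)

print(f"OR = {or_value:.2f}")
print(f"RR = {rr_value:.2f}")
```

Rounded to two decimals this gives OR = 1.62 and RR = 1.38, matching the hand calculations.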
6. Qualitative and Quantitative Questions

Definition 6: (Qualitative Questions/Data) Qualitative research aims to find out what responses are possible to a particular question. Qualitative questions are often open-ended and allow detailed discussion of a particular topic, such as the feelings a new mother may have regarding birth and early motherhood.

Definition 7: (Quantitative Questions) Quantitative research aims to obtain a good estimate of a number or value via questions or direct measurements. Examples include systolic blood pressure, cholesterol concentrations, or averages or proportions of some variable. Statistics and statistical testing are useful for quantitative questions only.

7. Variation in data when addressing scientific questions

Concept 7: Variation in data occurs inevitably when experiments are repeated.

Concept 8: Variation in data can be due to several factors:
A) Unmeasured variables in the study (for example, factors related to etiology, co-morbid conditions, or genetics when studying new cancer therapies).
B) Inherent randomness in the data.
C) Presence or absence of disease and/or different stages of disease in the population.
D) Different conditions in the measurement of data (e.g. time of day, season, lack of standardization in how the data are collected).
E) Different methods of measurement (e.g. measuring blood pressure by two different methods).
F) Measurement errors (e.g. different lab equipment, different interviewers).

8. Types of Statistical Variables/Scale of Measurement

Definition 8: (Categorical data) There are several different types of categorical data.
A) Data with values that fall into unordered categories are called nominal. Gender and marital status are two examples of nominal data.
B) If nominal data can take on only two values, they are called dichotomous or binary. So gender is not only nominal, it is also dichotomous.
C) If data fall into categories and the order of the categories matters, they are called ordinal. Injury status (minor, moderate, severe) is an example of ordinal data.

Definition 9: (Discrete data) For discrete data, both the order and the magnitude of the values matter. The values are not just labels, but actual measurable quantities. For example, the number of cases of TB in a one-year period in Boston would be discrete data. Note that discrete data can only take on certain values: you cannot have 12.5 TB cases.

Definition 10: (Continuous data) If both the magnitude and the order of the values matter, and the data can take on any value within a range (an infinite number of possible values), then the data are continuous. For instance, weight is continuous since it can take on any value within a range.

9. Measures of Central Tendency

Definition 11: (Mode) The most commonly occurring value in the data set is the mode.

Definition 12: (Median) The median is the middle observation when the data have been arranged in order from the lowest to the highest value. It is also defined as the 50th percentile. The median is not as sensitive to outliers as the mean.

Definition 13: (Mean) The mean is the average value and is calculated by summing all the observations and dividing by the total number of observations. The mean is sensitive to outlying values.

10. Measures of Dispersion

Definition 14: (Dispersion) After assessing the central tendency, the next step is to determine how spread out the numbers in your dataset are. This can be done in several ways: range, variance, Standard Deviation (SD), and Standard Error (SE).

Definition 15: (Range) The range of the data reflects the distance from the lowest value to the highest value in your dataset. It is the largest observation minus the smallest observation.

Definition 16: (Variance) The variance for a set of observed data is the sum of the squared deviations from the mean, divided by the number of observations minus 1 (formula on page 145 in book).

Definition 17: (Standard Deviation) SD is the square root of the variance.
The variance is expressed in squared units, so it is often a large number whose value falls outside the range of the observed values in the dataset. Therefore, the standard deviation is often used to describe the amount of spread in the frequency distribution. The standard deviation describes the variation in your population or study.

Definition 18: (Standard Error) The SE is related to the standard deviation, but it differs in important ways. Basically, the standard error is the standard deviation of a population of sample means, rather than the variability of individual observations, so it provides an idea of how variable a single estimate of the mean from one set of research data is likely to be.

11. Hypothesis Testing

Concept 9: In order to test a hypothesis, statistical tests that assess statistical significance must be conducted.

Definition 19: (Null Hypothesis) In order to test a hypothesis, a null hypothesis is applied. The null is the hypothesis that there is no real (true) difference between the means or proportions of the groups being compared, or that there is no real association between two continuous variables.

Concept 10: Statistical significance is "usually" determined by examining the p-value as well as the confidence interval of the point estimate.

Definition 20: (P-value) The p-value obtained from a statistical test (such as a t-test or a regression analysis) gives the probability that a difference at least as large as the one observed could have been obtained by chance alone, given random variation and a single test of the hypothesis. Usually, if the p-value is < 0.05, the association between the two variables or the difference between the means is considered statistically significant (unlikely to be due to randomness alone).

Definition 21: [Confidence Interval (CI)] A 95% confidence interval can be interpreted by saying that if we generate 100 such intervals (by conducting the experiment 100 times), then approximately 95 of them will contain the true parameter and 5 will not. It can be calculated as follows: 95% CI = mean ± 1.96 SE. For example, if the mean of a study is 113.1 and the 95% CI is (109.1, 117.1), we can say we are 95% confident that the true mean lies between 109.1 and 117.1: if the study were conducted 100 times, about 95 of the resulting intervals would be expected to contain the true mean. This is the 95% CI around the mean. One can calculate a 95% CI for many things, including the mean, the RR, the OR, etc.

Concept 11: Confidence intervals alone can be used as a test to see whether a mean or proportion differs significantly from a fixed value. The most common situation is testing whether a risk ratio or an odds ratio differs significantly from 1.0 (which means there is no difference between groups). Thus if a risk ratio of 1.7 had a 95% CI of 0.92 to 2.70, it would not be significantly different from 1.0, because the 95% CI includes 1.

Examples:
RR = 1.8, 95% CI: (1.6, 2.0). This is statistically significant.
RR = 1.8, 95% CI: (0.8, 2.9). This is NOT statistically significant.
OR = 0.7, 95% CI: (0.6, 0.8). This is statistically significant.
OR = 0.7, 95% CI: (0.4, 1.2). This is NOT statistically significant.

Concept 12: Statistical tests are conducted to compare two parameters and determine whether those two parameters are statistically significantly different. The more commonly used statistical tests are t-tests and z-tests, which compare two groups (means and/or proportions). To adjust for confounding or other factors associated with the disease or outcome variable, multiple linear regression is used.

Concept 13: A t-test measures the difference between the means of two groups (so the data must be continuous, such as weight), and a corresponding p-value is provided to determine whether the two means are statistically significantly different.

Concept 14: The z-test measures the difference in proportions (e.g. the percentage of women with breast cancer in a group with high Vitamin C intake compared with a group with low Vitamin C intake), and a corresponding p-value is provided to determine whether the two proportions are statistically different. The variables used for a z-test are categorical variables. Usually a chi-square test (chapter 11) is conducted instead of a z-test to measure differences in proportions, and it also provides a corresponding p-value. Technically, the computations of the two tests are equivalent.
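
The equivalence of the z-test for two proportions and the chi-square test can be illustrated with a small Python sketch. Several things here are assumptions rather than part of the notes: the notes give no data for the Vitamin C example, so the smoking and CHD counts from the 2 x 2 table (112, 176, 88, 224) are reused; the z statistic uses the pooled proportion; the chi-square statistic is the Pearson form without continuity correction; and the function names are hypothetical.

```python
import math


def two_proportion_z(a, b, c, d):
    """z statistic comparing a/(a+b) with c/(c+d), using the pooled proportion."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    pooled = (a + c) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def pearson_chi_square(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts reused from the smoking / CHD table (an assumption for illustration).
a, b, c, d = 112, 176, 88, 224

z = two_proportion_z(a, b, c, d)
chi2 = pearson_chi_square(a, b, c, d)

# Two-sided p-value from the standard normal distribution.
p_from_z = math.erfc(abs(z) / math.sqrt(2))
# Upper-tail p-value of a chi-square distribution with 1 degree of freedom.
p_from_chi2 = math.erfc(math.sqrt(chi2 / 2))

print(f"z^2 = {z * z:.3f}, chi-square = {chi2:.3f}")
print(f"p (z-test) = {p_from_z:.4f}, p (chi-square) = {p_from_chi2:.4f}")
```

Under these assumptions the squared z statistic equals the chi-square statistic (about 7.692 for these counts), and the two p-values agree, so both tests lead to the same conclusion about the difference in proportions.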
