Introduction to Clinical Biostatistics
Dr Kripa Shanker Gupta
Overview of Presentation
Introductory Concepts (Review)
Hypothesis Testing
Linear Regression and Correlation
Analysis of Variance (ANOVA)
Nonparametric Statistics
Survival Analysis
Introductory Concepts
Introductory Concepts
Types of Data
Presenting Data
Descriptive Measures
Probability and Distributions
Estimation Techniques
Types of Data
Data are usually Discrete or Continuous
Discrete Variables take on a countable set of distinct values
Ex: Race, Gender, Year in School
Continuous Variables can take any value within a range, so the set of possible values is infinite
Ex: Age, Height/Weight, Blood Pressure
Types of Data
A Special type of Discrete Variable is the Binary Variable which takes on exactly 2 possible values
Gender (M/F) Pregnant? (Y/N) Hypertensive? (Y/N)
Types of Data
Sometimes, discrete variables have a “natural ordering” to them; these are called Ordinal Variables
For example, names of consecutive days in a week (Mon, Tue, Wed, Thu, Fri, Sat, Sun)
Other types of discrete variables do not have a natural order and are called Nominal Variables
Race (African American, Caucasian, Asian, Hispanic etc.)
Types of Data
If in an experiment you measure a single variable, it is called a Univariate experiment
If you measure 2 variables, it is called a Bivariate experiment
And if you measure multiple variables, it is called a Multivariate experiment
Types of Data
A Random variable is one whose value is determined by chance or a random event
Typically, a variable X is random if it is the outcome of an experiment where results can occur by chance or are not completely predictable
Types of Data
Nonparametric Variables
Many times in clinical studies, we seek opinion data (e.g. patient satisfaction scores, relative value scales etc.)
The data can be ranked but have no absolute scale that is comparable
This type of data is called nonparametric data
Presenting Data
There are many ways to present data:
Frequency Tables
Pie Charts
Bar Graphs (Histograms)
Line Graphs
Scatter Plots (Scattergrams)
Stem and Leaf Displays
Box Plots
Descriptive Measures
Now that we have displayed our data, we want to be able to characterize it quantitatively
Measures of Central Tendency
Mean, Median, Mode
Measures of Variability
Range, Variance, Standard Deviation
Measures of Relative Standing
Z-Scores, Percentiles, Quartiles
Measures of Central Tendency
Mean
Arithmetic Average of a sample of data
Median
If you order the data from smallest to largest, the median is the middle value, assuming an odd number of data elements
If you have an even number of elements, it is the average of the 2 middle numbers
Mode
The most common value in a set of values
Mode
The value which is the “most popular” in a distribution of scores
E.g. 2, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7
No. of 2’s: 1, 4’s: 3, 5’s: 5, 6’s: 6, 7’s: 2
Mode is 6 (most popular): GREATEST FREQUENCY
Simplest but least useful
Useful when data have been divided into categories
Median
It is the centre point of the distribution
Represents the value below which 50% of all scores are located
Divides the distribution into two equal parts (50th percentile)
E.g. 2, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7
Median is 5 (the 9th of the 17 ordered values)
Better estimate than the mode: not affected by extreme values
For the same reason, it does not show how far the values are spread
Mean (Average)
It is the weighted centre point
Calculated by summing all observations and dividing by the total number of observations
Advantage: it takes into account how far the values are spread, including extreme scores
It acts as the balancing point for the distribution
Most commonly used measure of central tendency
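As a quick check of the three measures, here is a minimal Python sketch using the scores from the Mode and Median examples. It relies only on the standard library statistics module; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Scores from the worked example on the Mode and Median slides.
scores = [2, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7]

score_mean = mean(scores)      # sum of all observations / number of observations
score_median = median(scores)  # middle value of the ordered data
score_mode = mode(scores)      # value with the greatest frequency

print(f"mean={score_mean:.2f} median={score_median} mode={score_mode}")
```

The mean (89 / 17) lies between the median and the mode here because the scores are not symmetric around the centre.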
Measures of Variability
Once we have located the center of a set of data points, we want to know how “dispersed” they are
Measures of Variability
Range
This is the difference between the highest and lowest value
Variance
Defined to be the average of the square of the deviations of the individual data points about their mean
Standard Deviation
This is defined as the square root of the variance
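The three definitions can be sketched in a few lines of Python. This sketch follows the slide definition literally (variance as the average squared deviation, dividing by n); many texts divide by n - 1 when the data are a sample. The scores are reused from the Mode example and the variable names are illustrative.

```python
from math import sqrt

# Scores reused from the Mode and Median examples.
scores = [2, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7]
n = len(scores)

data_range = max(scores) - min(scores)                 # highest minus lowest value
center = sum(scores) / n                               # the mean
variance = sum((x - center) ** 2 for x in scores) / n  # average squared deviation
std_dev = sqrt(variance)                               # square root of the variance

print(f"range={data_range} variance={variance:.3f} sd={std_dev:.3f}")
```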
Measures of Relative Standing
Percentiles and Quartiles also indicate relative standing but in terms of the categories of scores from lowest to highest
Given a set of n measurements x1, …, xn, the pth percentile is defined to be the value of x that exceeds p% of the measurements and is less than the remaining (100-p)% of the values.
Ex: Scores of 20, 30, 50, 60, 67, 67, 70, 80, 90, 95
The score 50 is at the 30th percentile when the score itself is counted: 30% of the scores are at or below 50 and 70% are higher.
Measures of Relative Standing
Quartiles similarly reflect in which quarter of the set of values a particular observation lies:
Ex: Scores of 20, 30, 50, 60, 67, 67, 70, 80, 90, 95
1st Quartile = 50, 3rd Quartile = 80
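The percentile and quartile figures for these ten scores can be reproduced with a short Python sketch. It assumes the "median of each half" rule for quartiles, which matches the values on the slide; other quartile conventions exist and can give slightly different numbers. The helper names are illustrative.

```python
from statistics import median

scores = [20, 30, 50, 60, 67, 67, 70, 80, 90, 95]

def percent_below(data, value):
    """Percentage of measurements strictly lower than value."""
    return 100 * sum(x < value for x in data) / len(data)

def percent_at_or_below(data, value):
    """Percentage of measurements lower than or equal to value."""
    return 100 * sum(x <= value for x in data) / len(data)

def quartiles(data):
    """Q1, median and Q3, taking Q1 and Q3 as the medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2          # the middle value is left out when n is odd
    return median(ordered[:half]), median(ordered), median(ordered[-half:])

q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)
print(percent_below(scores, 50), percent_at_or_below(scores, 50))
```

For the score 50, 20% of the scores are strictly lower and 30% are at or below it, which is why the stated percentile depends on the counting convention.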
Probability
Suppose you do an experiment with a finite number of possible outcomes (ex: coin toss)
The Probability of an event E (H/T) is the chance (%) that the event will turn out in a given way in the next repetition of the experiment
Probability values are always between 0 and 1
Probability
The notation for probabilities is as follows:
Given our coin toss experiment,
P(H) = Probability that a Head will be tossed in the next round
P(T) = Probability that a Tail will be tossed in the next round
One can estimate probabilities by repeating the event many times and observing the outcomes
Probabilities: Some Simple Rules
Arithmetically, one can combine probabilities of simple and sequential events:
Given a complex event composed of N mutually exclusive simple events, the probability of the complex event is equal to the sum of the probabilities of each of the simple events
Ex: Coin toss 1 and Coin toss 2

Event   First Coin   Second Coin   P(Ei)
E1      Heads        Heads         ¼
E2      Heads        Tails         ¼
E3      Tails        Heads         ¼
E4      Tails        Tails         ¼

Let A = {E2, E3}. Then P(A) = P(E2) + P(E3) = ½
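The two-coin example can be checked by enumerating the four simple events in Python. This is a small sketch assuming fair coins, so each simple event has probability 1/4; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

# The four simple events E1..E4 as (first coin, second coin).
outcomes = list(product(["H", "T"], repeat=2))
p_each = Fraction(1, len(outcomes))   # fair coins: each simple event has probability 1/4

def probability(event):
    """Add up the probabilities of the simple events that make up the event."""
    return sum(p_each for outcome in outcomes if event(outcome))

# A = {E2, E3}: exactly one Head in the two tosses.
p_a = probability(lambda outcome: outcome.count("H") == 1)
print(p_a)
```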
Probability Distributions
Given a random variable X (either discrete or continuous), the Probability Distribution gives a table or formula or graph of the probabilities of each potential value of X
For a Probability Distribution P(x) the following must hold:
0 <= P(x) <= 1
Sum (all P(x) over all x) = 1
Probability Distributions
There are many kinds of probability distributions:
Binomial Distribution
Applies to binary variable experiments where only 2 outcomes are possible
Poisson Distribution
Applies to variables that represent the number of occurrences of a specified event in a given unit of time or space
Hypergeometric Distribution
Applies to experiments where the sample is drawn without replacement from a population that is small in comparison to the sample size, and thus the success of a trial depends on the outcomes of preceding trials
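As an illustration of a discrete probability distribution, the sketch below tabulates a Binomial distribution in Python and checks the two conditions every probability distribution must satisfy. The choice of n = 10 trials and p = 0.5 is an assumption made only for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5   # hypothetical: 10 tosses of a fair coin
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, value in enumerate(pmf):
    print(k, round(value, 4))

# Every P(x) lies between 0 and 1, and the P(x) sum to 1 over all x.
print(all(0 <= value <= 1 for value in pmf), sum(pmf))
```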
Probability Distributions
Normal Distribution (N)
Applies to continuous random variables
Standard Normal Distribution (Z)
A Normal Distribution with:
Mean of 0
Standard Deviation of 1
Estimation Techniques
So now that we know that certain “experiments” can have results distributed in certain ways, how can we “predict” the result of this experiment?
This process is called Statistical Inference, where we estimate a quality of a larger population by analyzing a small sample
Estimation Techniques
Populations and Samples
A Population is the larger set of objects we wish to study
Ex: The number of democrats in the country
A Sample is a set of “representative” objects we choose in order to estimate the characteristics of the larger set of objects
Ex: Take 100 people from each state and determine whether they are democrats
Estimation Techniques
Parameters and Statistics
A Parameter is the “quality” of the population we are trying to estimate
In order to estimate the parameter we measure the quality in a sample. This sample quality is called a statistic
Estimation Techniques
Many types of samples can be taken:
Completely Random Sample
Stratified Random Sample
Divide the population into strata (groups)
Take a sample from each group
Ex: Party loyalties of teenagers, adults and elderly
Cluster Sample
Take a simple random sample of clusters from the available clusters in a population
Ex: Urban vs. Rural sampling
Hypothesis Testing
Large Sample Estimation Techniques
Introduction to Estimating Techniques
Before we begin, let’s review some common terms:
Point Estimate:
When we do an experiment and generate a result, the result at one point in time for one “run” of the experiment is called a point estimate (mean, etc.). Since each experiment has some error, there is a margin of error for every point estimate
Introduction to Estimating Techniques
Interval Estimate:
Now if we repeat the experiment many times over we will get sense of how far off we are from running a “perfect” experiment. This sense of “confidence” in our experimental ability is called an interval estimate or a confidence interval.
Confidence Intervals
Typically, the 95% confidence interval is defined as follows:
CI = Mean +/- 1.96 x Standard Deviation / sqrt (N)
It tells us that if we repeat the experiment many times over, about 95% of the intervals computed this way will contain the true population Mean
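A minimal Python sketch of the large-sample interval follows. It borrows the figures from the weekly earnings example later in the deck (mean 670, N = 40) and assumes the quoted spread of 102 is a standard deviation; the function name and arguments are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sample_sd, n, level=0.95):
    """Large-sample interval: mean +/- z x standard deviation / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Assumption: 102 is treated as the sample standard deviation.
low, high = confidence_interval(sample_mean=670, sample_sd=102, n=40)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Changing level to 0.90 or 0.99 gives the z multipliers 1.645 and 2.58 respectively.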
Confidence Intervals
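Interval = Mean +/- Z x Standard Error, where Standard Error = Standard Deviation / sqrt (n)
[Figure: sampling distribution of the sample mean. 90% of samples fall within 1.645 standard errors of the mean, 95% within 1.96 and 99% within 2.58]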
Confidence Interval Estimates
[Diagram: types of confidence intervals. Labels: Mean, Proportion, Known, Finite Population]
Significance Value (α)
Statisticians arbitrarily choose a value of 5% to represent events that can occur by chance alone
So if a result would occur less than 5% of the time by chance alone, it is considered statistically significant
The 5% value is called a significance value, or α
P-Values
A P-value is a useful way to represent the probability of a certain event and is seen extensively in the medical literature
Definition:
The P-Value is the probability of obtaining a result at least as extreme as the one observed by chance alone, that is, when the null hypothesis is true
Given our significance level of 5% for chance, we want P-values to be less than 5% or .05 to be considered statistically significant
Comparing Means
Many times we wish to compare the means of two subsets of a population
Ex: MCAT scores for Biology vs. Chemistry majors
To do this we would:
Sample MCAT scores from random samples of biology & chemistry majors across the country
Compute the mean of each sample
Compare the means to determine if they are significantly different.
This kind of analysis is exactly what is done by Hypothesis Testing (we hypothesize there is no difference and then refute this hypothesis)
Hypothesis Testing
A statistical test of hypothesis consists of 4 parts
A NULL Hypothesis, termed Ho
An Alternate Hypothesis, termed Ha
A test statistic
A rejection region
The NULL hypothesis is what we want to refute
The Alternate hypothesis is what we want to support
The test statistic is what we will use to compare the NULL and the Alternate Hypotheses
The Rejection Region is the set of values of the test statistic for which Ho will be rejected
Hypothesis Testing
So what does this all mean IN LAYMAN’S TERMS?
Basically we are asking: given a test statistic we specify, what is the probability that a result as extreme as ours would arise by chance alone if Ho were true?
We convert the test statistic into a probability value by looking it up in a table that specifies the probability associated with that particular statistic value
Constructing a Hypothesis
Consider the following question:
We wish to show that the hourly wages of construction workers in California are larger than the national average of $14
The hypothesis will be written down as:
Ha: U > $14
Ho: U = $14
Test statistic = Z-value = (X - Uo) / (SD / sqrt(N))
Rejection region: Z > 1.645 (α = 0.05, one-tailed)
Testing a Hypothesis
The average weekly earnings for men in managerial and professional positions is $725. Do women in the same positions have average weekly earnings that are less than those for men? A random sample of N = 40 women in managerial positions showed X = $670 and SD = $102. Test the appropriate hypothesis using α = 0.01
Solution:
Ho: U = 725
Ha: U < 725
Z = (X - U) / (SD / sqrt(N))
Z = (670 - 725) / (102 / sqrt(40)) = -3.41
The rejection region for a one-tailed test at α = 0.01 is Z < -2.33. Since -3.41 < -2.33, we reject Ho and conclude that the average weekly salary for women is significantly less than for men. The probability of rejecting Ho when it is actually true is at most 0.01
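The arithmetic of this example can be reproduced with a short Python sketch. It treats the quoted spread of 102 as a standard deviation (an assumption, since that is how it enters the Z formula) and compares Z with the one-tailed critical value rather than with α itself. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 725      # mean under Ho (men's average weekly earnings)
x_bar = 670    # sample mean for women
sd = 102       # assumption: the quoted spread is a standard deviation
n = 40
alpha = 0.01

z = (x_bar - mu0) / (sd / sqrt(n))
z_critical = NormalDist().inv_cdf(alpha)   # lower-tail critical value, about -2.33
p_value = NormalDist().cdf(z)              # one-tailed P-value for Ha: U < 725
reject_h0 = z < z_critical

print(f"Z={z:.2f} critical={z_critical:.2f} p={p_value:.5f} reject Ho={reject_h0}")
```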
Confidence in our Test Result
So what is our “confidence” in our result?
Well, we can have 2 types of errors:
Type I error = Rejecting Ho when Ho is true; its probability is α
Type II error = Accepting Ho when Ho is false; its probability is β
To compute a confidence value, we calculate the Power of the Test which is the probability of correctly rejecting the NULL hypothesis
Power = (1 - β)
                Disease present       Disease absent        Total
Test positive   True positive (A)     False positive (B)    A + B
Test negative   False negative (C)    True negative (D)     C + D
Total           A + C                 B + D                 A + B + C + D

A false positive corresponds to a Type I error (α); a false negative corresponds to a Type II error (β)
Positive Predictive Value = A / (A + B) x 100
Negative Predictive Value = D / (C + D) x 100
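A small Python sketch of the predictive values follows. It uses the usual definitions (positive predictive value as true positives over all positive tests, negative predictive value as true negatives over all negative tests). The four counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) in percent from the four cells of the 2 x 2 table."""
    ppv = 100 * tp / (tp + fp)   # A / (A + B): positive tests that are truly diseased
    npv = 100 * tn / (fn + tn)   # D / (C + D): negative tests that are truly disease-free
    return ppv, npv

# Hypothetical counts: A = 90, B = 10, C = 30, D = 170.
ppv, npv = predictive_values(tp=90, fp=10, fn=30, tn=170)
print(f"PPV={ppv:.1f}% NPV={npv:.1f}%")
```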
Types of Tests
Given the kinds of data we have and the types of information we seek there are different types of tests available to us:
Student’s T-Test
Used to compare MEANS of two populations
Works for small samples (N<30)
Chi-Square Test
Used to estimate a population’s VARIANCE
F-Test
Used to compare the VARIANCES of 2 populations
Types of Tests
We can do these tests in different ways:
We can have one-tailed and two-tailed tests
A One-tailed test occurs when our hypothesis mean is on one side (either less or greater) of the null hypothesis mean
A Two-tailed test occurs when we can say that the hypothesis mean can be on either side of the null value
We can also do Paired Tests, where 2 measurements are taken on the same subjects (for example before and after treatment) and the differences are tested
t-Test
Used to detect differences between the mean scores of two groups.
Commonly used to detect a difference between a pretest and a posttest score for one group.
Used to compare the performance of a control group and an experimental group.
T-tests: Small Sample Testing
Up to now we have assumed the sample size to be large (N>30) in order to achieve good power. But what happens when the sample size is small (N<30)?
Well, in this case the sampling distribution looks somewhat different from the normal distribution: it is shorter and wider and is called the T-Distribution
Every T-distribution has an associated Degree of Freedom (df) which is equal to N-1
A T-Table is consulted to get the appropriate values of the T-statistic when doing a T-test. You need the df and the significance level to look up the T-values.
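The pretest/posttest use of the T-test can be sketched in Python as a paired test on the differences. The six score pairs are hypothetical, and the critical value 2.571 is the standard T-Table entry for df = 5 at a two-tailed significance level of 0.05.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical pretest and posttest scores for the same 6 subjects.
pre = [72, 65, 80, 70, 68, 75]
post = [78, 70, 82, 75, 70, 81]

diffs = [after - before for before, after in zip(pre, post)]
n = len(diffs)
df = n - 1                                        # degrees of freedom = N - 1
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))   # Ho: the mean difference is 0

T_CRITICAL = 2.571   # T-Table value for df = 5, two-tailed, significance level 0.05
significant = abs(t_stat) > T_CRITICAL

print(f"t={t_stat:.2f} df={df} significant={significant}")
```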
Chi-Square Distribution
Remember that the T-test compares population Means. What if we want to estimate a population variance?
In this case, we would use a Chi-Square distribution and our test statistic will be a chi-square value
χ² = (n-1)s² / σ²
where
n = sample size
s² = sample variance
σ² = population variance that we are trying to estimate
A variant of the Chi-Square Distribution is called the Mantel-Haenszel Test
It is a test of association between 2 ordinal variables (frequency data)
Chi Square
Test of significance which is based on the differences between the observed and expected frequencies of an occurrence. Computed as:
χ² = ∑ (Observed frequency - Expected frequency)² / Expected frequency
Uses include: genetics experiments, probability of coin tosses, checking frequency of events.
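For the coin toss use case, the statistic can be computed in a few lines of Python. The observed counts (60 heads and 40 tails in 100 tosses) are hypothetical; 3.841 is the standard Chi-Square table value for 1 degree of freedom at a significance level of 0.05.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [60, 40]   # hypothetical: heads and tails seen in 100 tosses
expected = [50, 50]   # a fair coin is expected to give 50 of each

chi2 = chi_square(observed, expected)
CHI2_CRITICAL = 3.841   # table value for df = 1, significance level 0.05

print(f"chi-square={chi2} reject fair coin={chi2 > CHI2_CRITICAL}")
```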
F-Distribution
What if we want to compare the population variances of two different populations? In this case we use an F-Distribution and an F-statistic
F = s1² / s2², where s1² and s2² are the variances of Samples 1 and 2
Typically we will have 2 degrees of freedom (v1 and v2) with F-tests
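A brief Python sketch of the F-statistic follows, using two small hypothetical samples. The two degrees of freedom are taken as each sample size minus 1; the resulting F would then be compared with an F-Table value for (v1, v2).

```python
from statistics import variance

# Hypothetical samples from two populations.
sample_1 = [10, 12, 14, 16, 18]
sample_2 = [11, 12, 13, 14, 15]

s1_sq = variance(sample_1)   # sample variance (divides by n - 1)
s2_sq = variance(sample_2)
f_stat = s1_sq / s2_sq       # ratio of the two sample variances

v1 = len(sample_1) - 1       # degrees of freedom of the numerator
v2 = len(sample_2) - 1       # degrees of freedom of the denominator

print(f"F={f_stat} with df=({v1}, {v2})")
```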
The ANalysis Of VAriance
Also known as ANOVA
Analysis of Variance
A statistical tool for comparing the mean scores of two or more groups.
ANOVAs can be used everywhere t-tests can be used.
Mathematically analysis of variance is the ratio of the variation between the groups and the variation within the groups.
ANOVA is a powerful procedure which allows you to do 2 things:
Compare the variance between the means of 2 or more groups
Compare the variance in data values within each group
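The ratio of between-group to within-group variation can be sketched in Python for a one-way layout. The three groups are hypothetical and the function name is illustrative; the returned F would be looked up in an F-Table with the two degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation between the group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the data values within each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k

# Hypothetical measurements for 3 groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F={f_stat:.2f} df=({df_between}, {df_within})")
```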
ANOVA
Suppose you want to compare the mean reimbursement rates from 5 different health plans
You could do t-tests among all combinations of the 5 plans, or 10 t-tests
Suppose all the means are equal. When this procedure is repeated 10 times, the probability of incorrectly concluding that at least one pair of means differs is quite high, and you are likely to reach an erroneous decision
Thus we want one test which could compare means for all 5 groups at the same time
This is exactly what ANOVA provides
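How high "quite high" is can be estimated with a short Python calculation. It assumes a significance level of 0.05 per test and, as a simplification, that the 10 tests are independent (pairwise tests that share groups are not strictly independent, so the figure is approximate).

```python
from math import comb

n_groups = 5
alpha = 0.05   # assumed significance level for each individual t-test

n_tests = comb(n_groups, 2)   # number of pairwise comparisons among 5 plans
# Chance of at least one false positive when all means are truly equal,
# under the simplifying assumption that the tests are independent.
familywise_error = 1 - (1 - alpha) ** n_tests

print(f"{n_tests} tests, P(at least one false positive) = {familywise_error:.3f}")
```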
ANOVA
ANOVA procedures can be done with different study designs:
Completely Randomized Design
Random samples are independently selected from each of k populations.
Assumes that the data is homogeneously distributed with a fixed variation
Randomized Block Design
Assumes that subsets of the population have different variances
Within each subset, however, the variability is the same
Each subset is called a block. Random samples are then taken from each block
Nonparametric Statistics
Analysis of Ranked Data
Nonparametric Statistics
What do we do when we have “opinion data”?
For example, suppose a judge is employed to evaluate and rank the sales abilities of 4 salesmen, the edibility of 5 brands of Corn Flakes or the relative appeal of 5 brands of automobiles
Clearly it is impossible to give an exact measure of sales competence, the palatability of food or design appeal
But it is possible to rank the salespeople, food or design choices based on our own opinions.
Many types of studies in medicine use this kind of data gathering (patient satisfaction is one example)
Nonparametric Statistics
There are many tests available for studying this kind of data:
The Sign Test
The Mann-Whitney U Test
The Wilcoxon Signed-Rank Test for a Paired Experiment
The Kruskal-Wallis H Test for Completely Randomized Designs
The Friedman Fr Test for Randomized Block Designs
Spearman’s Rank Correlation Test
Test of Association
Spearman’s Rank Correlation Test
Tests whether there is an association between 2 populations
Assume n pairs (xi, yi) of observations from 2 populations X, Y
Rank each of the xi and yi in ascending order
Compute:
Rs = Sxy / sqrt (Sxx Syy), where Sxy, Sxx and Syy are computed from the ranks
Then given n and α, look up Ro in the Spearman Table
Reject Ho (no association) if Rs >= Ro or Rs <= -Ro
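The ranking and the Rs formula can be sketched in Python as follows. The five (x, y) pairs are hypothetical, tied values are given the average of their ranks (a common convention, assumed here), and the function names are illustrative. The table lookup for Ro is not reproduced.

```python
from math import sqrt

def ranks(values):
    """Ascending ranks starting at 1; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rs(x, y):
    """Rs = Sxy / sqrt(Sxx * Syy), computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rs = spearman_rs(x, y)
print(f"Rs={rs:.2f}")
```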
A few terms and their calculations

Abbreviation   Variable                                 Equation                                     Value
               subjects in control group                                                             250
               subjects in experimental group                                                        150
               events in control group                                                               100
               events in experimental group                                                          15
CER            control event rate                       = events / subjects in control group         0.4 or 40%
EER            experimental event rate                  = events / subjects in experimental group    0.1 or 10%
ARR            absolute risk reduction (or increase)    = CER - EER                                  0.3 or 30%
RRR            relative risk reduction (or increase)    = (CER - EER) / CER                          0.75
NNT            number needed to treat / harm            = 1 / ARR                                    3.33
RR             relative risk (ratio of event rates)     = CER / EER                                  4
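The values in this table can be reproduced from the four counts with a few lines of Python. Variable names are illustrative; the last ratio follows the CER / EER equation given in the table.

```python
# Counts from the table.
control_subjects, experimental_subjects = 250, 150
control_events, experimental_events = 100, 15

cer = control_events / control_subjects            # control event rate
eer = experimental_events / experimental_subjects  # experimental event rate
arr = cer - eer                                    # absolute risk reduction
rrr = arr / cer                                    # relative risk reduction
nnt = 1 / arr                                      # number needed to treat
rate_ratio = cer / eer                             # CER / EER, as in the table

print(f"CER={cer:.2f} EER={eer:.2f} ARR={arr:.2f} RRR={rrr:.2f} "
      f"NNT={nnt:.2f} CER/EER={rate_ratio:.1f}")
```

In practice the number needed to treat is usually rounded up to the next whole patient, here 4.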
Randomisation
Randomisation is the process of assigning clinical trial participants to treatment groups.
Randomisation gives each participant a known (usually equal) chance of being assigned to any of the groups.
Successful randomisation requires that group assignment cannot be predicted in advance.
Randomisation Advantages
If, at the end of a clinical trial, a difference in outcomes occurs between two treatment groups (say, intervention and control), possible explanations for this difference would include:
i) The intervention exhibits a real effect.
ii) The outcome difference is solely due to chance.
iii) There is a systematic difference (or bias) between the groups due to factors other than the intervention.
Randomisation aims to obviate the third possibility.
Permits statistical methods to be applied to the data.
Randomisation allows blinding.
Current regulatory requirements require randomisation and blinding to be applied.
Randomisation Disadvantages
1) If a variable is known to affect a disease outcome and is not controlled adequately, then interpretation of results is difficult.
2) Practical problems.
Randomisation Procedures
Simple Randomisation
Permuted Block Randomisation
Stratified Randomisation
Cluster Randomisation
Dynamic (adaptive) random allocation
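As an illustration of one of these procedures, here is a minimal Python sketch of permuted block randomisation with two arms. The function name, arm labels and block size are hypothetical choices. The fixed seed is used only to make the example reproducible; a real trial would not use a schedule that can be predicted in advance.

```python
import random

def permuted_block_allocation(n_participants, arms=("A", "B"), block_size=4, seed=None):
    """Allocate participants in shuffled blocks so the arms stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        # Each block holds every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_participants]

schedule = permuted_block_allocation(12, seed=2024)
print(schedule)
```

After every complete block of 4 participants, the two arms have received the same number of participants.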
Bias
Bias is said to have occurred if the results observed reflect other factors in addition to (or even instead of) the effect of the treatment.
Some potential sources of bias:
Patient bias
Care Provider bias
Laboratory bias
Analysis and Interpretation bias
CONFOUNDING
A problem resulting from the fact that one feature of study subjects has not been separated from a second feature, and has thus been confounded with it, producing a spurious result.
The spuriousness arises from the effect of the first feature being mistakenly attributed to the second feature.
Confounding can produce either a type 1 or a type 2 error, but we usually focus on type 1 errors.
Blinding
All of these potential problems can be avoided if everyone involved in the study is blinded to the actual treatment the patient is receiving. Blinding (also called masking or concealment of treatment) is intended to avoid bias caused by subjective judgment in reporting, evaluation, data processing, and analysis due to knowledge of treatment.
Controls – Refers to group of patients who receive a treatment used for comparison with the trial medicine.
Hierarchy of Blinding
Open label: no blinding
Single blind: patient or the investigator is blinded to treatment
Double blind: patient and investigators (who often are also the health care providers and data collectors) blinded to treatment
Triple blind: statistician analyzing the data is also blind
Full double blind: everyone who is coming in contact with the patient is blind, including health care personnel, nursing staff etc.
Full triple blind: everybody is blind who comes in contact with the patient or the investigator
Total clinical trial blind: everyone is blind who interacts directly with the patient, investigator or the data. Includes all the persons as in full triple blind as well as the radiologists who read radiographs, pathologists who read slides and so on
Open Label Studies
These may be useful for
Dose ranging studies.
Pharmacokinetic studies.
Pilot studies.
Phase 2 or 3 long term continuation trials.
Postmarketing studies.
Compassionate plea trials.
However, even these applications may be substantially biased by knowledge of the treatment given and may result in
• toxicity over (or under) reported
• efficacy over estimated.
Even a small fraction of patients assigned at random to placebo will reduce these potential problems substantially.
Single Blind Studies
Only patient blind but not the investigator:
Justification: Double-blind is "impractical" because of need to adjust medication, medication affecting laboratory values, potential side effects, etc.
Rarely used.
Only investigator blind not the patient:
Justification: Unacceptable ethically to give an appropriate placebo treatment to a patient, and in such a case, the assessor (not the patient) should be the one blinded to the treatment. Double physician method has to be used.
Double Blind Studies
When both the subjects and the investigators are kept from knowing who is assigned to which treatment, the experiment is called “double blind”.
Serves as a standard by which all studies are judged, since it minimizes both potential patient biases and potential assessor biases.
Should be used whenever possible, which is whenever it is ethically permissible to blind a patient.
Double Blinding : Techniques
Two physician method
Physician 1: Unblinded physician speaks to and examines the patient, receives lab reports, evaluates the side effects and treatment effect.
Physician 2: Blinded physician receives reports from physician 1 and evaluates the results.
Placebo
If only one drug has to be compared to the placebo.
If 2 active drugs have to be compared.
Encapsulation
Disadvantages
Double dummy technique
Disadvantages
Placebo
Latin: Placebo, I shall be pleasing or acceptable.
Latin: Nocebo, I shall injure.
Placebo: pharmacologically inert substance identical in appearance to the active drug to which it is compared.
Active control: medication whose efficacy has been proven previously.
Active control or placebo controlled

                    Placebo                        Active Control
Objective           Real Pharmacological Effect    At least EQUIVALENCE; if possible, improvement
Difference Sought   Large                          Small
Analysis            One tailed Possibility         Two tailed Test and Confidence Interval
Number of cases     Small                          Large
Major Problem       Ethical Consideration          Choice of recognized drug and equitable condition of administration
Meta-analysis & Systematic Review
A systematic review is an overview of primary studies that used explicit and reproducible methods
A meta-analysis is a mathematical synthesis of the results of two or more primary studies that addressed the same hypothesis in the same way
Although meta-analysis can increase the precision of a result, it is important to ensure that the methods used for the review were valid and reliable
Advantages of systematic reviews
Explicit methods limit bias in identifying and rejecting studies
Conclusions are more reliable and accurate because of methods used
Large amounts of information can be assimilated quickly by healthcare providers, researchers, and policymakers
Delay between research discoveries and implementation of effective diagnostic and therapeutic strategies may be reduced
Results of different studies can be formally compared to establish generalisability of findings and consistency (lack of heterogeneity) of results
Reasons for heterogeneity (inconsistency in results across studies) can be identified and new hypotheses generated about particular subgroups
Quantitative systematic reviews (meta-analyses) increase the precision of the overall result
When Can You Do Meta-Analysis?
Meta-analysis is applicable to collections of research that
are empirical, rather than theoretical
produce quantitative results, rather than qualitative findings
examine the same constructs and relationships
have findings that can be configured in a comparable statistical form (e.g., as effect sizes, correlation coefficients, odds-ratios, etc.)
are “comparable” given the question at hand
Thank You