Introduction to Clinical Biostatistics
Dr Kripa Shanker Gupta
Overview of Presentation
Introductory Concepts (Review)
Hypothesis Testing
Linear Regression and Correlation
Analysis of Variance (ANOVA)
Nonparametric Statistics
Survival Analysis
Introductory Concepts
Types of Data
Presenting Data
Descriptive Measures
Probability and Distributions
Estimation Techniques
Types of Data
Data are usually Discrete or Continuous
Discrete Variables take on a finite (or countable) set of distinct values
Race, Gender, Year in School, etc.
Continuous Variables can take on any value within a range, so the set of possible values is infinite
Age, Height/Weight, Blood Pressure
Types of Data
A Special type of Discrete Variable is the Binary Variable which takes on exactly 2 possible values
Gender (M/F), Pregnant? (Y/N), Hypertensive? (Y/N)
Types of Data
Sometimes, discrete variables have a “natural ordering” to them; these are called Ordinal Variables
For example, the names of consecutive days in a week (Mon, Tue, Wed, Thu, Fri, Sat, Sun)
Other types of discrete variables do not have a natural order and are called Nominal Variables
Race (African American, Caucasian, Asian, Hispanic etc.)
Types of Data
If in an experiment you measure a single variable, it is called a Univariate experiment
If you measure 2 variables, it is called a Bivariate experiment
If you measure more than 2 variables, it is called a Multivariate experiment
Types of Data
A Random variable is one whose value is determined by chance or by a random event
Typically, a variable X is random if it is the outcome of an experiment whose results occur by chance or are not completely predictable
Types of Data
Nonparametric Variables
Many times in clinical studies, we seek opinion data (e.g. patient satisfaction scores, relative value scales, etc.)
The data can be ranked but have no absolute, comparable scale
This type of data is called nonparametric data
Presenting Data
There are many ways to present data:
Frequency Tables
Pie Charts
Bar Graphs (Histograms)
Line Graphs
Scatter Plots (Scattergrams)
Stem and Leaf Displays
Box Plots
Descriptive Measures
Now that we have displayed our data, we want to be able to characterize it quantitatively
Measures of Central Tendency
Mean, Median, Mode
Measures of Variability
Range, Variance, Standard Deviation
Measures of Relative Standing
Z-Scores, Percentiles, Quartiles
Measures of Central Tendency
Mean
Arithmetic average of a sample of data
Median
If you order the data from smallest to largest, the median is the middle value, assuming an odd number of data elements
If you have an even number of elements, it is the average of the 2 middle numbers
Mode
The most common value in a set of values
Mode
The value that is the “most popular” in a distribution of scores
E.g. 2, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7
Frequencies: 2 occurs 1 time, 4 occurs 3 times, 5 occurs 5 times, 6 occurs 6 times, 7 occurs 2 times
Mode is 6 (most popular, GREATEST FREQUENCY)
Simplest but least useful
Useful when data has been divided into categories
Median
It’s the centre point of the distribution
Represents the value below which 50% of all scores are located
Divides the distribution into two equal parts (50th percentile)
E.g. 2, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7
Median is 5 (the 9th of the 17 ordered values)
Better estimate than the mode: not affected by extreme values
For the same reason, it cannot tell exactly how the values vary
Mean ( Average)
It is the weighted centre point
Calculated by summing all observations and dividing by the total number of observations
Advantage: it takes into account how far the values spread, including extreme scores
It acts as the balancing point for the distribution
Most commonly used measure of central tendency
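The three measures can be checked on the 17 scores used in the Mode and Median examples. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Scores from the worked Mode and Median examples
scores = [2, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7]

central = {
    "mean": mean(scores),      # sum of observations / number of observations
    "median": median(scores),  # middle value of the ordered data
    "mode": mode(scores),      # value with the greatest frequency
}
print(central)
```

The mean (89/17, about 5.24) falls between the median and the mode here because the single low score of 2 pulls it down slightly.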
Measures of Variability
Once we have located the center of a set of data points, we want to know how “dispersed” they are
Measures of Variability
Range
This is the difference between the highest and lowest value
Variance
Defined as the average of the squared deviations of the individual data points about their mean
Standard Deviation
This is defined as the square root of the variance
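As a rough illustration, the three measures of variability can be computed for the 17 scores from the Mode example. The sketch assumes the "average of squared deviations" definition (dividing by n, the population form); the sample form that divides by n − 1 is shown alongside for comparison.

```python
from statistics import pstdev, pvariance, variance

# Scores reused from the worked Mode example
scores = [2, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7]

data_range = max(scores) - min(scores)  # highest minus lowest value
pop_variance = pvariance(scores)        # mean squared deviation (divide by n)
pop_sd = pstdev(scores)                 # square root of pop_variance
sample_variance = variance(scores)      # divides by n - 1 instead of n

print(data_range, pop_variance, pop_sd, sample_variance)
```

Which divisor is appropriate depends on whether the data are treated as a whole population or as a sample from one.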
Measures of Relative Standing
Percentiles and Quartiles also indicate relative standing, in terms of where a score falls in the ordering of scores from lowest to highest
Given a set of n measurements x1, …, xn, the pth percentile is defined to be the value of x that exceeds p% of the measurements and is less than the remaining (100-p)% of the values.
Ex: Scores of 20, 30, 50, 60, 67, 67, 70, 80, 90, 95
The score 50 is at the 30th percentile, meaning that 30% of the scores are at or below 50 and 70% are higher.
Measures of Relative Standing
Quartiles similarly reflect in which quarter of the set of values a particular observation lies:
Ex: Scores of 20, 30, 50, 60, 67, 67, 70, 80, 90, 95
1st Quartile = 50, 3rd Quartile = 80
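The quartile values in this example are consistent with taking the median of the lower half and of the upper half of the ordered scores. The sketch below assumes that convention; the helper names are hypothetical, and other conventions (for example, the interpolation used by `statistics.quantiles`) can give slightly different values.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    lower = ordered[:half]    # smallest half of the data
    upper = ordered[-half:]   # largest half (middle value skipped when n is odd)
    return median(lower), median(upper)

def percent_at_or_below(values, score):
    """Percentage of values that are less than or equal to score."""
    return 100 * sum(v <= score for v in values) / len(values)

scores = [20, 30, 50, 60, 67, 67, 70, 80, 90, 95]
q1, q3 = quartiles_by_halves(scores)
pct_50 = percent_at_or_below(scores, 50)
print(q1, q3, pct_50)
```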
Probability
Suppose you do an experiment with a finite number of possible outcomes (ex: coin toss)
The Probability of an event E (e.g. H or T) is the chance that the event will occur in the next repetition of the experiment
Probability values are always between 0 and 1 (equivalently, between 0% and 100%)
Probability
The notation for probabilities is as follows:
Given our coin toss experiment,
P(H) = Probability that a Head will be tossed in the next round
P(T) = Probability that a Tail will be tossed in the next round
One can estimate probabilities by repeating the experiment many times and observing the proportion of times each outcome occurs
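Estimating a probability by repetition can be sketched with a simulated coin. The code assumes a fair coin and uses a fixed seed so the run is repeatable; `estimate_head_probability` is an illustrative name, not part of any library.

```python
import random

def estimate_head_probability(n_tosses, seed=0):
    """Estimate P(H) as the proportion of heads in n_tosses simulated tosses."""
    rng = random.Random(seed)  # seeded generator so results are repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))  # fair coin assumed
    return heads / n_tosses

for n in (10, 100, 10_000):
    print(n, estimate_head_probability(n))
```

With more tosses the estimate tends to settle closer to 0.5, which is the usual motivation for repeating the experiment many times.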
Probabilities: Some Simple Rules
Arithmetically, one can combine the probabilities of simple events:
Given a complex event composed of N mutually exclusive simple events, the probability of the complex event is equal to the sum of the probabilities of each of the simple events
Ex: Coin toss 1 and Coin toss 2
Event   First Coin   Second Coin   P(Ei)
E1      Heads        Heads         ¼
E2      Heads        Tails         ¼
E3      Tails        Heads         ¼
E4      Tails        Tails         ¼
Let A = {E2, E3}, i.e. exactly one Head in the two tosses. Then P(A) = P(E2) + P(E3) = ½
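The two-coin example can be reproduced by listing the four simple events and adding the probabilities of those that make up A. This sketch assumes a fair coin and independent tosses, so each simple event has probability ¼.

```python
from fractions import Fraction
from itertools import product

# The four equally likely simple events E1..E4: (first coin, second coin)
outcomes = list(product(["Heads", "Tails"], repeat=2))
p_each = Fraction(1, len(outcomes))

# A = exactly one Head, i.e. (Heads, Tails) and (Tails, Heads)
event_a = [o for o in outcomes if o.count("Heads") == 1]
p_a = p_each * len(event_a)  # sum of the simple-event probabilities

print(event_a, p_a)
```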
Probability Distributions
Given a random variable X (either discrete or continuous), the Probability Distribution gives a table or formula or graph of the probabilities of each potential value of X
For a Probability Distribution P(x) the following must hold:
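0 ≤ P(x) ≤ 1 for every value x, and the probabilities of all possible values sum to 1
Ha: μ > $14
Ho: μ = $14
Test statistic: Z = (X̄ − μo) / (s / sqrt(N)), where s is the standard deviation
Significance level: α = 0.05, which defines the rejection region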
Testing a Hypothesis
The average weekly earnings for men in managerial and professional positions is $725. Do women in the same positions have average weekly earnings that are less than those for men?
A random sample of N = 40 women in managerial positions showed X̄ = $670 and a standard deviation of s = $102. Test the appropriate hypothesis using α = 0.01
Solution:
Ho: μ = 725
Ha: μ < 725
… (N > 30) in order to achieve good power. But what happens when the sample size is small (N < 30)? …
… Rs > Ro or Rs <= -Ro
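The weekly-earnings example can be worked through numerically. The sketch below assumes the quoted $102 is the sample standard deviation and that a one-tailed (lower-tail) Z test is intended; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, null_mean, sd, n):
    """Z statistic for a one-sample test of a mean (large-sample case)."""
    return (sample_mean - null_mean) / (sd / sqrt(n))

# Values from the example; sd = 102 is assumed to be a standard deviation
z = one_sample_z(sample_mean=670, null_mean=725, sd=102, n=40)

alpha = 0.01
critical = NormalDist().inv_cdf(alpha)  # lower-tail critical value, about -2.33
reject_null = z < critical              # Ha: mean < 725, so reject for small Z

print(round(z, 2), round(critical, 2), reject_null)
```

Under these assumptions Z is about −3.41, which lies beyond the critical value, so Ho would be rejected at the 0.01 level.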
A few terms and their calculations
Abbreviation   Variable                                   Equation                                     Value
-              subjects in control group                  -                                            250
-              subjects in experimental group             -                                            150
-              events in control group                    -                                            100
-              events in experimental group               -                                            15
CER            control event rate                         = events / subjects in control group         0.4 or 40%
EER            experimental event rate                    = events / subjects in experimental group    0.1 or 10%
ARR            absolute risk reduction (or increase)      = CER - EER                                  0.3 or 30%
RRR            relative risk reduction (or increase)      = (CER - EER) / CER                          0.75
NNT            number needed to treat / harm              = 1 / ARR                                    3.33
RR             relative risk (control vs. experimental)   = CER / EER                                  4
Note: relative risk is more often quoted as EER / CER (0.25 here). The odds ratio is a different measure, computed from odds rather than event rates.
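The values in this table follow directly from the four counts. A minimal sketch of the arithmetic, with a hypothetical helper name, is:

```python
def risk_measures(control_events, control_n, exp_events, exp_n):
    """Compute event rates and derived risk measures from raw counts."""
    cer = control_events / control_n  # control event rate
    eer = exp_events / exp_n          # experimental event rate
    arr = cer - eer                   # absolute risk reduction
    return {
        "CER": cer,
        "EER": eer,
        "ARR": arr,
        "RRR": arr / cer,             # relative risk reduction
        "NNT": 1 / arr,               # undefined when ARR is 0
    }

# Counts from the table: 100 events in 250 controls, 15 events in 150 treated
measures = risk_measures(100, 250, 15, 150)
for name, value in measures.items():
    print(name, round(value, 2))
```

In practice NNT is usually rounded up to the next whole patient, so 3.33 would be reported as 4.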
Randomisation
Randomisation is the process of assigning clinical trial participants to treatment groups.
Randomisation gives each participant a known (usually equal) chance of being assigned to any of the groups.
Successful randomisation requires that group assignment cannot be predicted in advance.
Randomisation Advantages
If, at the end of a clinical trial, a difference in outcomes occurs between two treatment groups (say, intervention and control), possible explanations for this difference would include:
i) The intervention exhibits a real effect.
ii) The outcome difference is solely due to chance.
iii) There is a systematic difference (or bias) between the groups due to factors other than the intervention.
Randomisation aims to obviate the third possibility.
Permits statistical methods to be applied to the data.
Randomisation allows blinding.
Current regulatory requirements call for randomisation and blinding to be applied.
Randomisation disadvantages
1) If a variable is known to affect a disease outcome and is not controlled adequately, then interpretation of results is difficult.
2) Practical problems.
Randomisation Procedures
Simple Randomisation
Permuted Block Randomisation
Stratified Randomisation
Cluster Randomisation
Dynamic (adaptive) random allocation
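The first two procedures in this list can be sketched in a few lines. The code below is illustrative only: it assumes two arms labelled "A" and "B" and a block size of 4, the function names are hypothetical, and a real trial would use a validated, concealed allocation system rather than a script like this.

```python
import random

def simple_randomisation(n_participants, arms=("A", "B"), seed=None):
    """Assign each participant independently, with equal chance per arm."""
    rng = random.Random(seed)
    return [rng.choice(arms) for _ in range(n_participants)]

def permuted_block_randomisation(n_participants, arms=("A", "B"),
                                 block_size=4, seed=None):
    """Assign in shuffled blocks so the arms are balanced within each block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        block = list(arms) * (block_size // len(arms))  # equal arms per block
        rng.shuffle(block)                              # random order in block
        allocation.extend(block)
    return allocation[:n_participants]

simple = simple_randomisation(20, seed=1)
blocked = permuted_block_randomisation(20, seed=1)
print(simple.count("A"), simple.count("B"))
print(blocked.count("A"), blocked.count("B"))
```

Simple randomisation can leave the arms unequal in size by chance, whereas permuted blocks keep them balanced after every completed block.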
Bias
Bias is said to have occurred if the results observed reflect other factors in addition to (or even instead of) the effect of the treatment.
Some potential sources of bias:
Patient bias
Care Provider bias
Laboratory bias
Analysis and Interpretation bias
CONFOUNDING
A problem resulting from the fact that one feature of study subjects has not been separated from a second feature, and has thus been confounded with it, producing a spurious result.
The spuriousness arises from the effect of the first feature being mistakenly attributed to the second feature.
Confounding can produce either a type 1 or a type 2 error, but we usually focus on type 1 errors.
Blinding
These potential problems can largely be avoided if everyone involved in the study is blinded to the actual treatment the patient is receiving.
Blinding (also called masking or concealment of treatment) is intended to avoid bias caused by subjective judgment in reporting, evaluation, data processing, and analysis due to knowledge of treatment.
Controls – Refers to the group of patients who receive a treatment used for comparison with the trial medicine.
Hierarchy of Blinding
Open label: no blinding
Single blind: patient or the investigator is blinded to treatment
Double blind: patient and investigators (who often are also the health care providers and data collectors) blinded to treatment
Triple blind: statistician analyzing the data is also blind
Full double blind: everyone who comes in contact with the patient is blind, including health care personnel, nursing staff, etc.
Full triple blind: everybody who comes in contact with the patient or the investigator is blind
Total clinical trial blind: everyone who interacts directly with the patient, investigator or the data is blind. Includes all the persons in full triple blind as well as the radiologists who read radiographs, pathologists who read slides, and so on
Open Label Studies
These may be useful for:
Dose ranging studies
Pharmacokinetic studies
Pilot studies
Phase 2 or 3 long term continuation trials
Postmarketing studies
Compassionate plea trials
However, even these applications may be substantially biased by knowledge of the treatment given and may result in
• toxicity being over (or under) reported
• efficacy being overestimated
Even a small fraction of patients assigned at random to placebo will reduce these potential problems substantially.
Single Blind Studies
Only patient blind but not the investigator:
Justification: Double-blind is "impractical" because of need to adjust medication, medication affecting laboratory values, potential side effects, etc.
Rarely used.
Only investigator blind, not the patient:
Justification: It may be ethically unacceptable to give a placebo treatment to a patient; in such a case, the assessor (not the patient) should be the one blinded to the treatment. The two physician method has to be used.
Double Blind Studies
When both the subjects and the investigators are kept from knowing who is assigned to which treatment, the experiment is called “double blind”.
Serves as the standard by which all studies are judged, since it minimizes both potential patient biases and potential assessor biases.
Should be used whenever possible, which is whenever it is ethically permissible to blind a patient.
Double Blinding : Techniques
Two physician method
Physician 1 – Unblinded physician speaks to and examines the patient, receives lab reports, and evaluates the side effects and treatment effect.
Physician 2 – Blinded physician receives reports from Physician 1 and evaluates the results.
Placebo
If only one drug has to be compared to the placebo.
If 2 active drugs have to be compared.
Encapsulation
Disadvantages
Double dummy technique
Disadvantages
Placebo
Latin: Placebo, “I shall be pleasing or acceptable”.
Latin: Nocebo, “I shall injure”.
Placebo – pharmacologically inert substance identical in appearance to the active drug to which it is compared.
Active control – medication whose efficacy has been proven previously.
Active control or placebo controlled
Criterion           Placebo                       Active Control
Objective           Real pharmacological effect   At least equivalence; if possible, improvement
Difference sought   Large                         Small
Analysis            One-tailed test possible      Two-tailed test and confidence interval
Number of cases     Small                         Large
Major problem       Ethical consideration         Choice of recognized drug and equitable condition of administration
Meta-analysis & Systematic Review
A systematic review is an overview of primary studies that used explicit and reproducible methods
A meta-analysis is a mathematical synthesis of the results of two or more primary studies that addressed the same hypothesis in the same way
Although meta-analysis can increase the precision of a result, it is important to ensure that the methods used for the review were valid and reliable
Advantages of systematic reviews
Explicit methods limit bias in identifying and rejecting studies
Conclusions are more reliable and accurate because of methods used
Large amounts of information can be assimilated quickly by healthcare providers, researchers, and policymakers
Delay between research discoveries and implementation of effective diagnostic and therapeutic strategies may be reduced
Results of different studies can be formally compared to establish generalisability of findings and consistency (lack of heterogeneity) of results
Reasons for heterogeneity (inconsistency in results across studies) can be identified and new hypotheses generated about particular subgroups
Quantitative systematic reviews (meta-analyses) increase the precision of the overall result
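One common way a meta-analysis gains precision is fixed-effect, inverse-variance pooling, in which each study is weighted by 1 / SE². The slides do not say which pooling method is meant, so this is only one possible illustration, and the effect sizes and standard errors below are hypothetical numbers chosen for the sketch, not results from any real study.

```python
from math import sqrt

def fixed_effect_pool(effects, standard_errors):
    """Inverse-variance weighted average of study effects and its SE."""
    weights = [1 / se ** 2 for se in standard_errors]  # precise studies weigh more
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect sizes and standard errors for three studies
effects = [0.30, 0.45, 0.25]
standard_errors = [0.20, 0.25, 0.15]

pooled, pooled_se = fixed_effect_pool(effects, standard_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

The pooled standard error comes out smaller than that of any single study, which is the sense in which pooling increases precision. It does not by itself address heterogeneity between studies.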
When Can You Do Meta-Analysis?
Meta-analysis is applicable to collections of research that
are empirical, rather than theoretical
produce quantitative results, rather than qualitative findings
examine the same constructs and relationships
have findings that can be configured in a comparable statistical form (e.g., as effect sizes, correlation coefficients, odds-ratios, etc.)
are “comparable” given the question at hand
Thank You