POSTED ON: 5/12/2011
Stats & EBM
NH&MRC levels of evidence (1999)
EBM: the conscientious, explicit and judicious use of current best evidence in making patient-care decisions. Clinical experience is integrated with external evidence from systematic research.
• I: Systematic review/meta-analysis of all relevant RCTs
• II: At least one good RCT
• III-1: At least one pseudo-randomised controlled trial (e.g. alternate allocation)
• III-2: Comparative studies (including systematic reviews of such studies) with concurrent controls and non-randomised allocation, cohort studies, case-control studies, or interrupted time series with a control group
• III-3: Comparative studies with historical controls, two or more single-arm studies, or interrupted time series without a parallel control group
• IV: Case series
• V: Expert consensus
Statistics: study design
• Observational
– Case series
– Prevalence study = cross-sectional study = survey
– Case-control: compares a group of individuals with the disease (cases) with a group without the disease (controls) in terms of their status on one or more variables thought to be risk factors for the disease in question
– Cohort: prospectively follows two cohorts, one with the risk factor and one without
• Experimental
– Clinical trials
• Systematic reviews: use a rigorous method and evaluate the quality of the studies
• Meta-analysis: evaluates the quality of the reviewed articles and, using statistical methods, integrates their data into a quantitative summary
Study types: economic analysis
• Cost-benefit: benefits are converted into the same numerical units as the costs
• Cost-effectiveness: the cost of a given health-care "effect" is calculated, e.g. $1000 per additional MI prevented through education
• Cost-utility: the cost of a given gain in personal quality of life (utility), e.g. cost per additional QALY
• Decision analysis: quantitative methods are used to analyse primary studies and develop probability trees on which decisions can be based
Questions for any design: Quick or slow? Cheap or expensive? Prospective or retrospective? Randomised?
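Cost-effectiveness and cost-utility figures are both a ratio of extra cost to extra effect. A minimal Python sketch; the function name `incremental_ratio` is illustrative (not from any library) and all the figures are hypothetical, chosen only to reproduce the "$1000 per additional MI prevented" example:

```python
def incremental_ratio(cost_new, cost_old, effect_new, effect_old):
    """Extra cost per extra unit of effect (MIs prevented, QALYs gained, ...)."""
    extra_effect = effect_new - effect_old
    if extra_effect == 0:
        raise ValueError("no difference in effect: the ratio is undefined")
    return (cost_new - cost_old) / extra_effect


# Hypothetical education programme: costs $50,000 more and prevents 50 more MIs
cost_per_mi = incremental_ratio(150_000, 100_000, effect_new=60, effect_old=10)
print(f"${cost_per_mi:,.0f} per additional MI prevented")

# Hypothetical cost-utility comparison: $60,000 extra buys 3 extra QALYs
cost_per_qaly = incremental_ratio(60_000, 0, effect_new=3, effect_old=0)
print(f"${cost_per_qaly:,.0f} per additional QALY")
```

A cost-benefit analysis differs in that the effects themselves are converted into dollars, so costs and benefits can be compared directly rather than as a ratio of unlike units.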
Design pros and cons: the big 4 and more
• RCT
– Pros: allows meta-analysis; blinding; hypothetico-deductive
– Cons: funding bias; may be unethical/impractical; volunteer bias; surrogate endpoints, mis-randomisation and mis-blinding are common
• Cohort
– Pros: cheaper than an RCT; subjects and controls can be matched; supports causality but does not prove it
– Cons: hard to get matched controls; volunteer bias; hidden confounders (no randomisation); blinding is difficult
• Case-control
– Pros: good for rare conditions; fewer subjects needed than in cross-sectional studies
– Cons: relies on recall; confounders aplenty; deciding who is (and when they become) a "case"; gives OR but not AR
• Cross-sectional
– Pros: ethically safe
– Cons: finds association, not causality; Neyman bias
• Cross-over
– Pros: subjects act as their own controls; all receive some putative benefit; blinding
– Cons: bad for treatments with permanent effects; a washout period is needed and its length may be unknown
Approach to literature searching
• Clarify your question
• Search for and select articles
• Are the results valid?
• What are the results?
• Is there any reason why the results may not be applicable to my patients?
• So what? Do the results help in caring for patients?
Assessing research articles
• Sample
– Recruitment and randomisation
– Inclusion/exclusion criteria: real-life patients? Real-life context?
– Sample big enough?
– Bias avoided?
• Exposure
– Clear statement of exposure and comparison
– Groups treated the same apart from the exposure in question
– Subjects blinded
– Ethical design/ethics committee approval
• Outcome
– Clearly defined outcome
– Measured in the same way in control and exposed groups
– Clinically relevant vs surrogate endpoints
– Assessors blinded
– Follow-up complete / intention-to-treat analysis
– Follow-up long enough?
• Results
– Clinically significant benefit? Outweighing harm?
– Mean/SD: how large an effect, and how precise?
Notes on research assessment
• Clinically significant benefit/harm: use RRR and NNT as well as the CI
• Randomisation tends to distribute both known and unknown variables evenly between groups. If the study is not randomised, compare the two groups at baseline.
If they are too different to begin with, the validity of the study is doubtful.
• Attrition: if patients are lost to follow-up, recalculate the results assuming that those lost from the control group did well and those lost from the treatment group did badly. If the result is still decent, the finding is worthwhile.
• Effects of drop-out on interpreting results:
– A. If subjects with more side effects drop out, the medication may appear better tolerated than it is.
– B. If non-responders drop out, efficacy will be inflated.
– C. If the overall drop-out rate is high, the validity and generalisability of the results may be reduced.
• Reasons for drop-out:
– Motivation
– Death
– Incorrect entry in the first place
– Side effects
– Clinical reasons, e.g. pregnancy
– Can't find them!
• Intention to treat can't be used for an "efficacy analysis", because that measures the effect in those who actually received the intervention.
Assessing review articles
• Focused question?
• Valid and explicit identification, selection, evaluation and combination of articles?
• Was article handling reproduced independently by different authors?
• Were the results similar from study to study? (Homogeneity: the differences between the studies' results are due to chance alone)
• Subgroup analyses are only credible if:
– only a few subgroups were analysed
– the difference in effect between subgroups is large
– the difference is very unlikely to have occurred by chance
– it was part of the hypothesis before the study was undertaken
– it is replicated in other studies
Hill (1965): five tests of causation, i.e. qualities of an association that make causation more likely
1. Consistency: replicated in different
studies
2. Strength: the RR is large
3. Specificity: the degree to which one particular exposure confers a particular risk can be measured well
4. Temporality: exposure precedes outcome
5. Coherence: biologically plausible
Presentation of Results
• We don't just give a list of numbers
• Results need to be presented in an easily understood format that can be easily analysed
• Two main types of data:
– Continuous
– Discrete
Continuous Data
• Require a description of:
– central tendency, e.g. mean
– scatter, e.g. standard deviation
– shape, e.g. skew
Central Tendency
• Mean ("average" to the layperson)
– Total of all values divided by the number of values
– Most useful in a normally distributed sample
• Median
– The middle value
– E.g. with 99 values, the 50th largest
– Very useful for skewed distributions, as it lessens the effect of outliers
• Mode
– The most frequent value
Scatter
• How different are the results from each other?
• Standard deviation
– Based on the difference of each value from the mean
– Most useful for a normally distributed sample
– SD = √[Σ(value − mean)² / (n − 1)]
– About 68% of values lie within ±1 SD, 95% within ±2 SD (strictly ±1.96 SD) and 99.7% within ±3 SD
• Interquartile range
– Difference between the 25th and 75th centiles
– Very useful for skewed distributions, as it lessens the effect of outliers
Standard Error (of the mean)
• SE = SD / √n (the population SD, estimated from the sample SD)
• Smaller than the SD!
• Indicates how well the sample mean approximates the population mean
• So if n is very large, the estimate is more precise and more confident
• Used for t-tests and confidence intervals
SD vs SE
• Beware: some studies present the SE and some the SD; this matters especially in graphs
• So make sure of what you're reading
• The SD is visual: it tells you what the results look like on a graph
• The SE is about precision: it tells you where the true value is likely to be (hint: think of the confidence interval)
Shape
• If the results are plotted on a graph, what does the distribution look like? Eg.
bell-shaped (normal, Gaussian) or skewed
• For bell-shaped data:
– mean = median = mode
– parametric analytic statistics are possible, using the mean, SD and number of subjects, which are more powerful
• For skewed data:
– parametric statistics can't be used (unless the data are transformed to make them bell-shaped)
[Figure: skewed distribution with the mode, median and mean marked at different points]
Hypothesis Testing
• We start by assuming the null hypothesis
– There is no difference between the populations
– Both samples come from the same population
• (The alternative: the populations really are different)
• E.g. two populations:
– MDD + photo for 10 weeks
– MDD, no photo for 10 weeks
• Any difference between the samples may be due to pure chance, i.e. the null hypothesis is true and there is no real difference
• We reject the null hypothesis if a result this big is very unlikely to be due to chance, i.e. p is less than a pre-set value (e.g. 0.05)
• Note: this refers to populations
– We are trying to work out what is happening in the population from our samples
• With bigger samples, it is more likely that they correspond to the true population, so for a given real difference the p value tends to get smaller as n increases
• So you need the number of subjects, as well as:
– the difference between means
– the degree of scatter (SD)
Example 2
• Null hypothesis: there is no difference in outcome between populations of depressed patients given fluoxetine or placebo
• A study with good methodology gives:
– A difference in mean Hamilton scores of 15 after 10 weeks
– A population standard deviation of 20
– An effect size of 0.75 (15/20)
– A t-test P of 0.005
• Therefore, if there were no true difference, a difference this large would arise by chance only 0.005 (0.5%) of the time (this is not the same as a 0.5% chance that there is no true difference)
• We
can reasonably conclude that fluoxetine is better than placebo
Confidence Intervals
• Let's say the difference between the mean BDI scores of the two groups is 5
• This is the real difference between our samples
• We are interested in what happens in the whole population of depressed teenagers, and whether there's a difference between two sub-populations: those given or not given the signed photo
• We can estimate the true value in the whole population from our sample
• We may calculate that there is a 95% chance that the true population difference lies between -7 and +17
– So there is a reasonable chance that getting the Becks photo actually makes depression worse
– We can't conclude it's an effective treatment
• If the 95% CI were +4 to +6, there would be more than a 95% chance that the Becks photo does improve depression
Confidence Intervals
• Give a range, e.g. the 95% confidence interval for the difference in BDI between the photo and no-photo groups is 4 to 6
• Mathematical definition:
– 95% of confidence intervals calculated from samples randomly drawn from the population will contain the true population value
• Practical definition:
– There is a 95% chance that the CI contains the true value
• Quote both definitions in the exam!
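The quantities from the last few slides (SD, SE, effect size, test statistic, p value and 95% CI) can all be computed from summary statistics. A minimal standard-library sketch with two stated assumptions: it pools the two SDs (assuming equal population SDs), and it uses the normal distribution instead of the t distribution, which is only a reasonable approximation for large samples. The function names are illustrative, and the group size of 50 is hypothetical because Example 2 does not give n:

```python
import math
from statistics import NormalDist, mean, stdev


def describe(values):
    """Mean, SD (n - 1 denominator) and standard error of the mean."""
    sd = stdev(values)                      # sqrt(sum((x - mean)^2) / (n - 1))
    return mean(values), sd, sd / math.sqrt(len(values))


def compare_means(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Effect size, z statistic, two-tailed p and CI for mean1 - mean2.

    Large-sample normal approximation; pools SDs assuming equal population SDs.
    """
    diff = mean1 - mean2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    effect_size = diff / pooled_sd                      # difference in SD units
    se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)    # standard error of the difference
    z = diff / se_diff
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% CI
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return effect_size, z, p_two_tailed, ci


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5, SD about 2.14, SE about 0.76

# Example 2 figures: HRSD difference of 15 and SD of 20; n = 50 per group is hypothetical
es, z, p, ci = compare_means(30, 20, 50, 15, 20, 50)
print(f"effect size {es:.2f}, z = {z:.2f}, p = {p:.5f}, 95% CI {ci[0]:.1f} to {ci[1]:.1f}")
```

Because the CI and the p value are built from the same standard error, a 95% CI that excludes 0 goes together with a two-tailed p below 0.05. With small samples, a t-based version (wider intervals, larger p) would be the appropriate choice.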
Uses of Confidence Intervals
• In describing most differences:
– Difference between the means of groups
– NNT: if the upper limit is very large, the treatment may not be that effective
– Relative risk/odds ratio: if the CI spans 1, the result is not significant
Advantages of Confidence Intervals
• Mathematicians love them
• Most journals demand them
• They give a range of likely values
– Much easier to interpret at a glance than a p value
– So they show how big the difference really is
• They clearly show an underpowered study: a very wide CI
Mathematical Assumptions of CIs
• Random sampling from the population
• Independent observations
• For a simple CI of the difference between means:
– normal distributions are needed, as it is a parametric method
– equal population standard deviations between groups
– (log transformation can be used for skewed data)
• In the exam, end the definition by saying: "...providing all assumptions for the underlying statistical tests have been met".
ISQs
• If standard deviations are larger, effect size will be larger
– F (effect size = difference in means / SD, so a wider spread of results gives a smaller effect size)
• For two studies with identical effect sizes, the p value will be smaller for the study with much larger numbers
– T (with larger samples there is more confidence that the difference is a real difference)
• If p < 0.05, there is a real difference between the populations under study
– F (it only means a difference this large would be unlikely if there were no real difference)
• If the confidence interval for the difference in HRSD scores between treatment groups includes 0, we can conclude that it is likely that there is a real difference between treatments that is not due to chance
– F (if the CI includes 0, the range of likely differences includes no difference at all)
• The standard error of the mean is used in calculating confidence intervals for differences in means of continuous measures
– T (the CI is the difference in means plus or minus a multiple (about 2) of its standard error, which is derived from the SEMs)
Problems with Statistical Tests…
• Type I errors
– A result is found to be statistically significant
when there is no real difference between the populations
• A statistically significant result suggests a real difference is likely, not certain
• P = 0.05 means that, if there were no real difference between the groups, a difference at least this large would arise by chance 5% of the time
• So the difference can still be due to chance
• Don't give too much weight to the exact threshold: is a P of 0.051 really that different from a P of 0.049?
Multiple Testing
• If a lot of tests are done, a "significant" result is more likely
• 20 statistical tests done on samples that are not truly different will, on average, give one result with P < 0.05!
• So watch for multiple comparisons!
• Bonferroni correction:
– multiplies each P value by the total number of tests to correct for this
– so with 10 tests, an unadjusted P < 0.005 is needed to give an adjusted P < 0.05
• Probably too conservative
– It does not require the tests to be independent, but results are usually correlated, which makes it even more conservative
• But do comment on multiple comparisons if there are a few mildly significant results among a lot of tests
Beware of "Data Dredging"
• It is possible to do lots and lots of analyses with one data set, e.g.:
– Subgroups, e.g. divorced women aged 30-40 with a comorbid anxiety disorder
– Different measurement instruments
– Cross-comparisons with various covariates
• If you do enough analyses, you will find one with P < 0.05
• Such a finding should not be regarded as truly significant unless p is very small
Solution
• Make a small number of a priori hypotheses
• Adjust for multiple comparisons within these
• Anything significant post hoc could be interesting...
• ...but only for hypothesis generation
• ...not as an explanatory finding unless p is very small
Clinical Significance
• Statistical significance doesn't equal clinical significance
• Clinical judgement is needed
• If numbers are very large, a very small difference will give a "statistically significant" result
• Effect size is a very useful measure: 0.8 is seen as the threshold of "very effective"
• Look at the actual numbers: a difference of 1 on the Hamilton scale is not very interesting
• Again, confidence intervals give a better picture of clinical significance
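The "20 tests, one false positive" rule and the Bonferroni adjustment can both be checked with a few lines. A minimal sketch; the family-wise figure assumes independent tests, and the P values in the example are hypothetical:

```python
def familywise_error(n_tests, alpha=0.05):
    """P(at least one P < alpha) when every null hypothesis is true.

    Assumes the tests are independent.
    """
    return 1 - (1 - alpha) ** n_tests


def bonferroni(p_values):
    """Multiply each P value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


n = 20
print(f"{n} tests: expect {n * 0.05:.0f} false positive on average; "
      f"P(at least one) = {familywise_error(n):.2f}")

# Ten hypothetical tests: an unadjusted 0.004 survives correction, 0.006 does not
adjusted = bonferroni([0.004, 0.006] + [0.5] * 8)
print([round(p, 3) for p in adjusted[:2]])
```

So with 20 independent tests of true null hypotheses, the chance of at least one "significant" result is roughly two in three, even though the expected number of such results is only one.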
One-Tailed Cheating
• In most studies, we must assume that a difference can be in either direction, e.g. a new treatment can be better or worse than placebo
• So for P < 5%, the thresholds are the top and bottom 2.5% of the test statistic
– e.g. significant if t < -1.98 or t > +1.98 (for about 100 degrees of freedom; the exact cut-off depends on the degrees of freedom)
– if there is no real difference, there is a 5% chance that t < -1.98 or t > +1.98
– t = difference in means / standard error of the difference
• If the result is outside this range, the difference is significant
• This is called "two-tailed"
• If a "one-tailed" test is used, a result in the top 5% (a larger range than the top 2.5%) would be significant
– So the difference in means wouldn't have to be as large for the result to be significant
– This is sometimes used to make a "non-significant" result "significant"
– So criticise it!
• Sometimes reasonable: if it is certain that any difference will lie in one direction, a one-tailed test can be used, which gives more power
Power
• Type II error: a result is found to be not statistically significant when there is a real difference between populations
• Often regarded as less serious than a Type I error
• It is important to make sure a study has enough power to find a hypothesised difference
• Work out the required sample size beforehand
Variables Needed
• Hypothesised difference, or effect size being looked for (larger difference → smaller n)
• Hypothesised variation (more variation → larger n)
– (During a large study, the sample size may have to be reviewed as data become available)
• Significance level (lower p value threshold → larger n)
– Remember to adjust for multiple comparisons
• Proposed statistical test (parametric → smaller n)
• Power level (higher power → larger n)
– Often 80%
• A nomogram (e.g. Altman's) can be used to work out the required sample size
• Once all the data are in, you can work backwards to find the power the study had to find a significant difference
– This can be very useful in meta-analysis
• A paper will say, e.g.: "To detect an improvement from 40% in the control group to 60% in the intervention group, 250 patients would be needed to achieve 80% power at 5% significance"
• If this is not stated, assume it has not been calculated
• The study may be underpowered, making
it hard to conclude anything
• An underpowered study is a waste of time, and it is unethical to put patients through the trial
ISQs
• A type I error means that a result is falsely found to be statistically significant
– T
• If multiple tests are performed without adjusting significance thresholds, then a type II error is more likely
– F (no effect on type II error, but increases the risk of type I error; adjusting significance thresholds may increase type II error)
• A very low p value indicates that a result is clinically significant
– F (statistical significance ≠ clinical significance)
• If a hypothesised difference between populations is small, fewer subjects are needed to make a study adequately powered
– F (a small difference needs a lot of subjects to reach statistical significance, as the distributions are likely to overlap more)
5. Analyse data
• You have results; now what do you do?
• If the methodology is assessed as adequate, with little chance of bias, look to see whether different groups have different outcomes
• And if so, can this be explained by chance? (statistical significance)
• If statistically significant, is it clinically significant?
• Statistical tests depend on the distribution of the data:
– Continuous vs non-continuous: results on a scale vs yes/no
– For continuous data, parametric vs non-parametric: normally distributed or not (numbers must also be high enough for parametric tests)
• If results are skewed, they can be transformed to make them suitable for parametric tests, e.g. log (as with pH), square root
• If parametric:
– More power
– More statistical tests possible
Parametric Tests
• (a) The t-test
• Compares two normally distributed samples
• Tests the null hypothesis: are they from the same population?
• (Opposite: are they really different?)
• Needs:
– Difference in means
– Estimate of population standard deviation
– Number of subjects
• Gives, e.g., t(108) = 2.02, P < 0.05
– 108 is the degrees of freedom (total n - 2)
– Degrees of freedom: sample size minus the number of estimated parameters
• If t is above a threshold (from a table) for that number of degrees of freedom, the result is statistically significant
• Special assumptions
– Samples mustn't be markedly skewed (there is a formula to check this)
– Variances must be similar (use the F-ratio); an alternative version can be used if not
– Papers rarely say whether these assumptions are met, but when giving a definition, give these assumptions
• Two types of t-test
– For independent samples, e.g. different treatment groups
– For repeated samples, e.g. the same subjects before and after an intervention; more power
• (b) Analysis of Variance (ANOVA)
– A bit like the t-test, but compares more than two samples
– Gives a result such as F(3,96) = 4.1, P < 0.05, where F is the value of the F or "variance ratio" test
– Same threshold principle as the t-test, but with two sets of degrees of freedom: between groups (3, so four groups) and within groups (96, so 100 subjects)
– Only tells you whether all the samples are likely to come from the same population
– If the overall P is below the threshold, the groups (and combinations of groups) must then be compared with each other, using t-tests or post hoc analysis with Scheffé's, Fisher's or Tukey's tests, to find out where the significant differences lie
– E.g. it may find that only one experimental group differs from control
– Again, ANOVA can be used on unrelated or related scores; power is increased with related scores, as this reduces error from individual differences
• (c) Multivariate Analysis of Variance (MANOVA)
– Sometimes called multivariate modelling
– A t-test gives you the differences between groups at the end of the study
– Generally valid, but…
– it just compares the two groups at the end
– it doesn't take into account baseline differences between groups
– it has less power than looking at change within subjects
– groups may differ at baseline
[Figure: HRSD scores of the placebo and fluoxetine groups at baseline and at 16 weeks]
• MANOVA gives you:
– Time effect: comparison of all subjects at baseline against end-point
– Group effect: comparison of all observations (baseline and end-point) of one group against all observations from another group
– Group × time interaction: whether there is a different time effect in the different groups, i.e. which treatment is more effective; it looks at change within each subject, which increases power
• MANOVA gives you an F value, with its significance (p value)
• MANOVA can give you the effect size
• MANOVA can also take account of confounders that may not be balanced across treatment groups
• MANOVA can be used for more than 2 treatment groups
• MANOVA can look at treatment effects over several time points
[Figure: HRSD scores of the placebo and fluoxetine groups over time (0, 4, 8, 12 and 16 weeks)]
– The F value gives the overall difference in effect between the two curves
– t-tests are sometimes used post hoc to see at which time points there was a significant difference
• MANOVA can demonstrate differences in treatment effects for different subgroups, e.g. a group × time × gender interaction
– Group × time: shows whether there is a different time effect for the different groups
– Group × time × gender: shows whether this group effect on outcome is different for each gender
• Needs large numbers, e.g. 100 per group gives about 70% power of detecting an effect size of 0.5
• (d) Analysis of Covariance (ANCOVA)
• "An extension of ANOVA which adjusts means for the influence of a correlated variable or a covariate. Used when research groups are known to differ on a background-correlated variable, in addition to differences attributed to the experimental treatment."*
• A regression method, not an ANOVA!
• Used to allow for baseline differences
• Used rather like MANOVA
• Special assumptions:
– 1. Pre- and post-treatment scores must correlate
– 2. When post-treatment scores are plotted against pre-treatment scores, the lines for the groups must be parallel
[Figure: post-treatment score plotted against pre-treatment score, with parallel lines for placebo and treatment separated by a vertical distance d]
• For an equivalent pre-treatment score, the active treatment gives a lower post-treatment score
• The difference is d
• A confidence interval for d can be given, or a hypothesis test used
Non-Parametric Tests
• Depend on ranking, as they can't rely on the distribution
• E.g. for final HRSD scores:
– the placebo group may have the highest, 2nd, 4th, 5th and 6th highest scores
– the intervention group may have the 3rd, 7th, 8th, 9th and 10th highest scores
• Could these samples be from the same population?
• (a) Mann-Whitney U-test
– For two independent groups of scores
– Looks at the ranks in the two groups, correcting for the difference in sample size
– Use a table to see whether the difference is significant
• (b) Kruskal-Wallis test
– Like Mann-Whitney, but for more than two groups
• (c) Sign test
– For paired results
– Compares the number of subjects whose scores go up with the number whose scores go down
• (d) Wilcoxon matched pairs test (Wilcoxon signed rank test)
– For paired results, like the sign test
– Takes account of the magnitude of change
• (e) Friedman test
– Like Wilcoxon, but for more than two related groups
ISQs
• Log transformation is sometimes essential before parametric statistical tests are used
– T (if results are skewed)
• Parametric tests use the difference in means and the standard deviation
– T (they also need the number of subjects)
• Non-parametric tests use the difference in medians and the interquartile range
– F (they use ranks)
ISQs
• The Kruskal-Wallis test is a non-parametric test that can be used to compare more than 2 unrelated groups
– T
Non-Continuous Data
• E.g. recovered or not recovered
• E.g. recovered from depression 6 months after the start of treatment:
                 Still depressed   Recovered
   Placebo             60              30
   Treatment           25              58
• Uses counts
• Doesn't
use percentages (although they may be used for convenience)
• Uses the chi-squared (χ²) test
• This is a non-parametric test: it uses counts, not a continuous distribution
• Looks at the difference between observed and expected values
• If the numbers in the cells are small, the data need correcting:
– Yates' continuity correction
– Fisher's exact test if very small
• The need for informed consent can limit the external validity of treatment studies
– T (only consenting subjects can be recruited; if much of the population under study could not consent, the sample is not representative of that population)
Stratified/Block Randomisation
• You may be asked what this is
• It is important to make groups as similar as possible
• With very large numbers, randomisation should ensure this
• But there can be an imbalance, especially if numbers are small
• And with small subgroups, numbers within a subgroup may be imbalanced
• Stratification, block randomisation or minimisation can be used to ensure balance
Validity
• Does the scale measure what it is supposed to?
• Face validity: the scale appears to be correct
• Criterion validity: can be compared with a known quantity, e.g. height; the relation between a rating scale and diagnosis
• Content validity: appropriate coverage of the subject matter, e.g. a scale for depression should measure emotions, cognitions, physical symptoms and behaviour
• Convergent validity: whether measures that are expected to be correlated are indeed associated, e.g. correlation of emotional, cognitive and physical symptoms on a depression scale
• Divergent validity: whether a scale discriminates between groups that are expected to be different, e.g. depressed and non-depressed people
• Predictive validity: agreement between a present measurement and one in the future, e.g. a suicidality scale and future suicide attempts
• Concurrent validity: agreement with another (valid) scale, e.g. a new depression scale correlates with the HRSD
• Construct validity: a composite of the other types of validity
Classification systems validity
• Descriptive validity: how well the classification system describes clinical syndromes with regard to tightness of criteria, overlap of symptoms and comorbidity (describing what we see without reference to aetiology or the theoretical basis of the conditions)
• If a scale is not valid, it is of no use and its results can't be interpreted
• A scale with proven validity must be used
– Validity is proven in previous studies
– Validity must be proven for the population being measured, e.g. children
• So if a new scale with no validity data is used in a trial, it's not very good, really!
Reliability
• Does the test give the same scores when repeated?
• Test-retest reliability: does the same test give the same score on the same subject when repeated?
• Inter-rater reliability: would different raters give the same score with the test on the same patient?
• As with validity, a test should have proven reliability
• In addition, if more than one person rates patients in a study, inter-rater reliability must be assessed between the raters and stated
– The kappa (κ) statistic is usually used
– It should be above 0.7
Some Scales
• The scale used can influence results!
• You may be asked what a scale measures
• May be a structured, semi-structured or unstructured interview, or a self-rated questionnaire
• May measure the level of symptoms or give a diagnosis
• See pp 52-54 of the Oxford textbook (3rd edition, green) for a good list
• Hamilton Rating Scale for Depression
– Unstructured interview
– Biological weighting
• Beck Depression Inventory
– Self-rated questionnaire
– Psychological weighting
• Therefore, in a medication vs CBT trial:
– the HRSD may favour the antidepressant
– the BDI may favour CBT
• Therefore a good trial should use multiple measures
• An effective treatment should give significant results on all measures
Global Measures
• Clinical Global Impression – Improvement (CGI)
– Very much improved (1)
– No change (4)
– Very much worse (7)
– Often used in trials
• Health of the Nation Outcome Scales (HoNOS)
– Semi-structured interview
– Measures multiple domains, e.g. function, depressed mood, delusions
– Versions for children and people with LD
– The government likes it
• Another warning
– What does a decrease of 5 on, e.g., the HRSD mean?
– It differs between patients: a fall from 40 to 35 is different from a fall from 6 to 1
– So diagnosis may be better
Diagnosis
• These scales just show the level of symptoms; they don't give a diagnosis
• A set (semi-)structured interview is needed to give an actual diagnosis, e.g. the Schedule for Affective Disorders and Schizophrenia (SADS), SCID, PSE
• Pre-set diagnostic criteria (e.g. DSM-IV)
• A "clinical interview" is not adequate!
• A good treatment will cause more patients to go into remission
• Define remission/recovery a priori
• Diagnosis is still a vague term!!
– it depends on the criteria used
• Watch out: a lot of psychology literature takes people with a BDI score above a threshold and calls them "depressed"
ISQs
• A scale must have good reliability for it to have good validity
– T (if it is not reliable, its measurements won't be valid); validity is not essential for reliability
• It is essential to assess the criterion validity of new scales that measure depressive symptoms
– F (it can be useful to compare against DSM diagnosis, but this is not essential. Scale more important…)
ISQs
• Convergent validity is often measured by correlating overall scores on a new outcome scale with overall scores on an established scale
– F (that is concurrent validity)
• The HRSD is a good scale to use to diagnose depression
– F (it gives the level of symptoms, not a diagnosis)
Blinding Participants
• Knowing they are having a new treatment may cause favourable expectations or apprehension
• May affect compliance
• If they know they're having the control treatment, they may be more likely to seek additional treatment
• Blinded participants are less likely to leave the trial early
• Hawthorne effect: patients improving because they know they are being studied (e.g. that they are in the treatment group)
Blinding Investigators
• Health-care providers, enrollers, trial designers, supervisors…
• Their attitudes about the intervention may be passed on to patients
• May affect the likelihood of giving adjunct interventions
• May affect dose adjustment
• May affect whether they withdraw patients from the trial
• May affect whether they encourage or discourage patients to withdraw from the trial
ISQs
• Knowing the outcome status in a cohort study may lead to information bias
– F (it would in a case-control study)
• Stratification can be used to control for confounding in a case-control study
– T (or in a cohort study)
ISQs
• In a case-control study of the association between a diagnosis of alcohol dependence and a diagnosis of depression, the relative risk is the statistic that would give the most useful information on strength of association
– F (the RR cannot be calculated from a case-control study; the OR approximates the RR if the outcome is very rare)
• The t-test is commonly used to look at strength of association in case-control studies
– F (use correlation or the CI of the OR)
Odd and even numbers used to allocate patients to treatment/control arms:
• What is this type of allocation called? (2 marks)
– Quasi-randomisation
• What disadvantages could this type of allocation have in this case? (4 marks)
– It potentially introduces selection bias (1)
– The person discussing study entry with the subject may know what treatment they'll get (1)
– This may influence whether or how they discuss the study (1)
– So subjects in each group may differ in ways other than treatment (1)
– There is no allocation concealment (1)
Data
• Parametric: data are normally distributed, e.g. height
• Non-parametric: data are not normally distributed, e.g. HAM-D
• Binary: e.g. gender, i.e. not continuous like the above
Tests
• All non-parametric tests can be used on parametric data, but not the other way around
                     One variable           ≥2 variables
   Parametric        t-test                 ANOVA
   Non-parametric    Mann-Whitney U test    Kruskal-Wallis test
   Binary            χ²                     χ²; Fisher's exact probability test
• These are all for independent variable/outcome data
Stats: Definitions
• Relative risk (RR, risk ratio, rate ratio): in a cohort study, the risk of getting the illness if you have the risk factor vs if you don't, e.g. 1 in 3 (MI in smokers) divided by 1 in 10 (MI in non-smokers) = 3.3
• Confounders: variables associated both with the outcome and with the risk factor being studied
• Logistic regression: multivariate analysis that attempts to calculate the influence of variables on (usually dichotomous) outcomes, taking confounders into account
• Odds ratio: the only one of these you can calculate in case-control studies. Should be > 3 to be worthwhile. It over-estimates the RR (slightly when the outcome is rare, more when it is common) and is less useful than the RR
• Confidence interval: the range of values within which there is a 95% probability that the true population value lies. For a ratio such as the RR or OR, if the interval includes 1 the result is not significant (for a difference, if it includes 0)
• Power: the probability that a test will produce a significant difference when a real difference exists
• Likelihood ratio: the probability that a diseased patient will have a certain test result vs the probability that a non-diseased control will have that result. An LR+ of 5 means the positive result is 5 times more likely to occur in a diseased person. An LR- of 0.2 means the negative result is 0.2 times as likely (i.e. 5 times less likely) to occur in a person with the disease
• SpPin: when specificity is high, a positive result rules in the diagnosis
• Parallel vs matched groups: results are derived by comparing groups. In the latter, each person in the treatment arm has a person matched for confounding variables in the control arm
• Factorial design: allows investigation of the effect of more than one independent variable on an outcome. Variables may be combined or separate in cases
• Single vs double blind: single, the patients are blind; double, the investigators are blind too
• Kappa score (reliability measure): the level of agreement between two raters beyond that which would have occurred by chance alone. κ = 1 means perfect agreement; ≥ 0.7 is usually taken to be good enough
• Variance: the amount of spread or variation in the data.
Spread is usually recorded as a range or a standard deviation; variance = SD².
• Z score: the deviation of a given value from the mean, divided by the SD

Multivariate analysis
Methods to deal with more than one related outcome/dependent variable (like two outcome measures from the same individual) simultaneously, with adjustment for multiple confounding variables (covariates). When there is more than one dependent variable, it is inappropriate to do a series of univariate tests. Hotelling's T² test is used when there are two groups (like cases and controls) with multiple dependent measures (there may be more than two), and multivariate analysis of variance (MANOVA) is used for more than two groups. Factor analysis (FA) and principal component analysis (PCA) are types of multivariate analysis.

Stats: Definitions
• Factor analysis: seeks to summarise multivariate data into a few "factors" that bunch together dependent variables that correlate highly, e.g. insomnia and appetite change usually go together and can then be called the "melancholic" factor
• Principal component analysis: aims to reduce a large set of variables to a small set that still contains most of the information in the large set. While PCA summarises or approximates the data using fewer dimensions (to visualise it, for example), FA provides an explanatory model for the correlations among the data
• Eigenvalue: measures the amount of variation explained by each principal component (PC) (or factor); it is largest for the first PC and smaller for subsequent PCs. An eigenvalue greater than 1 indicates that a PC accounts for more variance than one of the original variables in standardised data (i.e. one of the variables that have been collapsed into the PCs or factors). This is commonly used as the cut-off for deciding which PCs are retained.
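A minimal sketch of the eigenvalue > 1 cut-off (often called the Kaiser criterion) for three standardised items. The correlation matrix is hypothetical (insomnia, appetite change and an unrelated third item), and the eigenvalues come from a plain-Python Jacobi rotation routine so only the standard library is needed; in practice a statistics package, or NumPy's `numpy.linalg.eigvalsh`, would do this step. Because the variables are standardised, the eigenvalues add up to the number of variables, so an eigenvalue of 1 is exactly "one original variable's worth" of variance.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (largest first)."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal elements are (numerically) zero.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the new a[p][q] is zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                # A <- A P: update columns p and q.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- P^T A: update rows p and q.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_retained(eigenvalues):
    """Number of components with an eigenvalue greater than 1."""
    return sum(1 for ev in eigenvalues if ev > 1)


# Hypothetical correlation matrix (made-up values):
# insomnia, appetite change, unrelated third item.
R = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
eigs = jacobi_eigenvalues(R)
print([round(e, 3) for e in eigs])          # [1.824, 0.976, 0.2]
print("sum:", round(sum(eigs), 6))          # 3.0, the number of variables
print("retained:", kaiser_retained(eigs))   # 1
```

With these made-up correlations only the first component, which loads mainly on insomnia and appetite change, passes the cut-off; the component dominated by the third item falls just below 1.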
Multivariate Analysis
• Powerful tool for investigating the effects of variables of interest
• Adjusts for the effects of confounding variables
• Very useful when you can't randomise samples
• So very important in observational studies
• Parametric
• So needs normal distributions

Many Types
• At a very simple level, the t-test
• Multiple linear regression – continuous outcomes, e.g. HRSD scores
• Multiple logistic regression – binary outcomes, e.g. diagnosis of schizophrenia
• Cox regression – time-to-event outcomes

Multiple Linear Regression
• Shows the relative effects of different covariates (exposures) on a continuous outcome
• You get a model, e.g. HRSD = 5.54 + 0.345 × age + 4.23 × female + 9.6 × alcohol …
• This can be useful for prediction…
• What we're interested in is: are the covariates significant predictors?

Are Covariates Really Significant Predictors?
• Some covariates correlate strongly (e.g. alcohol use is positively related to maleness)
• We want to know if a covariate predicts the outcome significantly…
• …if all other covariates are held constant
• Some covariates may be redundant, as they are strongly correlated with each other
• So a model can drop them

What This Means
• Can conclude that alcohol is not a predictor per se: it is maleness that predicts the outcome, and maleness also predicts alcohol use, so alcohol should be dropped from the model
• However, may conclude that both alcohol and sex are significant independent predictors, so put both in the model
• We may find that alcohol has different predictive effects for each sex – an alcohol × sex interaction

Presentation of Results
• As a table (coefficient b, standard error of b, t = b/SE, P):
– Sex: b = 4.23, SE = 0.9, t = 4.7, P < 0.001
– Alcohol: b = 9.6, SE = 4.2, t = 2.29, P < 0.05
– Age: b = 0.345, SE = 0.9, t = 0.38, ns
– Sex × alcohol interaction: b = 3.6, SE = 1.1, t = 3.27, P < 0.01

Interpretation of Results
• If P < threshold, that covariate is a significant independent predictor of the outcome
• Remember the rules on interpreting P values – P < 0.05 means that, if there were truly no association, a result at least this extreme would occur less than 5% of the time
• So for our table:
• Sex and alcohol use are significantly associated with HRSD scores
• …taking into account the other variables in the model
• Age is ns and should be dropped from the model
• The effects of alcohol are different for males and females – or, equivalently, the effects of sex differ with alcohol intake!

Deciding on a Model
• Many models can come from one data set!
• Have to decide which model to use
• Hierarchical selection – uses a theoretical rationale of what is expected
• Forward stepwise selection
– Start with no covariates
– Add the most significant predictor
– Keep adding less significant predictors until they're no longer significant
• Backward stepwise selection
– Start with all possible covariates
– Keep removing the least significant variable
• All subsets selection
– The computer looks at all possible models
– Finds the model that explains the highest proportion of variance, i.e. the highest R²
– R² = variation of the response explained by the model / total variation of the response variable
– In fact, R² can be falsely increased by having lots of variables
– Use adjusted R² to take account of this

Multiple Logistic Regression
• Used for a binary outcome, e.g. diagnosis of schizophrenia, death
• Works out how a variable affects the chance of the outcome occurring
• Gives adjusted odds ratios for covariates
• ORs can be hard to interpret
• The OR approximates the relative risk for rare events

The Model
• Like linear regression
• But uses log-odds
• This means the scale can be negative as well as positive
• Without log-odds the scale would run from 0 to infinity, with 1 corresponding to no effect
• e.g. if p is the predicted probability of MDD: logₑ(p/(1 − p)) = 5.54 + 0.345 × age + 4.23 × female + 9.6 × alcohol …

Interpretation
• To get the corrected odds ratio for a variable: OR_corrected = e^coefficient
• e.g. if the coefficient in the model is 0.7, the odds ratio is approximately 2 (e^0.7 ≈ 2.01)
• Otherwise, interpret the significance of predictors as with linear regression
• You'll get a table saying whether predictors are significant
• Deciding on which model – same principles

Some Potential Problems
• 1. Only gives the significance of predictors controlling for the other covariates measured and included in the model
– Does not control for covariates not thought of
– So not as good as randomisation
• 2. Must be normal distribution (for linear regression; strictly, of the residuals)
• 3. Must be a linear, not curved, relationship between continuous variables
• 4. Multicollinearity
– A problem if two covariates correlate very highly (e.g. 0.9)
– The one with a minutely better correlation with the outcome measure is selected first
– The other variable will then appear a much poorer predictor – or even non-significant – because it correlates much more strongly with the first variable than with the outcome measure
– So study design is important: don't include variables with very high correlations

ISQs
• Multiple logistic regression is used to investigate the effects of variables on a continuous outcome measure
• F (binary outcome)
• Multiple linear regression can be used to prove that an exposure of interest causes an outcome of interest
• F (only shows association)
• In multiple linear regression, if two covariates are highly correlated, there may be a type II error
• T (multicollinearity)
• Multiple linear regression is an appropriate statistical technique to use if there is a curvilinear relationship between variables
• F (must be a linear relationship)
• Multiple linear regression cannot be used if there is a heavily skewed distribution of a proposed variable
• F (can transform the distribution to make it normal)

Stats: Definitions
• Relative risk reduction (RRR): an RRR of 25% means the risk of the event in the exposed/treated group is reduced by 25% relative to the control group
• P < 0.05: there is less than a 5% chance that a difference this large would be found by chance alone if there were no true difference (i.e. the 95% CI excludes 0)
• Number needed to treat (NNT): the number of people who would have to be exposed/treated to prevent one event
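A minimal sketch of the logistic regression interpretation rules above: OR_corrected = e^coefficient, and log-odds converted back to a probability. The one-covariate model logₑ(p/(1 − p)) = −2.0 + 0.7 × exposed is hypothetical, with the slope chosen to match the 0.7 example. Turning the two predicted log-odds back into probabilities reproduces the same odds ratio, and comparing it with the risk ratio shows why the OR only approximates the RR when the outcome is rare: the baseline risk here is about 12%.

```python
import math


def odds_ratio_from_coefficient(b):
    """Adjusted odds ratio for a one-unit change in a covariate: OR = e**b."""
    return math.exp(b)


def probability_from_log_odds(log_odds):
    """Invert log_e(p / (1 - p)) to recover the probability p."""
    return 1 / (1 + math.exp(-log_odds))


def odds(p):
    return p / (1 - p)


# Hypothetical model (made-up coefficients): log_e(p / (1 - p)) = -2.0 + 0.7 * exposed
intercept, b_exposed = -2.0, 0.7
p_unexposed = probability_from_log_odds(intercept)
p_exposed = probability_from_log_odds(intercept + b_exposed)

or_direct = odds(p_exposed) / odds(p_unexposed)  # OR from the two predicted risks
rr = p_exposed / p_unexposed                     # risk ratio for comparison

print(round(odds_ratio_from_coefficient(b_exposed), 2))  # 2.01
print(round(or_direct, 2))                               # 2.01, the same as e**0.7
print(rr < or_direct)  # True: the OR overstates the RR when the outcome is not rare
```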
Stats: Definitions
• Risk vs prognostic factors: a risk factor predisposes to disease development but may or may not act as a prognostic factor once the disease occurs
• Survival curve*: shows time to event and can be used to comment on whether the chance of the event occurring goes up or down with time
• Effect size*: the difference in outcome magnitude between the control and treatment groups DIVIDED by the SD. An ES of 1 means the average treated patient is better off than about 84% of untreated patients
• Odds: usually the ratio of the number of people with the event/outcome to the number without, e.g. the odds of dying in a year are 1/99 (1 person will die and 99 will remain alive)
• Population attributable risk (PAR): the proportion of cases in a population attributable to a given risk factor (e.g. the PAR of genetic causation is low in schizophrenia, i.e. many sporadic cases occur with no family history)
• Fisher's exact probability test: a non-parametric test of the statistical significance of the difference between two small independent samples when the scores belong to one or other of two mutually exclusive classes

Using the 95% CI
• If it includes 1 (ratios, e.g. RR, OR) or 0 (reductions/differences, e.g. ARR, RRR), the result is not significant
• The bigger the sample, the narrower it is
• 95% CI of RRR = RRR ± 1.96 × SE*
• 95% CI of NNT = 1/(95% CI of ARR)
• Positive studies: if the minimum putative effect in the CI is not clinically significant, the numbers are too small
• Negative studies: if the maximum putative effect is still clinically significant, the study doesn't prove lack of benefit

Using Number Needed to Treat
Although the relative risk reduction might sound impressive, if the absolute risk is low then treatment may still not be worthwhile, as the NNT will be very high.
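A small sketch of the point above, using the risk-table definitions ARR = CER − EER, RRR = ARR/CER and NNT = 1/ARR. The event rates are hypothetical: the same 25% RRR is applied to a high-risk group (control event rate 20%) and a low-risk group (2%). It also turns a 95% CI of the ARR into a 95% CI of the NNT by taking reciprocals; the normal-approximation standard error for a difference in two proportions is an assumption, as the notes do not say which SE is used.

```python
import math


def risk_summary(cer, eer):
    """ARR, RRR and NNT from the control (CER) and experimental (EER) event rates."""
    arr = cer - eer   # absolute risk reduction
    rrr = arr / cer   # relative risk reduction (= 1 - RR)
    nnt = 1 / arr     # number needed to treat
    return arr, rrr, nnt


def nnt_confidence_interval(cer, n_control, eer, n_treated, z=1.96):
    """95% CI of the NNT as 1 / (95% CI of the ARR), normal approximation (assumed)."""
    arr = cer - eer
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_treated)
    low, high = arr - z * se, arr + z * se
    if low <= 0:
        # ARR interval is not entirely above 0: no finite NNT range for benefit.
        return None
    return 1 / high, 1 / low  # taking reciprocals swaps the ends


# Hypothetical event rates: the same 25% RRR in two populations.
for label, cer in [("high risk", 0.20), ("low risk", 0.02)]:
    eer = cer * 0.75
    arr, rrr, nnt = risk_summary(cer, eer)
    print(f"{label}: ARR={arr:.3f} RRR={rrr:.0%} NNT={nnt:.0f}")
# high risk: ARR=0.050 RRR=25% NNT=20
# low risk: ARR=0.005 RRR=25% NNT=200

# Hypothetical trials with 1000 patients per arm.
print(nnt_confidence_interval(0.20, 1000, 0.15, 1000))   # roughly (12, 60)
print(nnt_confidence_interval(0.02, 1000, 0.015, 1000))  # None: ARR CI includes 0
```

The identical RRR hides a tenfold difference in NNT, and in the low-risk trial the ARR interval crosses 0, so no finite NNT interval exists.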
• The NNT will be much higher in a low-risk group, meaning that although there might be a similar relative lowering of adverse outcome, it may not be reasonable to treat. That's why it might not be smart to give statins to hypercholesterolaemic but otherwise low-risk people: even though it might reduce their risk of MI by the same 25% as in a high-risk group, you might only prevent a few MIs while putting huge numbers on statins
• The NNT can also be used to take side effects into account:
– If 30% of patients get fat on olanzapine and the NNT to prevent a 1-year relapse following a drug-induced psychosis is, say, 30, you'll make about 9 people fat (30 × 0.3) to prevent one relapse; but in a schizophreniform disorder group where the NNT is 3, only about 1 person (3 × 0.3) will get fat per episode prevented
• You can come up with a hypothetical factor F that says how close your patients are to the study population, e.g. 0.5 (i.e. half as likely to respond to the particular treatment because they're hard cases!). The adjusted NNT will then be NNT/F (e.g. an NNT of 10 becomes 20)

Sensitivity, Specificity & Predictive Value of Tests
Predictive values, BUT NOT sensitivity/specificity, are prevalence dependent.
• Test positive: a (disease), b (no disease)
• Test negative: c (disease), d (no disease)
• Sensitivity = a/(a+c); Specificity = d/(b+d)
• PPV = a/(a+b); NPV = d/(c+d)
• Likelihood ratio +ve = [a/(a+c)] / [b/(b+d)] = sensitivity/(1 − specificity)
• Likelihood ratio −ve = [c/(a+c)] / [d/(b+d)] = (1 − sensitivity)/specificity

Using likelihood ratios
• Probability = odds/(1 + odds)
• Odds = P/(1 − P)
• Pre-test probability = prevalence = (a+c)/total
• Pre-test odds = (a+c)/(b+d)
• Post-test odds = LR+ × pre-test odds
• Post-test probability = post-test odds/(1 + post-test odds) = a/(a+b), i.e. the PPV for that population*
• This is the percentage of people scoring positive on the test who will actually have the disease (substitute LR− above to obtain the post-test probability for a −ve result: the percentage of people scoring negative on the test who will have the disease)

Risk & exposure data*
• Exposed: EE events, EN no events (E = EE + EN in total)
• Control: CE events, CN no events (C = CE + CN in total)
• EER = EE/E; CER = CE/C
• ARR = CER − EER; ABI = EER − CER
• RR = EER/CER
• RRR = 1 − RR = ARR/CER (%); RBI = ABI/CER (%)
• NNT = 1/ARR

Risk & exposure data
• Exposed: EE events, EN no events; control: CE events, CN no events
• Odds of the event in the exposed = EE/EN
• Odds of the event in controls = CE/CN
• OR = (EE/EN)/(CE/CN)

Stats: errors & power
• Type I (α): "making up the difference"; false positive; the null hypothesis is in fact correct but is rejected
• Type II (β): "missing the difference"; false negative; the null hypothesis is wrong but is not rejected
• Power = 1 − β; usually ≥ 0.8 is good enough
• All negative studies must include a power analysis

Data types
• Ordinal: ranked data, e.g. positions in a race (A, B, C, D), where the size of the difference doesn't matter. If 2 people come equal second, they are each given a rank of 2.5
• Interval: quantitative with an arbitrary zero and equal steps, e.g. temperature in Celsius: 100 ≠ 2 × 50 in amount
• Ratio: starts at a true zero with equal steps, e.g. temperature in Kelvin: 100 = 2 × 50 in amount
• Nominal: mutually exclusive, unordered categories that must be exhaustive, e.g. blood type

Stats: Bias*
Any systematic error that results in an incorrect estimate of the association between exposure and outcome; introduced by the researcher and usually a product of study design.
• S – Selection
• P – Performance
• A – Attrition: drop-outs/migration*
• R – Recall
• D – Detection
• I – Insensitive measure
• C – Compliance

Worked Example
Gold standard +/−: screening test +: 90 (a), 5 (b); screening test −: 30 (c), 60 (d)
• What are the sensitivity and specificity of the test?
• Sensitivity: 90/(90+30) = 0.75
• Specificity: 60/(60+5) = 0.92
• What is the PPV of the test for this sample?
• PPV = 90/(90+5) = 0.95

Example 2 (same table)
• What is the likelihood ratio of a positive test?
• 0.75/(1 − 0.92) = 0.75/0.08 = 9.375 (using the unrounded specificity, 60/65, gives 0.75/(5/65) = 9.75; the rounded value is carried forward below)
• What is the likelihood ratio of a negative test?
• (1 − sensitivity)/specificity = (1 − 0.75)/0.92 = 0.25/0.92 = 0.27

Example 3
What are the pre-test odds of the condition?
• Pre-test odds = (90+30)/(5+60) = 1.85
• Prevalence = (90+30)/(90+30+5+60) = 0.65
– So pre-test odds = 0.65/(1 − 0.65) = 1.85
• This data is from an in-patient sample (same table: a = 90, b = 5, c = 30, d = 60)

Example 4
• In a community sample, prevalence is 10%
• What is the PPV of the test in the community?
• It was 0.95 in the in-patient sample, where prevalence was 65%

Example 5 (answer to Example 4)
• Post-test odds = pre-test odds × LR
• Pre-test odds = ratio of those with the disease to those without = 0.1/0.9 = 0.111
• Post-test odds = 0.111 × 9.375 = 1.042
• Post-test probability = post-test odds/(post-test odds + 1) = 1.042/2.042 = 0.51
• PPV = 0.51
• If prevalence is low, PPV is low, even for a test with good sensitivity and specificity

Question 6
Gold standard +/−: screening test +: 90 (a), 10 (b); screening test −: 110 (c), 190 (d)
• What are the sensitivity and specificity of the test?
• Sensitivity: 90/(90+110) = 0.45 (2 marks)
• Specificity: 190/(190+10) = 0.95 (2 marks)
• What is the PPV of the test for this sample?
• PPV = 90/(90+10) = 0.90 (2 marks)

Question 6b (same table)
• What is the likelihood ratio of a positive test?
• 0.45/(1 − 0.95) = 0.45/0.05 = 9 (3 marks)
• What is the likelihood ratio of a negative test?
• (1 − sensitivity)/specificity = (1 − 0.45)/0.95 = 0.55/0.95 = 0.58 (3 marks)

Question 6c (same table)
• What are the pre-test odds of the condition?
• Pre-test odds = (90+110)/(10+190) = 1 (2 marks)
• Prevalence = (90+110)/(90+110+10+190) = 0.5
– So pre-test odds = 0.5/(1 − 0.5) = 1
• This data is from an in-patient sample

Question 6d
• In an MFE OP sample, prevalence is 25%
• What is the PPV of the test in MFE OPs?
• Post-test odds = pre-test odds × LR (2)
• Pre-test odds = ratio of those with the disease to those without = 0.25/0.75 = 1/3 (2)
• Post-test odds = 1/3 × 9 = 3 (1)
• Post-test probability = post-test odds/(post-test odds + 1) (2) = 3/4 = 0.75 (1)
• PPV = 0.75 (2)

Selection Bias: types*
• Prevalence: an outcome such as cancer kills some of the cases (≈ Neyman bias)
• Admission rate: hospitalised patients may over-represent risk and outcome
• Volunteer/non-responder
• Membership: characteristics that lead to group membership influence the outcome
• Procedure selection: patient characteristics influence the selection of treatment
PAVMeP

Procedure bias: types
• Performance: cases and controls treated differently
• Recall
• Insensitive measure: instrument not accurate enough to measure change
• Compliance: one treatment may be more palatable than the other
• Detection

Stats: odds and ends
Reliability measures:
– Kappa statistic for NOMINAL data
– Correlation coefficient for NUMERICAL data
Sources of variability:
– Data, e.g. BP taken at different hours
– Examiner, e.g. different takes on schizophrenia
– Instrument, e.g. patient-rated pain scale
Beware:
– multiple significance tests
– multiple analyses of the same data
"Fishing expedition": data are produced first and then mined for significant bits, which are then …

Stats: odds and ends
Improving the credibility of the control group:
– Make them as similar in selection and treatment as possible
– ASK them afterwards whether they thought they were in the treatment or control group

Multiple comparisons between groups: multivariate methods*
• Definition: to compare ≥ 2 dependent outcomes or variables simultaneously with adjustment for multiple confounding variables (covariates). When there is more than one dependent variable, it is inappropriate to do a series of univariate tests.
• Types: MANOVA; Hotelling's T² test; factor/cluster analysis; principal component analysis; canonical correlation

Increasing statistical power
• Increase the sample size.
• Decrease the variance:
– scales with more accuracy or sensitivity
– increasing the homogeneity within groups
– stricter criteria for inclusion in each group (e.g. a higher requirement for caseness)
– controlling for confounding variables, i.e. by using analysis of covariance

Correlation and regression
• Bivariate (two-variable) data should always be examined first with a scatterplot
• A scatterplot can give an idea of whether two variables are related
• Correlation is used to assess the strength of the relationship between two variables
• Regression is used to describe the relationship between two variables

Correlation coefficient: r
• Pearson's r: for linear correlation
• Spearman's rs: for any monotonic (rank-based) correlation
• 1 = perfect positive correlation; −1 = perfect negative correlation; 0 = no correlation
• P value: tests whether r is truly different from 0, BUT doesn't indicate that the correlation is strong
• Spurious or misleading correlations can occur because of: subgroups; outliers; a change of direction (U- or ∩-shaped curve)

Regression
• Simple linear regression describes the relationship between two variables in more detail than correlation
• It fits a straight line through a set of data
[Figure: line of best fit through a scatterplot of Y against X]

Model approach to regression
• The line of best fit is represented by the equation Y = a + bX
• The intercept a and the slope b for the population of interest are unknown, and therefore must be estimated from sample data
• The equation is the regression model, and a and b are known as model parameters

ANOVA approach
Analysis of variance (ANOVA) approach:
• We partition the variability of a response variable into: (1) variation explained by the linear model; (2) variation unexplained by the model (residual)
• If the variation explained by the model is large relative to the unexplained variation, the linear model is appropriate
• F test (variance ratio test): between-group/within-group variance

Extensions
If the scatterplot reveals that a straight line is not appropriate:
• Non-linear regression – transform the response (e.g. logarithm, square root)
• Polynomial regression – fit a quadratic (or cubic, etc.) curve through the data
• Multiple regression – adjust for the effects of > 1 independent variable

Summary
• Usually either correlation or regression is a suitable approach for assessing association in (numerical) bivariate data
• Correlation coefficients assess the strength of (linear) association
• Regression allows description and prediction of a response variable
• Both methods have possible misuses

Reviews and Meta-analyses
Qualitative research
• Examines subjective experience, behaviour, meaning and intersubjective interaction without using statistical methods or quantification
• Interpretive methods: observe and understand
• Critical methods: examine origins with a view to effecting practical transformation in society

The Funnel Plot
• A scatter graph
• Treatment effect on the horizontal axis
• Measure of study size (e.g. 1/standard error) on the vertical axis
• Precision of a study increases with sample size
• …so the larger studies are likely to cluster at the top, around the pooled effect
[Figures: funnel plots of effect size (−1 to +1) against 1/standard error. The symmetrical funnel that should be seen if all studies are used would give an effect size of about 0.4; the asymmetrical funnel that is often seen lacks the smaller studies showing less effect and would give an effect size of about 0.6.]
• This publication bias may be obvious on eyeballing the funnel plot
• Can be evaluated by statistical methods
– Regression methods
– Rank correlation approach
– Sensitivity is low if there are fewer than 20 trials
• If the funnel plot indicates severe publication bias, you may need to disregard the review!
• Can use "trim and fill" – add hypothesised missing studies to make the funnel plot symmetrical

Combining Data
• Need comparable data from each study
• Difficult, as different studies use different measurements
• So a good large RCT is better!
• Need an outcome measurement
– e.g. effect size for the primary outcome measure
– e.g. odds ratio of getting the outcome of interest with/without the exposure of interest
• Need an error estimate – confidence interval

Draw a Nice Picture
• A forest plot ("blobbogram")
• x-axis: outcome measurement, e.g. OR
– May have a vertical line at the no-effect mark: OR 1, effect size 0
• Each study is shown one above the other
• Central measurement: square or diamond
• Size of the mark is proportional to either:
– the number of subjects in the study
– the precision of the estimate (1/SE)
[Figure: forest plot of Studies 1–6 and the pooled (fixed) effect, effect size axis from −1 to 1]
• A horizontal line on each side shows the CI
• You can easily see how many studies have a "significant" result, as their CI won't cross the vertical line
• There will be diamonds at the bottom showing the pooled effects
– fattest part at the estimated value
– horizontal limits are the CI (narrow, as many studies are combined)

Heterogeneity
• Studies will give different values
• Is this due to chance or to genuinely different effects across studies?
• Do a hypothesis test
• Null hypothesis: the studies varied this much due to chance
• If P is low, this suggests significant heterogeneity
• This is given as chi-squared, with P

Fixed vs Random Effects
• Both may be given for pooled effects!
• Fixed effects assumes no heterogeneity
• Random effects takes heterogeneity into account
– may give more weight to outlying studies
– may give more weight to small studies, which may be more open to systematic bias
• Fixed effects has narrower confidence intervals
• There will be no difference if there is no heterogeneity
• If different, both should be published
• If both are significant, this gives you more confidence in the result

Conclusion
• Look at the actual pooled treatment effect
• Is it statistically significant?
• Is it clinically significant?
• Very large numbers could make a tiny difference statistically significant
• Look at the effect size/pooled OR for clinical significance

Quantitative vs qualitative research
• Empirical – paradigm: positivism, natural sciences; types: observational, experimental; privileged voice: researcher
• Interpretive – paradigm: individual/collective meaning, hermeneutics; types: ethnography, phenomenology, narrative; privileged voice: participant
• Critical – paradigm: myths/hidden truths that limit people, and their origins (Marxist/feminist); type: participatory action; privileged voice: stakeholder
• Ethics: external vs intrinsic

Diagnosis Studies
• What is the diagnostic significance of a particular symptom?
• e.g. first-rank symptoms (FRS) in schizophrenia
• Do a cross-sectional study
• Like a screening study
• Diagnosis by "gold standard"
• Symptom as "screening test"
• Work out sensitivity, specificity, PPV and likelihood ratios as for screening studies
• Of course, PPV depends on the sample!
• So the sample must be chosen appropriately
– You may be asked to comment on it
• e.g. a random sample of patients suspected of having schizophrenia

Prognosis Studies
How to anticipate the likely course of a patient's illness
• Inception cohort
• Cohort of representative patients
• Follow-up
• See what the long-term outcome is
• e.g. what is the chance of – death – independent living – a future episode of depression…
• Can be compared against a control group – e.g. first-episode MDD vs no psychiatric illness
• Subgroups can be compared – e.g. melancholic vs non-melancholic MDD

Is The Evidence Valid?
1. Appropriate Sample?
• Representative of the patients we're interested in – e.g. first diagnosis of DSM-IV schizophrenia
• Clearly defined – so we know who it applies to
• Early in the course of the disease – otherwise some patients will have completed the course of the disease (died) before they're included in the study
2. Sufficient Follow-Up?
• Long enough – if too short, there's not enough time to develop the outcomes of interest
• Complete enough
– We don't know what happens to subjects lost to follow-up
– Could be bias, as subjects may be more likely to drop out if very ill
– <5% lost is OK, <20% is reasonable
– Use sensitivity analysis: what if none or all of the drop-outs had the outcome?
3. Appropriate Outcome Measurement?
• Must be objective
• Otherwise there could be bias
• Especially important if groups are compared
• Pre-defined objective criteria should be applied
• Blind assessors if possible, to prevent information bias
4. Post-Hoc Comparisons?
• Beware of data dredging
• Lots of subgroup comparisons are possible
• Are subgroups pre-defined?

Are The Results Important?
Data Presentation
• Can be given as rates with the outcome at a set time point – e.g. 60% recurrence at 5 years
• Can be given as median time to outcome – e.g. 50% recur by 4 years
• Can be presented as a survival curve
• Can present all 3

Precision of Estimates
• Not an exact answer, just an estimate based on the sample
• Use confidence intervals to tell us the likely range of values in the population
• Use hypothesis tests as described earlier to compare groups
– e.g. log-rank test for survival curves
– e.g. chi-squared for recurrence rate

Clinical Significance
• Is the sample similar to our patients?
• What would the numbers actually mean to a patient?
• Easier than for abstract statistics like effect size or odds ratio!

"Epidemiology"
• How common is an illness?
• Sampling is crucial
• Sometimes two stages: – screen – clear diagnosis
• You are looking for how common an illness is in a population
• So ask the whole population
• …or a truly random sample
• …which must be representative
• Sometimes a whole birth cohort is followed up

How Common?
• Incidence
– Number of new cases of illness per head of population in a set time period
– e.g. 100 cases per 100,000 population per year
• Prevalence
– Number of people who have a condition at a certain time
– Can be point prevalence: how many have the diagnosis at the sampling point
– Can be period prevalence: how many have the diagnosis at any time over a set time period

Must check for…
• Are all potential subjects questioned?
• If not, how many?
• What efforts were made to find them?
• How would those questioned differ from those not questioned?
– If cases are more likely to reply, prevalence is overestimated
– If cases are less likely to reply, prevalence is underestimated
• Can see how respondents and non-respondents at stage 2 differ by looking at their stage 1 scores

Qualitative methodology assessment (CARAT)
• Congruence: Did the method "fit" the question? Was the study done as it said it would be?
• Responsiveness: Was the design emergent, i.e. responsive to social context?
• Appropriateness: Did sampling and data collection "fit" the research question?
• Adequacy: Were details of the method given? Were there sufficient sources in the sample, weighted correctly? Was it iterative?
• Transparency: Provision of detail; a clear way of dealing with disagreements; privileging participants

Qualitative data interpretation assessment (PRACT)
• Permeability: the researcher's conceptual and personal journey, from preconceptions to further understanding
• Reciprocity: Were participants involved in the method and write-up?
• Authenticity: verbatim accounts; a range of voices, including dissenters; recognisable to people having that experience
• Coherence: Do the findings fit the data? How much of the data was used? Did different researchers agree?
• Typicality: What claims about generalisability are made?

Qualitative research: Delphi technique
What for? Getting a consensus statement from a panel, e.g. for clinical treatment guidelines.
Process: several rounds of a mailed survey encourage deeper analysis by examining disagreements, and come closer to consensus with each round:
• Ask a question (or several) of a panel
• Collate and summarise the replies
• Send the summary out to start the next cycle
• Members may change their original statement towards the emerging consensus, or leave it the same and provide justifications

Qualitative studies: grounded theory
Who dunnit? Glaser and Strauss
How to do it?
1. Read (and re-read) a textual database (such as a corpus of field notes)
2. "Discover" or label variables (called categories, concepts and properties) and their interrelationships
What may affect "theoretical sensitivity"*?
• One's reading of the literature
• One's use of techniques designed to enhance sensitivity