Biostats VHM 801 Course, Fall 2008, Atlantic Veterinary College, PEI
Henrik Stryhn

Index of 14-L (new slides only)

Page   Title
1      Practical information
2      Topics for lecture
3      Exam assignments
4      Exam practical remarks
5–6    Outline of statistical analysis
7      Graphs in statistical analysis
8      Choice of statistical model
9      P-values
10     Another look at $t$ and $t^*$
11     General principles of tests and CIs

14L-0

Practical Information

Major news:
• last home assignment returned tomorrow,
• you need to notify me about your choice on midterm (by Monday),
• last lab tomorrow at 1pm (215N (teaching lab) at AVC; recommended!).

Topics today:
• Exam: Monday 8/12, 9am-12pm, Room CC¹ 1051,
  ∗ exam practical remarks,
  ∗ exam assignments (types, calculations),
  note: unless you tell me otherwise, only Minitab prints will be included in the exam!
• Evaluation:
  ∗ I'll leave at 10:10 to let you do the official evaluation,
  ∗ another evaluation form tomorrow (mine),
• Review topics: next slide,
• Sample problems, pick from these:
  ∗ final 2001: 2–4 (suggested priority: 4,3,2),
  ∗ final 2002: 1–3 (suggested priority: 2,1,3),
• Your questions...

¹ Classroom Centre, located near the large parking lot north of AVC.

14L-1

Topics for Lecture

Review topics: pick from these!
• outline of statistical analysis (partly new slides: 14L–5/6),
• use of graphs (new slide: 14L–7),
• model choice (new slide: 14L–8),
• interpretation of P-values (new slide: 14L–9),
• use of percentiles (new slide: 14L–10),
• "usual way" for tests and CIs (new slide: 14L–11),
• completely randomized design and block design (3L–9+10),
• binomial setting (4L–13),
• rules for means and variances (4L–17),
• one- or two-sided (6L–6),
• testing by confidence interval (6L–8),
• sample size based on estimation accuracy/power (6L–12+13),
• inference for non-normal data (7L–9),
• nonparametric tests (8L–2),
• sign test (8L–3),
• inference for proportions: overview/single/two (8L–9+10+12),
• multinomial distribution (9L–6),
• two-way tables: models and estimation (9L–8),
• after the ANOVA table (10L–12,13),
• two-way ANOVA analysis (11L–11),
• use of residuals for model check (11L–15).

14L-2

Exam Assignments

3 assignments, possible types:
• choice of statistical model and analysis:
  ∗ carry out analysis when calculations manageable (see below),
  ∗ or base analysis on Minitab print + extra calculations,
  ∗ or outline analysis if calculations not manageable,
• probability calculations, when manageable (see below),
• multiple choice (one or several correct answers),
• brief essay (10 lines) to explain statistical concept (or the like).

Calculations by hand (calculator):
• manageable without data entry into calculator,
• no large summations,
• examples of possible calculations:
  ∗ simple probabilities (e.g., $1-p$, $(1-p)^n$, simple binomial),
  ∗ standardization in normal distribution,
  ∗ normal approximation for binomial,
  ∗ tests and CIs (given estimates and MSE),
  ∗ predictions in linear regression (but no intervals!),
  ∗ one- or two-sample proportions,
• examples of too complex calculations:
  ∗ statistical analysis of normal models without calculation help (given statistics or computer output),
  ∗ probability plots, residual plots,
  ∗ $\chi^2$-tests and rank-based nonparametric tests.
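The hand calculations listed as manageable above can be checked with a few lines of Python. This is only a sketch: all numbers (p, n, mu, sigma, x, m, q) are hypothetical values chosen for illustration, and the continuity correction in the last step is an added refinement that the slides do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration.
p, n = 0.1, 5

# Simple probabilities.
p_not = 1 - p                  # P(event does not occur)
p_none = (1 - p) ** n          # P(no events in n independent trials)
p_at_least_one = 1 - p_none    # complement rule


def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_exactly_one = binom_pmf(1, n, p)

# Standardization in the normal distribution: z = (x - mu) / sigma.
mu, sigma, x = 100.0, 15.0, 130.0
z = (x - mu) / sigma
p_above = 1 - NormalDist().cdf(z)   # P(X > x) from the standard normal

# Normal approximation for a binomial count X ~ B(m, q):
# mean m*q and standard deviation sqrt(m*q*(1-q)).
m, q = 100, 0.5
approx = NormalDist(m * q, sqrt(m * q * (1 - q)))
p_le_45_approx = approx.cdf(45.5)   # 45.5: continuity correction (assumption)
p_le_45_exact = sum(binom_pmf(k, m, q) for k in range(46))

print(f"P(none) = {p_none:.5f}, P(exactly one) = {p_exactly_one:.5f}")
print(f"z = {z:.2f}, P(X > x) = {p_above:.4f}")
print(f"P(X <= 45): approx {p_le_45_approx:.4f}, exact {p_le_45_exact:.4f}")
```

Comparing the approximate and exact binomial probabilities in the last line gives a feel for how good the normal approximation is when m is large and q is not close to 0 or 1.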
14L-3

Exam Practical Remarks

Start time and number of questions depend on your midterm choice:
∗ use midterm: start at 10am, 2 questions,
∗ drop midterm: start at 9am, 3 questions.

All aids (books, notes and calculators) are allowed: everything except a personal assistant or computer.

The assignments have equal weight, unless specified otherwise. Use your time reasonably! (no extra time!)

Some hints and advice (to use or not...):
• layout: essential requirements are
  ∗ readability,
  ∗ clear distinction between what is in the solution and what is not,
  do not spend (too) much time writing first a draft and then a final version,
• conclusions should be part of all analyses,
• a statistical model should be part of all data analysis,
• explicit calculations may prevent loss of points due to typing errors (or the like),
• errors: if you realize an error and do not have time to correct it, write what is wrong, what should have been done and how the error would affect the result.

14L-4

Outline of Statistical Analysis

Data description:
• descriptive statistics: plots, tables, simple statistics,
• purpose:
  ∗ overview of the data,
  ∗ detect errors / "different" observations (outliers),
  ∗ focus attention on what's relevant.

Statistical model: we formulate statistical models containing theoretical distributions and unknown parameters, in order to
• make clear the assumptions (and utilize them),
• let parameters (fixed, unknown numbers describing a population) represent issues of interest.
"All models are approximations, but some are useful" (Box).

Estimation: our information about the parameters from the observed data is summarized in statistics or estimates:
• quantities calculated from the data to give possible parameter values,
• aim of statistical methodology: obtain estimates as close to true values as possible (though never equal to true values),
• all estimates should be accompanied by a measure of uncertainty, such as standard error or confidence interval.

14L-5
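As a small illustration of the estimation step, the sketch below computes an estimate of a population mean together with its standard error, using only the Python standard library. The data values are hypothetical, made up purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements, made up for illustration only.
sample = [4.1, 5.0, 4.6, 5.3, 4.8, 4.4, 5.1, 4.7]

n = len(sample)
estimate = mean(sample)   # estimate of the population mean
s = stdev(sample)         # sample standard deviation (n - 1 in the denominator)
se = s / sqrt(n)          # standard error of the mean: the measure of uncertainty

print(f"estimate = {estimate:.3f}, SE = {se:.3f}, n = {n}")
```

Reporting the estimate together with its SE (or a confidence interval built from it) is what the last bullet of the estimation step asks for.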
Model check:
• comparison of observed distribution and assumed theoretical distribution (using estimated parameters),
• methods: graphical (plots) or numerical (tests),
• if model unsatisfactory, start over with new model.

Hypothesis testing:
• formulate, using model parameters, null hypothesis $H_0$ (model simplification) and alternative hypothesis $H_a$,
• the test statistic and associated P-value summarize the evidence against the null hypothesis, which we may reject (low P) or not reject (high P).

Final Model:
• simplest possible after model reduction.

Conclusion / Presentation:
• summary of test results,
• if necessary, re-estimate model parameters (+ CI),
• illustrations of the implications of the final model, e.g. prediction; also, presentation of estimates (with SE or CI).

14L-6

Graphs in Statistical Analysis

Graphs have different purposes:
• Descriptive: show the shape of the distribution in a dataset,
  ∗ dotplot and stemplot (raw data),
  ∗ histogram (grouped raw data),
  ∗ boxplot (schematic for descriptive statistics),
  ∗ scatterplot (raw data, 2 variables),
• Model check: graphical assessment of one or more model assumptions,
  ∗ normal probability plots (check normal distribution within groups/samples),
  ∗ residual plots (check different assumptions in normal models),
• Presentation: graphical display of estimates (possibly with measure of uncertainty) from (complex) analysis,
  ∗ group mean plots with error bars, based on model standard deviation ($\sqrt{\mathrm{MSE}}$) (ANOVA models),
  ∗ fitted line plots (regression).
note: small datasets are best illustrated by raw data plots, and more data ⇒ more schematic/grouped plots.

14L-7

Choice of Statistical Model

Some useful questions to ask about the data:
• purpose of study?
• what is the measurement unit/experimental unit/subject?
• response/outcome or explanatory/predictor variable?
• continuous or categorical (including binary) variable?
• which variables/groupings/classifications should enter into the model?
  some examples:
  ∗ a single sample (normal, binomial),
  ∗ two independent samples (normal, binomial, multinomial),
  ∗ several independent samples (one-way ANOVA, two-way table of counts),
  ∗ paired observations ⇒ single sample for differences,
  ∗ two-way classification (two-way table of counts),
• continuous variable (explanatory or response) to be used for prediction of another variable? (regression),
• two continuous response variables with no wish to predict one from the other? (correlation),
• transformation? (to achieve normal distribution, homogeneity of variance, linear relation).

14L-8

P-values

Significance as determined from P-value²:

P-value        Significance
below 0.1%     ∗∗∗  (significant)
0.1% to 1%     ∗∗   (significant)
1% to 5%       ∗    (significant)
above 5%       NS   (non-significant)

Interpretation and suggested wording (personal preferences):
• P-values above 0.05 are usually denoted non-significant and interpreted as indicating that the result (observed value of the test statistic) could have occurred by coincidence if $H_0$ is true, so that $H_0$ cannot be rejected,
• P-values just around 0.05 (no matter if just above or below) are borderline significant, and give indication that $H_0$ may not be true,
• formally, a P-value below 0.05 is evidence against $H_0$,
• P-values at 0.01 or below are clearly significant and give clear indication that $H_0$ is false, so that $H_0$ should probably be rejected,
• P-values at 0.001 or below are strongly significant and show that the data are incompatible with $H_0$, so that $H_0$ without doubt should be rejected.

² Warning! Do not confuse P-values and values for p's:
• p's are (probability) parameters in particular models, e.g. B(n, p),
• P-values are values (certain probabilities) computed to interpret test results.

14L-9

Another Look at $t$ and $t^*$

From a (hypothetical) confused individual: what are all those different t's: $t^*$, $t(\mathrm{df})$, $t_{1-\alpha}(\mathrm{df})$, $t_{\mathrm{obs}}$, $t$, ...

For confidence intervals, e.g.
with confidence level 95% ($1-\alpha$) and error level 5% ($\alpha$), we need in our formula (1-sample),

  $\mu$: $\bar{X} \pm t^* \, s/\sqrt{n}$, $t^* = t_{1-\alpha/2}(\mathrm{df})$,

a number, $t_{1-\alpha/2}(\mathrm{df})$, from a t distribution with df degrees of freedom:
• it determines the middle 95% of the t distribution, between the 2.5% and 97.5% percentiles ($\alpha/2$ and $1-\alpha/2$),
• it is the 97.5% percentile in the t distribution,
• it can be found in table D (under confidence level 95% or tail area probability 2.5%),
• Minitab: enter the value 0.975 (or 0.025) and use Inverse Cumulative Probability,
• Stata: invttail(df,0.025).

For a test of $H_0$: $\mu = \mu_0$ (where $\mu_0$ is a known value), we compute the observed value of our t-statistic from the formula (1-sample),

  $t_{\mathrm{obs}} = (\bar{X} - \mu_0) / (s/\sqrt{n})$,

and the P-value is calculated from $P(t(\mathrm{df}) > |t_{\mathrm{obs}}|)$, where $t(\mathrm{df})$ (or just $t$) refers to the t distribution with df degrees of freedom (this is one tail; for a two-sided alternative it is doubled, as in the general principles on 14L-11),
• it is a tail area probability, and can be bounded (< or > specific values) using table D, based on table values below or above $|t_{\mathrm{obs}}|$,
• Minitab: enter the value $|t_{\mathrm{obs}}|$, use Cumulative Probability, and subtract the result from 1; or directly using the value $-|t_{\mathrm{obs}}|$,
• Stata: ttail(df,|tobs|).

14L-10

General Principles of Tests and CIs

• Data $X_1, \ldots, X_n$,
• Statistical model containing a parameter $\mu$, typically a mean parameter,
• Estimate $\hat{\mu}$ for $\mu$, based on $X_1, \ldots, X_n$,
• Standard error $\mathrm{SE}(\hat{\mu})$, either
  ∗ estimated from the data, or
  ∗ known value (rarely realistic in practice),
  Note: in normal models we often have $\mathrm{Var}(\hat{\mu}) = A\sigma^2$ and $\mathrm{SE}(\hat{\mu}) = \sqrt{A}\,\sigma$, where $\sigma$ is the standard deviation in the model, and $A$ is a constant determined by the form of $\hat{\mu}$,
• Reference distribution of $(\hat{\mu} - \mu)/\mathrm{SE}(\hat{\mu})$, with percentiles $t_p$,
  Note: in normal models with estimated $\mathrm{SE}(\hat{\mu})$ the reference distribution is usually a t-distribution,
• Test of $H_0$: $\mu = \mu_0$ against $H_a$: $\mu \neq \mu_0$,
  test statistic $t = (\hat{\mu} - \mu_0)/\mathrm{SE}(\hat{\mu})$,
  P-value $P = 2 \times P(t \geq |t_{\mathrm{obs}}|)$, where $t \sim$ the reference distribution,
• Confidence interval ($1-\alpha$) for $\mu$: $\hat{\mu} \pm t_{1-\alpha/2}\,\mathrm{SE}(\hat{\mu})$.

14L-11
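The recipe above (test statistic, two-sided P-value, confidence interval) can be sketched in plain Python. The functions `ttail` and `invttail` below are hand-written stand-ins named after the Stata functions mentioned earlier; they are not the Stata implementations, and they evaluate the t distribution by simple numerical integration and bisection. The summary statistics (`xbar`, `s`, `n`, `mu0`) are hypothetical.

```python
from math import atan, cos, exp, lgamma, pi, sqrt


def ttail(df, x, steps=2000):
    """Upper tail P(t(df) > x), via Simpson's rule after t = sqrt(df)*tan(phi)."""
    if x < 0:
        return 1.0 - ttail(df, -x, steps)
    # Normalizing constant Gamma((df+1)/2) / (sqrt(pi) * Gamma(df/2)).
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(pi)
    upper = atan(x / sqrt(df))
    h = upper / steps          # steps must be even for Simpson's rule

    def f(phi):
        return cos(phi) ** (df - 1)

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return 0.5 - const * total * h / 3


def invttail(df, p):
    """Value x with P(t(df) > x) = p, for 0 < p <= 0.5, found by bisection."""
    lo, hi = 0.0, 1.0
    while ttail(df, hi) > p:   # widen the bracket until the tail is small enough
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if ttail(df, mid) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_test_and_ci(estimate, se, df, mu0, alpha=0.05):
    """Two-sided test of H0: mu = mu0 and a (1 - alpha) confidence interval."""
    t_obs = (estimate - mu0) / se
    p_value = 2 * ttail(df, abs(t_obs))
    t_star = invttail(df, alpha / 2)   # the 1 - alpha/2 percentile
    return t_obs, p_value, (estimate - t_star * se, estimate + t_star * se)


# Hypothetical one-sample summary statistics, for illustration only.
xbar, s, n, mu0 = 5.2, 1.5, 10, 4.0
t_obs, p_value, ci = t_test_and_ci(xbar, s / sqrt(n), n - 1, mu0)
print(f"t_obs = {t_obs:.3f}, P = {p_value:.4f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

In practice the percentile and tail probability would come from table D, Minitab or Stata as described above; the point of the sketch is only that the same three ingredients (estimate, standard error, reference distribution) drive both the test and the interval.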
