"Ten Deadly Statistical Traps in Pharmaceutical Quality Control"
Lynn Torbeck
Pharmaceutical Technology
29 March 2007

Your Morning Mantra
“In theory there is no difference between theory and practice, but in practice there is.”
Yogi Berra

The Ten Deadly Sins
1. Graphs
2. Normal Distribution
3. Statistical Significance
4. Xbar ± 3S
5. %RSD

The Ten Deadly Sins
6. Control Charts
7. Setting Specifications
8. Cause and Effect
9. Variability
10. Sampling Plans

Graph? What &%$# Graph?
Q#1: “Have you graphed the data?”
I have solved many statistical problems by simply graphing the data.
Always, always, always plot your data.
No ink on the page that isn’t needed.
Cause and effect on the same page.
Make the answer appear obvious.
Read Edward Tufte’s books.

Anscombe’s Astounding Graphs
            X Axis  Y Axis 1  Y Axis 2  Y Axis 3  X Axis 2  Y Axis 4
Average        9.0       7.5       7.5       7.5       9.0       7.5
Std Dev       3.32      2.03      2.03      2.03      3.32      2.03

              10.0      8.04      9.14      7.46         8      6.58
               8.0      6.95      8.14      6.77         8      5.76
              13.0      7.58      8.74     12.74         8      7.71
               9.0      8.81      8.77      7.11         8      8.84
              11.0      8.33      9.26      7.81         8      8.47
              14.0      9.96      8.10      8.84         8      7.04
               6.0      7.24      6.13      6.08         8      5.25
               4.0      4.26      3.10      5.39        19      12.5
              12.0     10.84      9.13      8.15         8      5.56
               7.0      4.82      7.26      6.42         8      7.91
               5.0      5.68      4.74      5.73         8      6.89

Anscombe’s Astounding Graphs
N = 11
Average of the X’s = 9.0
Average of the Y’s = 7.5
Regression line: Y = 3 + 0.5X
R² = 0.67
Std Error of the Slope = 0.118
Residual Sum of Squares = 13.75

[Scatter plot against X, with fitted line y = 0.5001x + 3.0001]

[Scatter plot against X]

[Scatter plot against X2]

[Scatter plot against X]

Prolonged Acting Pro-Stuff
An ulcer drug from the late 1960s.
In 1980 a change in a raw material resulted in more rejects.
In-process control using a UV assay.
Composite of 5 tablets assayed.

Prolonged Acting Pro-Stuff
Sample from the top of each can.
Specs were 95% to 105%.
If value in spec, accept the can.
If value out of spec, reject the can.
Accepting and rejecting specific cans.
About 50% of the cans were rejected.

Histogram of UV Assay
[Histogram: Frequency vs. UV Assay]

Histogram of UV Assays
[Histogram: Frequency vs. UV Assays]

Histogram of Retests
[Histogram: Frequency vs. Retests]

Prolonged Acting Pro-Stuff
No good cans or bad cans.
Some “good” cans, when retested, are now out of specification.
The cans accepted are just as bad or good as the cans rejected.
45% of the values are OOS.
The product was taken off the market.
A personal story.

Shipping Decision
[Plot: Number of Complaints vs. Outside Temperature]

A Little Normal History
The concept of the Normal is basic.
Also called the Gaussian or Bell Curve.
First published on November 12, 1733.
First set of tables in 1799!
Used by the astronomer Laplace for errors.
First called the Normal in 1893 by the statistician Karl Pearson.

They Were Blown Away
“I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the ‘Law of Frequency of Error.’”
Francis Galton in Natural Inheritance, 1889

Histogram of All Data
[Histogram of All Data with fitted Normal curve: Mean 95.98, StDev 4.787, N 77]

Hunting the Elusive Normal
I have never met a real Normal distribution.
Gotten close a couple of times.
There are no real Normal distributions.
It’s a theoretical fiction that is useful part of the time.
We must separate reality from theory.
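
A rough cross-check of the Pro-Stuff numbers, offered as a sketch rather than as the original analysis: if the All Data histogram (Mean 95.98, StDev 4.787) is treated as approximately Normal, which these slides caution is only a useful fiction, the fraction falling outside the 95% to 105% specs can be estimated from the two tail areas. The Normal assumption and the function name `fraction_out_of_spec` are assumptions made for illustration; the slides do not say how the 45% OOS figure was obtained.

```python
from statistics import NormalDist

# Summary statistics reported on the "Histogram of All Data" slide.
MEAN = 95.98
STDEV = 4.787

# Specification limits from the Pro-Stuff slides (percent of label claim).
LOWER_SPEC = 95.0
UPPER_SPEC = 105.0


def fraction_out_of_spec(mean, stdev, lower, upper):
    """Return (below, above): Normal-model tail areas outside the spec limits.

    Assumption: the data are approximately Normal, which is a modelling
    choice made for this sketch and not something the slides assert.
    """
    model = NormalDist(mu=mean, sigma=stdev)
    below = model.cdf(lower)          # area to the left of the lower limit
    above = 1.0 - model.cdf(upper)    # area to the right of the upper limit
    return below, above


below, above = fraction_out_of_spec(MEAN, STDEV, LOWER_SPEC, UPPER_SPEC)
print(f"Estimated fraction below {LOWER_SPEC}: {below:.3f}")
print(f"Estimated fraction above {UPPER_SPEC}: {above:.3f}")
print(f"Estimated total out of spec: {below + above:.3f}")
```

Under these assumptions the estimate lands in the mid-40% range, in line with the 45% quoted, and nearly all of it sits below the lower limit because the mean is barely inside the spec.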

“Normal Distribution”
[Figure: bell curve with the mean and the -6, -3, +3, and +6 positions marked]

Normal Facts
In theory, the tails of the distribution stretch from minus infinity to plus infinity, but there are real physical limits.
It is unique in that it is fully described by just its mean, mu (μ), and its standard deviation, sigma (σ), which are almost never actually known for certain.
Probabilities are represented by areas.

What’s Normally Normal?
Tablet and capsule weights
Most manufactured parts
Student test scores, the ‘bell curve’ again
Things that grow in nature:
– Apples
– Bird eggs
– Flowers
– People’s heights

Ain’t Never Gonna be Normal
Particle sizes
LAL, EU/mL
Bioburden, cfu/mL
Failures of most anything
Telephone calls per unit of time
Church contributions
Floods

Watch Out!
The tails are the most volatile and unstable.
But that is often the area of most interest!
It is difficult to tell if data are normally distributed by looking at a small sample.
A crude rule is that we need at least 100 representative data values to determine if it is even approximately normal.

Statistical Significance: Who Cares?
The role of statistical analysis is as an additional tool to assist the scientist in making scientific interpretations and conclusions, and not an end in itself.

Differences
A scientific analysis often takes the form of looking for significant differences.
Is drug A different from drug B?
Is the increase in yield significantly better with the new centrifuge?
A difference can be significant in two ways, practical and statistical.

Practical Significance
Practical significance comes from comparing a difference to an absolute reference or absolute truth.
How big a difference can you accept for:
– Number of seconds of tooth pain?
– Number of phone rings before hanging up?
– How long will you wait for a bus?
– How big your next raise is?

Statistical Significance
Statistical significance testing is one of the great tools of statistics and science.
Statistical significance comes from comparing a difference, a signal, to a relative reference of random variability or the best estimate of noise in the data.

Practical vs. Statistical
Practical significance always wins and takes precedence over statistical significance!
In most applications, statistical significance should not be tested until practical significance is found.

Are The Analysts Different?
Sam           Barb
98.2          100.2
99.3          100.5
99.7          100.8
Xbar=99.1     Xbar=100.5
Spec = 90.0 to 110.0
Two-sided t, P = 0.04

Signal to Noise
All statistical significance testing is only a comparison of the signal to the noise.
If the signal can be shown to be larger, relative to the noise, than we would expect by chance variation alone, we say it is significant.
Bigger signal, more significant.
Smaller noise, more significant.

Significance?
Statistical NO, Practical NO: Nothing going on here, it seems.
Statistical NO, Practical YES: 1. May be due to chance. 2. May need more data.
Statistical YES, Practical NO: 1. Small noise. 2. Large sample size. What does it mean?
Statistical YES, Practical YES: Great! Everybody is happy.

Why Do It To It?
The primary purpose of statistical tests of significance is to prevent us from accepting an apparent result as real when it could be just due to random chance.
Statistical significance without practical significance could in some circumstances be a lead to finding new relationships.
What if the spec was changed to 98.0 to 102.0?
We may want to find out why they are different.

The Biggest Lie in Statistics?
Your statistics professor misled you or lied.
Is Xbar ± 3S ever correct?
For every complex problem there is a solution that is quick, simple, understandable and absolutely wrong!
More grief has been perpetuated by this formula than any other in statistics.

The Biggest Lie in Statistics?
What is true is that μ ± 3σ will bracket 99.73% of the area under the normal curve.
Note that this assumes we know the true values for the mean, mu (μ), and standard deviation, sigma (σ), which we never do, of course.
We have to estimate them with the small samples we take.
Thus, there is uncertainty in the estimates.

Side Line
Did you hear about the statistician’s wife who said her husband was just average?
She was being mean.

So, What Do I Do Now?
Don’t use Xbar ± 3S as a generalized monkey wrench and apply it to all of your statistical questions.
Use the right tool for the job.
Use Confidence Intervals to bracket the unknown mean.
Use Tolerance Intervals to bracket a given percentage of the individual data values.

%RSD: Friend or Foe?
S = SQRT[Σ(X - Xbar)^2 / (n - 1)]
%RSD = (100 * S) / Xbar
They are two different summary statistics.
They measure two different concepts.
They are not substitutes for each other.
We need to report both.

Control Charts
Having just told you not to use Xbar ± 3S, I now have to tell you that is how control charts define the control limits.
This is an artifact of history.
Control charts were developed by Dr. Walter Shewhart in 1924 while working at Western Electric in Cicero, Ill.

Control Chart
Add Xbar ± 3S limits to a line plot.
A chart for the response.
A chart for the moving range to estimate variability.
[Figure: I and MR Chart for Yield %. Individual Value chart: UCL=103, Mean=100, LCL=97. Moving Range chart: UCL=3.686, R=1.128, LCL=0. Horizontal axis: Subgroup.]

Do You Trust Your Control Chart?
Control charts are crude tools and not exact probability statements.
They don’t take into account the number of samples in the data set for the limits.
They are intended as early warning devices and not accept/reject decision tools.
Don’t use them for large $$ decisions.

Oh Wow, I Don’t Believe It!
You did what to set the specification criteria for your million dollar product?

Setting Specifications
A specification is a document that contains methods and accept/reject criteria.
Criteria can be determined several ways:
– Wishful thinking
– Clinical results
– Compendial standards
– Historical data and statistics

Million $$ Decisions?
Regulatory Limits - External
Release: accept/reject - Internal
Action limits
Alert - Warning limits
– Trend limits
– Validation limits

Idealized Specification Limits
[Figure: limit bands labeled Alert, Action, Accept/Reject, and Regulatory]

Calculating Criteria
Don’t use Confidence Intervals; they shrink toward zero with large sample sizes.
Don’t use Xbar ± 3S; the limits are too narrow for small sample sizes.
Use Tolerance Intervals, preferably 99%/99%.
This will take into consideration the sample size and the uncertainty of the average and the standard deviation.

Setting Specification Criteria
For action limits, expect the average to vary and widen the Tolerance Limits.
For accept/reject limits, add a further allowance for stability.
Consider the clinical results when possible as part of the justification for limits.

Drunken Teachers
Did you know that there is a positive correlation between alcohol consumption and high school teachers’ salaries?
That there is a negative correlation between a state’s average student test scores and the distance of the state capital from the Canadian border?

Cow Magnets Cure Gout
What’s a cow magnet?
What is gout?
How do we test a cause and effect relationship to see if this works?
Should we just ask people what they think?
“No causation without manipulation.”
The gold standard is a double-blind clinical trial.

Variability is the Enemy
How many OOS values were documented in the lab last year?
How many manufacturing deviations were investigated last year?
How many lots were rejected last year?
How many of your quality problems would go away if there were no variation?

Misconceptions of variability
We have variability because the equipment needs to be replaced with new technology.
We do too many tests.
Variability exists because some idiot didn’t do their job correctly.
Variability is an inherent fact of life and there isn’t a darn thing we can do about it except to live with it. It’s a cost of business.
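
The I and MR chart shown earlier reports UCL=103, Mean=100, LCL=97 and, for the moving range, R=1.128 and UCL=3.686. A minimal sketch of how such limits are conventionally computed for an individuals chart follows. It assumes the standard constants for a moving range of two consecutive points (2.66, which is 3/1.128, and 3.267); the slides do not state the formulas, and the function names and the sample yields are hypothetical.

```python
from statistics import mean

# Conventional constants for moving ranges of two consecutive points.
# Assumption: these are the constants behind the chart on the slide.
INDIVIDUALS_FACTOR = 2.66   # 3 / d2, with d2 = 1.128
MOVING_RANGE_FACTOR = 3.267  # D4 for subgroups of size 2


def limits_from_summary(center, mr_bar):
    """Control limits from the chart center line and average moving range."""
    spread = INDIVIDUALS_FACTOR * mr_bar
    return {
        "center": center,
        "ucl": center + spread,
        "lcl": center - spread,
        "mr_bar": mr_bar,
        "mr_ucl": MOVING_RANGE_FACTOR * mr_bar,
        "mr_lcl": 0.0,
    }


def imr_limits(values):
    """Compute I and MR chart limits from individual observations in time order."""
    if len(values) < 2:
        raise ValueError("need at least two observations for a moving range")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return limits_from_summary(mean(values), mean(moving_ranges))


# Check against the summary values printed on the slide's chart.
slide = limits_from_summary(100.0, 1.128)
print(f"UCL={slide['ucl']:.2f}  LCL={slide['lcl']:.2f}  MR UCL={slide['mr_ucl']:.3f}")

# Hypothetical yield data, for illustration only.
yields = [100.0, 101.0, 99.0, 100.0]
print(imr_limits(yields))
```

Note that the spread comes from the average moving range and not from the overall standard deviation S, so a slow drift in the mean inflates S much more than it inflates these limits.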

Variability is the Enemy
“Special Cause” variation is the result of a single source. Use CAPA to solve it.
“Common Cause” variation is the result of multiple small sources all contributing to the sum total.
CAPA will not work for common cause variation.
We need a culture change to address common cause variation.

Sources of Variation
Common cause variation:
– People
– Materials
– Methods
– Measurement
– Machines
– Environment

Common vs. Special Causes
A plot of the data with Xbar ± 3S illustrates common cause variation.
A value that is larger than would be expected by chance alone is assumed to be due to a special cause.
[Figure: I Chart for Yield%. Individual Value vs. Observation Number, with UCL=103, Mean=100, LCL=97 and one point flagged above the UCL.]

Deming’s Message
Dr. W. Edwards Deming was the very famous statistician who taught statistical quality control to the Japanese in the 1950s.
“If I had to reduce my message for management to just a few words, I’d say it all had to do with reducing variation.”

Deming’s Message
If you reduce variability, you will reduce scrap, rejects and rework.
You can then make a better product at less cost.
You will capture a larger market share.
Your people will be employed and you will prosper.
(Paraphrase of Deming’s message)

Confronting the Enemy
Operational Definitions
Achieve the Target
Flexible Consistency
Hold Constant Controllable Factors
Mistake Proofing
New Technology
Continuous and forever improvement

The Black Hole of Quality
Like a black hole with light, sampling plans just suck the common sense right out of people’s brains.
Normal, logical and rational people suddenly become willfully and terminally stupid.
There are many myths and misconceptions about what sampling plans can and cannot do.

Black Hole Facts
A sample is only a small part of the whole.
Each sample is going to be different.
Some samples will have many defects.
Some samples will have few defects.
Bigger sample, better estimate.
On average, the defect percent can only be estimated and not known perfectly.

Black Hole Facts
There is a small but real probability that a good lot of product will be rejected. Called the “Producer’s Risk,” usually 5%.
There is a small but real probability that a bad lot will be accepted. Called the “Consumer’s Risk,” usually 5% or 10%.
Most common plan is ANSI/ASQ Z1.4.

Black Hole Facts
“The AQL is the quality level that is the worst tolerable process average … .”
“The acceptance of a lot is not intended to provide information about lot quality.”
“The standard is not intended as a procedure for estimating lot quality or for segregating lots.”

Black Hole Facts
“The purpose of this standard is, through the economic and psychological pressure of lot non-acceptance, to induce a supplier to maintain a process average at least as good as the specified AQL while at the same time providing an upper limit on the consideration of the consumer’s risk of accepting occasional poor lots.”

Misunderstandings
Double and multiple sampling plans are not testing into compliance.
It is not possible to have an AQL = 0.0.
Accept on zero, reject on one is not always the best plan for critical defects.
If the lot size is ten or more times the sample size, then the lot size doesn’t matter.

Summary
“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”
H. G. Wells

References
NIST online statistics textbook
– http://www.itl.nist.gov/div898/handbook/index.htm
Edward Tufte’s website
– http://www.edwardtufte.com/tufte/
W. Edwards Deming’s book
– Out of the Crisis

References
Torbeck, Lynn. Using Statistics to Measure and Improve Quality. DHI Publishing, 2004.
De Muth, James. Basic Statistics and Pharmaceutical Statistical Applications. Marcel Dekker, 1999.

“That’s All Folks”
Thank you!
Questions?
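
As a supplement to the Black Hole Facts and Misunderstandings slides, here is a small sketch of the probability of accepting a lot under a single sampling plan. It assumes a simple plan that samples n units and accepts when at most c defectives are found, computes the exact hypergeometric acceptance probability for a finite lot, and compares it with the binomial value that ignores lot size. The plan (n = 50, c = 0), the 1% defect rate, and the lot sizes are hypothetical and are not taken from ANSI/ASQ Z1.4.

```python
import math


def accept_prob_binomial(n, c, p):
    """P(accept) when the lot is treated as effectively infinite.

    n: sample size, c: acceptance number, p: true fraction defective.
    """
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(c + 1)
    )


def accept_prob_hypergeom(lot_size, defectives, n, c):
    """Exact P(accept) when sampling n units without replacement from a finite lot."""
    total = math.comb(lot_size, n)
    favorable = sum(
        math.comb(defectives, k) * math.comb(lot_size - defectives, n - k)
        for k in range(c + 1)
    )
    return favorable / total


# Hypothetical plan and quality level, for illustration only.
SAMPLE_SIZE = 50
ACCEPT_NUMBER = 0
FRACTION_DEFECTIVE = 0.01

p_infinite = accept_prob_binomial(SAMPLE_SIZE, ACCEPT_NUMBER, FRACTION_DEFECTIVE)
print(f"Binomial (lot size ignored): {p_infinite:.4f}")

for lot_size in (500, 5_000, 50_000):
    defectives = round(lot_size * FRACTION_DEFECTIVE)
    p_finite = accept_prob_hypergeom(lot_size, defectives, SAMPLE_SIZE, ACCEPT_NUMBER)
    print(f"Lot size {lot_size:>6}: {p_finite:.4f}")
```

With the lot 1,000 times the sample, the two probabilities are nearly identical; at 10 times the sample they differ by a little under two percentage points, which is the sense in which lot size stops mattering. The same run also shows that this accept-on-zero plan would still pass a 1% defective lot more often than not.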