_____________________________________________________________________________
                     University of Illinois at Chicago
_____________________________________________________________________________
           IDS/IE 571  Statistical Quality Control and Assurance
                         Instructor: Stan Sclove
_____________________________________________________________________________

                   A MEDICAL LAB QUALITY CONTROL PROBLEM

                         Stanley L. Sclove, Ph.D.

                         Submitted for publication

Abstract

A statistical analysis of white blood cell counts, done both by machine and manually, is performed. The results are applied to the problem of finding the lower limit of machine accuracy for the counts. The Lab Manager's problem of deciding when to repeat a machine analysis by costlier manual analysis is treated as a problem in decision analysis.

Machine vs. Manual White Blood Cell Counts

Commercially manufactured machines are available to do much of the routine work of the medical laboratory. Some calibration results are provided by manufacturers, but a Lab Manager may often feel the need to check these. There is also a need to calibrate such machines for the individual lab, since slight local variations must be taken into account; these might be due to variations in relative humidity, electric supply, or other physical circumstances.

This example reports a study (Sclove 1994) at a local hospital to calibrate a machine which does white blood cell counts. Machine results were compared with those obtained by laboratory technicians. It is much cheaper to use the machine, but, for low counts, the machine does not give reliable results. The question is: what is the lower limit of machine reliability? The Lab Manager will use a rule: if the machine result is less than some number L, the analysis must be repeated manually. The problem is to choose the number L. It is desirable to use the machine for as many tests as possible, without a repeat, manual analysis.
Repeat analyses are not billable, an analysis takes about twenty minutes, and lab technicians currently earn over twenty dollars an hour. The cost of a single repeat test may not seem large, but it adds up over a large number of tests, and such labs are very busy.

The Data

The data are plotted in Figure 1; the logs (base 10) are plotted in Figure 2.

[Figure 1. Machine vs. Manual WBCs: character scatterplot of MACHINE (0.0 to 21.0) against MANUAL (0.0 to 20.0).]

[Figure 2. log Machine vs. log Manual WBCs: character scatterplot of log MACH (-0.80 to 0.80) against log MAN (-2.10 to 1.40).]

Note that the log-log plot is preferable since the data vary over several orders of magnitude (there are counts on the order of 0.1, 1.0, and 10.0). In fact, from the log-log plot it appears that linearity fails where the log count is about -0.3, i.e., where the count is about equal to the antilog of -0.3, or about 0.5. This estimate of L should be confirmed by more formal analyses. Segmented regression, with two regimes having different error variances, could be used; the number L is the change point between the two regimes. An alternative approach is simply to cluster the points into two groups (using a model with different covariance matrices for the two groups) and see where the dividing point is.

Results of the cluster analysis

A mixture-model cluster analysis was performed on the logs of the white blood cell count (WBC). The sample consisted of pairs (log machine WBC, log manual WBC) for n = 61 patients. Computer programs written by Sclove (1992) were used. Results are in Table 1 and Figure 3.

TABLE 1.
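The segmented-regression idea mentioned above can be sketched numerically. The following Python sketch is illustrative only: it uses simulated stand-in data (the study's raw WBCs are not reproduced here), and a simple grid search over candidate change points, fitting separate least-squares lines to the two regimes and keeping the split with the smallest total residual sum of squares.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in data on the log10 scale: below the change point
# (about -0.3) the machine reading is noisy and nearly unrelated to
# the manual count; above it the two readings agree closely.
log_man = rng.uniform(-2.0, 1.4, 61)
log_mach = np.where(log_man < -0.3,
                    rng.normal(-0.55, 0.25, 61),        # unreliable regime
                    log_man + rng.normal(0, 0.08, 61))  # reliable regime

def sse(x, y):
    """Residual sum of squares of a simple least-squares line."""
    if len(x) < 3:
        return np.inf
    b1, b0 = np.polyfit(x, y, 1)
    return float(np.sum((y - (b0 + b1 * x)) ** 2))

# Grid search over candidate change points t: fit separate lines to the
# two regimes and keep the split with the smallest total SSE.
candidates = np.linspace(-1.5, 0.5, 81)
best = min(candidates,
           key=lambda t: sse(log_man[log_man < t], log_mach[log_man < t])
                       + sse(log_man[log_man >= t], log_mach[log_man >= t]))
print(best)
```

A full treatment would also let the two regimes carry different error variances, as the text suggests; the grid search above keys only on fit.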
Results of mixture-model cluster analysis for the WBC data
---------------------------------------------------------------------
Numbers allocated to clusters: 6, 55

Estimates of parameters of the mixture model:

Mixing probabilities: 0.11, 0.89

Mean vectors (log machine, log manual):
  Population 1:   -0.54985   -1.49552
  Population 2:    0.65584    0.64212

Covariance matrices:
  Population 1:    0.05006    0.02122     correlation = +.28
                   0.02122    0.11113     squared correlation = .08
  Population 2:    0.19049    0.20232     correlation = +.98
                   0.20232    0.22399     squared correlation = .96
---------------------------------------------------------------------

---------------------------------------------------------------------
TABLE 2. Estimates of parameters of the distribution of log(machine WBC)
---------------------------------------------------------------------
Population:                 1         2
Mixing probabilities:      .11       .89
Means:                   -0.55     +0.66
Standard deviations:      0.22      0.44
---------------------------------------------------------------------

[Figure 3. log Machine vs. log Manual WBCs, with cluster membership shown: the points of Figure 2, plotted as "A" (Population 1, lower left) or "B" (Population 2, along the diagonal). Counts appear where different cases coincide; the "2" near the boundary consists of two "B"s.]

The correlation between the two variables is .98 in Population 2 and only .28 in Population 1. If there were higher correlation between the two measurements at the lower level (Population 1), one measurement could be calibrated against the other; the problem is that such correlation is absent. The Lab Manager's rule will therefore be implemented, and for this, L must be estimated.

Determining the change point L

Since the Lab Manager's rule is based on the machine result, the parameter L is estimated from the estimated marginal distribution of log(machine WBC). For brevity, now let x denote log(machine WBC). The marginal distribution is a mixture of univariate normal distributions; estimates of its parameters, obtained from Table 1, are given in Table 2.
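As an illustration of the mixture-model clustering step, the following Python sketch uses scikit-learn's GaussianMixture rather than the CLUSPAC programs used in the study, and simulates a sample from the Table 1 estimates, since the raw data are not reproduced here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Parameters from Table 1 (log10 scale): (log machine WBC, log manual WBC)
mu1, mu2 = np.array([-0.54985, -1.49552]), np.array([0.65584, 0.64212])
cov1 = np.array([[0.05006, 0.02122], [0.02122, 0.11113]])
cov2 = np.array([[0.19049, 0.20232], [0.20232, 0.22399]])

# Simulate n = 61 pairs, 6 from Population 1 and 55 from Population 2,
# matching the cluster allocations reported in Table 1
X = np.vstack([rng.multivariate_normal(mu1, cov1, 6),
               rng.multivariate_normal(mu2, cov2, 55)])

# Fit a two-component mixture with unrestricted covariance matrices,
# i.e. a different covariance matrix for each group
gm = GaussianMixture(n_components=2, covariance_type="full",
                     random_state=0).fit(X)
labels = gm.predict(X)
print(np.bincount(labels))   # numbers allocated to clusters
print(gm.weights_)           # estimated mixing probabilities
```

The `covariance_type="full"` option corresponds to the model described in the text, in which the two groups have different covariance matrices.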
Let C(i|j) be the cost of making decision i when the true state of Nature is j, i = 1, 2, j = 1, 2. Here 1 denotes "machine result unreliable" and 2 denotes "machine result reliable." The posterior expected cost of saying 1, given x, is

  E[Say 1|x] = C(1|1) Pr(1|x) + C(1|2) Pr(2|x).

The posterior expected cost of saying 2, given x, is

  E[Say 2|x] = C(2|1) Pr(1|x) + C(2|2) Pr(2|x).

The minimum posterior expected cost rule is to say 1 for values of x such that E[Say 1|x] < E[Say 2|x]. This is equivalent to

  Pr(1|x) / Pr(2|x) > A,  where  A = [C(1|2) - C(2|2)] / [C(2|1) - C(1|1)].

The constant A is the ratio of the "regrets" C(1|2) - C(2|2) and C(2|1) - C(1|1). The estimates of the posterior probabilities Pr(1|x) and Pr(2|x) are tabulated over a limited range in Table 3.

When C(1|2) - C(2|2) = C(2|1) - C(1|1), e.g., when C(1|1) = 0 = C(2|2) and C(1|2) = C(2|1), the constant A equals 1, and minimum posterior expected cost classification reduces to maximum posterior probability classification. In this case L is estimated as the point where the posterior probabilities are equal. This is seen from Table 3 to be at about x = log(machine WBC) = -0.27, or machine WBC = 0.54.

TABLE 3. Posterior probabilities for Machine vs.
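The maximum posterior probability calculation behind Table 3 can be sketched directly from the Table 2 estimates. The Python sketch below computes Pr(1|x) under the fitted mixture and scans a grid for the largest x at which the rule still says "unreliable"; with equal regrets (A = 1) this recovers the tabulated estimate of about -0.27.

```python
import math

# Estimated parameters of the marginal mixture for x = log10(machine WBC),
# taken from Table 2
p1, m1, s1 = 0.11, -0.55, 0.22   # Population 1: machine result unreliable
p2, m2, s2 = 0.89,  0.66, 0.44   # Population 2: machine result reliable

def normal_pdf(x, m, s):
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def posterior_1(x):
    """Pr(Population 1 | x) under the fitted mixture."""
    f1 = p1 * normal_pdf(x, m1, s1)
    f2 = p2 * normal_pdf(x, m2, s2)
    return f1 / (f1 + f2)

# With equal regrets, A = 1: say "unreliable" when Pr(1|x)/Pr(2|x) > 1.
# Scan a grid for the largest such x, which estimates L.
A = 1.0
xs = [i / 1000 for i in range(-600, -100)]
L = max(x for x in xs if posterior_1(x) / (1 - posterior_1(x)) > A)
print(round(L, 3), round(10 ** L, 2))
```

The same function reproduces the Table 3 entries; for instance posterior_1(-0.27) is about 0.51, matching the row where the posterior probabilities cross.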
Manual WBCs
-----------------------------------------------------------
 x =                antilog(x) =
 log(machine WBC)   machine WBC     Pr(1|x)    Pr(2|x)
-----------------------------------------------------------
   -0.50              0.32           0.89       0.11
   -0.49              0.32           0.88       0.12
   -0.48              0.33           0.87       0.13
   -0.47              0.34           0.86       0.14
   -0.46              0.35           0.85       0.15
   -0.45              0.35           0.84       0.16
   -0.44              0.36           0.83       0.17
   -0.43              0.37           0.82       0.18
   -0.42              0.38           0.81       0.19
   -0.41              0.39           0.80       0.20
   -0.40              0.40           0.78       0.22
   -0.39              0.41           0.77       0.23
   -0.38              0.42           0.75       0.25
   -0.37              0.43           0.73       0.27
   -0.36              0.44           0.72       0.28
   -0.35              0.45           0.70       0.30
   -0.34              0.46           0.68       0.32
   -0.33              0.47           0.66       0.34
   -0.32              0.48           0.63       0.37
   -0.31              0.49           0.61       0.39
   -0.30              0.50           0.59       0.41
   -0.29              0.51           0.56       0.44
   -0.28              0.52           0.54       0.46
   -0.27              0.54           0.51       0.49
   -0.26              0.55           0.48       0.52
   -0.25              0.56           0.46       0.54
   -0.24              0.58           0.43       0.57
   -0.23              0.59           0.41       0.59
   -0.22              0.60           0.38       0.62
   -0.21              0.62           0.35       0.65
   -0.20              0.63           0.33       0.67
   -0.19              0.65           0.30       0.70
   -0.18              0.66           0.28       0.72
   -0.17              0.68           0.25       0.75
   -0.16              0.69           0.23       0.77
   -0.15              0.71           0.21       0.79
   -0.14              0.72           0.19       0.81
   -0.13              0.74           0.17       0.83
   -0.12              0.76           0.16       0.84
   -0.11              0.78           0.14       0.86
   -0.10              0.79           0.13       0.87
-----------------------------------------------------------

Analytical solution for L

In terms of the component probability density functions f(x|1) and f(x|2) and the mixing probabilities p(1) and p(2), the classification rule Pr(1|x)/Pr(2|x) > A is

  f(x|1) / f(x|2) > A p(2)/p(1) = B, say.

In the present case, with Gaussian distributions with means m1, m2 and variances v1, v2, this is equivalent to

  (x - m1)^2 / v1 < (x - m2)^2 / v2 - C,  where  C = ln(v1/v2) + 2 ln(B).

In applying this, one uses the estimates of p(1) and p(2) and of the parameters of f(x|1) and f(x|2). This leads to a quadratic equation in x; one root will be the solution sought (L), and the other root will be extraneous -- outside the physically feasible range for x.

Practical solution for L

The Lab Manager chose machine WBC = 1 as the cut-off point, i.e., log(machine WBC) = L = log 1 = 0. This is consistent with a realistically high cost of saying a result is reliable when it is not.
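The analytical solution can be carried out numerically with the Table 2 estimates. The sketch below expands the inequality at equality into a quadratic a x^2 + b x + c = 0 and solves it; the larger root matches the tabulated estimate of about -0.27, while the smaller root is the extraneous one.

```python
import math

# Estimates from Table 2, with x = log10(machine WBC)
p1, m1, v1 = 0.11, -0.55, 0.22 ** 2
p2, m2, v2 = 0.89,  0.66, 0.44 ** 2

A = 1.0                 # equal regrets
B = A * p2 / p1
C = math.log(v1 / v2) + 2 * math.log(B)

# (x - m1)^2 / v1 = (x - m2)^2 / v2 - C, collected into a*x^2 + b*x + c = 0
a = 1 / v1 - 1 / v2
b = -2 * (m1 / v1 - m2 / v2)
c = m1 ** 2 / v1 - m2 ** 2 / v2 + C

disc = math.sqrt(b ** 2 - 4 * a * c)
roots = sorted([(-b - disc) / (2 * a), (-b + disc) / (2 * a)])
print(roots)   # the larger root estimates L; the smaller root is extraneous
```

The extraneous root falls well below the Population 1 mean, outside the range where the two densities could plausibly cross for this problem.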
SUMMARY

Decision analysis, linked with a statistical analysis involving clustering, was applied to the problem of determining a policy in a medical lab.

REFERENCES

Sclove, Stanley L. (1992). CLUSPAC: Computer Programs for Mixture-Model Cluster Analysis. CRIM Working Paper No. 92-2, January 1992. Center for Research in Information Management, University of Illinois at Chicago.

Sclove, Stanley L. (1994). "Use of Mixture-Model Clustering in Determining the Limit of Accuracy of a Medical Lab Instrument." ORSA/TIMS Meeting, Detroit, Oct. 23-26, 1994; and Risk Analysis Section General Meeting, 1995 Joint Statistical Meetings, American Statistical Association and Related Societies, Orlando.