
IEEE TRANSACTIONS ON RELIABILITY, VOL. 53, NO. 1, MARCH 2004

Parametric Inference of Incomplete Data With Competing Risks Among Several Groups

Chanseok Park and K. B. Kulasekera

Abstract—We develop parametric inferential methods for the competing risks problem where data arise due to multiple causes of failure in several groups with censoring and possibly missing causes. We provide the general likelihood method and the closed-form maximum-likelihood estimators for the exponential model. Parametric tests are given for comparing different causes and groups. An extensive numerical and graphical investigation is presented to substantiate the proposed methods. A real-data example is illustrated.

Index Terms—Censored data, competing risks, exponential distribution, maximum likelihood, missing cause.

ACRONYMS 1

s- implies: statistical(ly)
cdf cumulative distribution function
pdf probability density function
CIF cumulative incidence function
CSHF cause-specific hazard function
MLE maximum-likelihood estimator
MSE mean square error
MVN multivariate s-normally distributed
SRE simulated relative efficiency

1 The singular and plural of an acronym are always spelled the same.

NOTATION

lifetime of the th subject in the th group due to the th cause
censoring indicator variable
vector parameters of the distribution of
parameter matrix of
pdf of
cdf of
survival function of
hazard function of
likelihood function without missing causes
likelihood function with missing causes
Indicator function
Fisher information matrix

Manuscript received March 7, 2003; revised June 22, 2003 and November 20, 2003. The work of C. Park was supported in part by Clemson RGC award. The work of K. B. Kulasekera was supported in part by NIH grant R01 CA 92504-02.
The authors are with the Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 USA (e-mail: cspark@ces.clemson.edu; kk@ces.clemson.edu).
Digital Object Identifier 10.1109/TR.2003.821946
0018-9529/04$20.00 © 2004 IEEE

I. INTRODUCTION

In many engineering and medical studies, lifetime distributions of items or individuals are of interest. Experimenters use various indexes associated with lifetime distributions to evaluate systems. Typically, lifetime measurements are taken from the relevant population(s) and statistical inferences on the corresponding indexes are conducted. Often the items or individuals fail due to more than one failure mechanism, commonly referred to as competing risks. In this setting, usually the cause of failure is known when the lifetime is observed. Furthermore, the items or individuals may also be grouped according to some criteria so that one has observations from multiple populations where each observation is due to one of the failure mechanisms. For example, one may wish to study the effect of the brand of air-conditioning systems which can fail either due to leaks of refrigerant or wear of drive belts. In these situations, an item of interest would be to know whether there are significant differences due to brand(s) and the cause of failure. A typical lifetime data analysis problem of the above type is further complicated due to possible censoring and unknown cause of failure. In the example above, the experimenter fails to observe the exact lifetime of an air-conditioning system if the building or facility is destroyed or renovated thus discarding all functioning systems. In such a situation one observes a right-censored lifetime value. In some situations, the exact cause of failure may not be observed although the lifetime is observed, thus masking the cause. We formally formulate the problem in the following required notation.

The traditional approach when dealing with competing risks is to consider the hypothetical latent lifetimes corresponding to each cause in the absence of the others [1]. Therefore, a subject in the th group, , is exposed to several potential causes of failure. Let there be a finite number of s-independent causes of failure indexed by . Let denote the latent lifetime of the th subject in the th group due to the th cause, where and . It is assumed that are s-independent for all and are identically distributed for all for a given . The corresponding cdf, pdf, survival function, and hazard function of are denoted in general by , , , and , respectively, where is a vector of real valued parameters, one for each . Then the observed lifetime of the th subject in the th group is given by the random variable

Typically, in reliability analysis problems, complete observation of may not be possible due to various censoring schemes which are inherent in data collection processes. In this work, it is further assumed that each can be right-censored by censoring times which are s-independent of lifetimes for all and . Thus, one only observes , where , and are censoring indicator variables defined as

if
if . (1)

Our objectives are:
1) Estimate the vector parameters of the distribution of .
2) Perform the following tests of hypotheses:
• : for all ,
• : for all for a given , and
• : for all for a given .

Here, the alternative hypothesis in each case is taken to be the negation of the statement given in the null hypothesis. The first null states that there is no cause or group effect; the second stipulates that there is no group effect for a given cause; and the third stipulates that there is no cause effect in a given group.

The analysis of exponential data with two causes in a single group was studied by Cox [2]; and was extended to multiple causes by Herman and Patell [3]. The parametric estimation problem for the case of a single group with two causes and possible missing causes but without censoring has been discussed by Miyakawa [4]. Kundu and Basu [5] extended this work to provide the approximate and asymptotic properties of the parameter estimators, s-confidence intervals, and bootstrap s-confidence bounds. They provided the closed-form MLE for the exponential case and constructed likelihood equations for the Weibull case. However, they did not consider censoring. Also, although they stated that their solutions extend to the multiple cause case, no explicit expressions were provided.

An alternative to the traditional latent lifetime framework is the parametric mixture-model approach adopted by Larson and Dinse [6]. An attractive feature of this approach is that it relaxes the s-independence assumption of the latent lifetime approach. However, the drawback is that finding the MLE in the parametric mixture model is quite difficult; and requires intense and often unstable numerical calculations. Recently, Maller and Zhou [7] extended this model to allow the possibility that the competing risks considered may not be exhaustive.

In this article, we give the closed-form MLE for the exponential model with multiple groups, multiple causes, censored data, and missing causes together. The proposed estimators include the estimators given by Kundu and Basu [5] as a special case. For the Weibull model, when the shape parameter is common for all groups and causes, using the proposed estimators, one can easily obtain the closed-form MLE for scale parameters after the common shape parameter is estimated by the likelihood method. These lead to the construction of reasonable test statistics for testing the above hypotheses. Our methods are the first to address the competing risks failure data in several groups with censoring and missing causes together.

When it is inappropriate or undesirable to assume a specific parametric form and s-independence in the competing risks problem, one can use distribution-free methods. There is extensive literature on nonparametric estimation and testing. Gray [8] proposed a class of k-sample tests for comparing the CIF between groups for a given cause. Aly [9] proposed tests for testing the equality of two CSHF with censoring. Sun and Tiwari [10] provided a simple method of testing for comparing the CIF. Lam [11] tests whether the CSHF are the same when there are s-dependence between multiple causes. Recently, Kulathinal and Gasbarra [12] extended the work of Lindkvist and Belyaev [13] and Luo and Turnbull [14] by looking at comparison of groups.

We provide the general likelihood method in Section II. Parameter estimation, asymptotic distributions and hypothesis testing for the exponential model are handled in Section III. A real-data example is illustrated in Section IV and some simulation results in Section V.

II. LIKELIHOOD CONSTRUCTION

In this section, we develop the likelihood functions of the parameters of the underlying distributions . Let be the indicator function of an event . For convenience, denote , and

The likelihood function of the censored sample is

where

Maximizing with respect to is equivalent to separately maximizing with respect to for and . Thus we have reduced the joint maximum-likelihood problem for the parameters of distributions to separate estimation problems for the parameters of each distribution.

When lifetimes of subjects are observed with causes of failure being unknown (missing), we have to add the sub-density functions of the time for each th cause into the likelihood function. The CIF for each th cause is

Then we have the corresponding sub-density function

Therefore the pdf of is given by

Denote if cause of failure is unknown. Then the likelihood is given by

where and

Maximizing with respect to is equivalent to maximizing with respect to for . Thus we have reduced the joint maximum-likelihood problem for the parameters of distributions to separate estimation problems for the parameters of each distribution.

III. EXPONENTIAL MODEL

In this section, we provide the closed-form MLE for the exponential model and the asymptotic distribution of the proposed MLE. Parametric tests are also given for comparing different causes and groups.

A. Closed-Form MLE

We assume that is an exponential random variable with the rate parameter . The pdf of is

Then the pdf of is obtained by

where . The likelihood function of is

where . With , the log-likelihood function becomes

Define for . Then we have

Because , we have

(2)

The MLE of are obtained by solving the following:

(3)

From the above, we have

where . It follows that

Substituting this into (3), we have the following closed-form MLE

B. Asymptotic Distributions of the MLE

We provide the asymptotic distribution of the proposed MLE for the exponential model. Using this result, the asymptotic s-confidence intervals can be obtained. First we obtain the Fisher information matrix of the parameters which is denoted by

for and . Then we obtain

and

and for all . Because has a binomial distribution for , we have . Then we have

and

and for all . Using this, we have the following partitioned Fisher information matrix of the parameters

where . Here,

where and . Note that whenever . Hence we have

The matrix is diagonally partitioned, so its inverse is given by

where . Using Theorem 8.3.3 in Graybill [15] and doing some algebra, we have

This inverse always exists unless (i.e., each subject in the th group is observed only with censoring or missing causes) but this condition is extremely unrealistic in practice. Then we have the following asymptotic distribution of from the property of MLE

where . In practice, we usually estimate by . Using this, we can find approximate s-confidence intervals for by taking to be MVN with the mean and the covariance matrix .

C. Hypothesis Testing

Here, we provide a method for hypothesis testing based on the maximum-likelihood method. A likelihood ratio statistic by Neyman and Pearson is

It is well-known that under the null hypothesis

where the degrees of freedom of the limiting distribution is the difference between the number of free parameters under the null hypothesis and the number of free parameters under the alternative . We develop hypothesis tests for each of the following:
• : for all ,
• : for all for a given , and
• : for all for a given .

Here, the alternative hypothesis in each case is taken to be the negation of the statement given in the null hypothesis.

Here, we provide the MLE for the exponential model under each of the above null hypotheses.
• Under : We have restrictions on scale parameters for all . Using these and (2), we obtain the MLE of and :
• Under : We have restrictions on scale parameters for all for a given . Using these and (2), we obtain the MLE of :
• Under : We have restrictions on scale parameters for all for a given . Using these and (2), we obtain the MLE of :

The following test statistics , , are used to test the null hypotheses , , , respectively:

where , , , and . Here, , , and have asymptotic χ² distributions with the degrees of freedom , , and , respectively.

It is worth noting that is obtained by taking the average of with respect to ; is obtained by taking the weighted harmonic mean of with respect to with the weights ; and is obtained by taking the average of with respect to .

IV. EXAMPLE

The data in this example were first presented by Nelson [16] and have since then been used frequently for illustration in competing risks literature including Crowder [17] and Lawless [18]. The data consist of failure or censoring times for 139 appliances (36 in Group I, 51 in Group II, and 52 in Group III) subjected to a manual lifetime test. Failures were classified into 18 different modes. Among the 67 observed failures in Groups I, II, and III, only mode 11 appears more than twice in all three groups. We shall focus on failure mode 11 by coding the causes as follows: (mode 11), (other modes), and (censored). We provide the data in Table IV.

TABLE IV
FAILURE TIMES AND CAUSES FOR 139 ELECTRICAL APPLIANCES

For the exponential distribution, the cumulative hazard function is . Therefore, when the empirical cumulative hazard function is plotted against , the resultant graph should give an approximately straight line passing through the origin for an exponential model. This is a common graphical technique for checking exponentiality. However, with competing risks, we have to check the validity of a specific model for each cause. One way to do this is to compare the empirical CIF with the parametric CIF, where the CIF of the exponential model is given by

Fig. 1 shows the empirical CIF based on Aalen [19] with pointwise approximate 95% s-confidence limits

The parametric CIF with the MLE are also superimposed on the plot and it is seen that they lie reasonably well within s-confidence bands. This indicates that an exponential model is reasonable. We also can consider some other methods for this model validity. For formal goodness-of-fit tests for exponentiality, the reader is referred to Spurrier [20] and Akritas [21].

Fig. 1. Empirical CIF with 95% s-confidence interval. (a) Cause 1 in Group I. (b) Cause 2 in Group I. (c) Cause 1 in Group II. (d) Cause 2 in Group II. (e) Cause 1 in Group III. (f) Cause 2 in Group III.

We estimated the rate parameters under the exponential model without any restrictions, and under the null hypotheses , , . The estimates without any restrictions are shown in Table I; and the estimates under the null hypotheses , , are shown in Table II. We also tested the null hypotheses , , . Table III shows these results denoted by , , under , , , respectively; and also provides asymptotic critical values at the s-significance level of . The results indicate that the null hypotheses and should be rejected at the s-significance level of . There is evidence of a cause effect in at least one group.

TABLE I
PARAMETER ESTIMATES WITHOUT ANY RESTRICTIONS

TABLE II
PARAMETER ESTIMATES UNDER H , H , AND H

TABLE III
TEST STATISTICS AND CRITICAL VALUES

V. SIMULATION RESULTS

To evaluate the performance of the proposed estimators and tests, an extensive simulation study was carried out using R language [22].

We generated the data from s-independent exponential distributions. Let denote the lifetime of the th subject in the th group due to the th cause according to the exponential distribution with the parameter . Then are given by , where are censoring times and are censoring indicator variables defined as (1). Denote , , and . We censored the lifetimes using an exponential sample with the rate under consideration.

A. Parameter Estimation

We considered the case of and using the following parameter matrices

We generated random samples of sizes for the first and second groups; and censored the lifetimes using an exponential sample with the rate 2.5. Then we masked the causes of failure with different missing percentages of causes, 0% (no missing causes), 10%, 20%, and 30%.

In Table V, we have presented the bias and the MSE of the estimators of with 5000 iterations. To help compare the MSE, we also find the SRE which is defined as

TABLE V
THE BIAS AND THE MSE OF THE ESTIMATORS

To examine the impact of missing causes on the parameter estimation, we compare the estimates with the complete data set without missing causes (denoted by ), the data set with missing causes , and the data set after deleting the lifetimes with missing causes . The results in Table V indicate that the proposed method outperforms the ad hoc method of deleting the lifetimes with missing causes.

B. Hypothesis Tests

We considered the case of and using the following parameter matrices with .
1) , where
2) , where
3) , where

Notice that , , and with give the data under null hypotheses , , and , respectively.

In Figs. 2–4, we present the probability plots [23] of the test statistics , , and against the quantiles of the χ² distributions. We generated random samples of sizes with 50 iterations for the first, second and third groups; and censored the lifetimes using an exponential sample with the rate 1. We used , , and in Figs. 2–4, respectively. With , all three test statistics have asymptotic χ² distributions, and the figure shows a fairly good asymptotic result. With , only has an asymptotic χ² distribution. With , only has an asymptotic χ² distribution.

Fig. 2. (a) S vs. χ²(8); (b) S vs. χ²(6); and (c) S vs. χ²(6).

Fig. 3. (a) S vs. χ²(8); (b) S vs. χ²(6); and (c) S vs. χ²(6).

Fig. 4. (a) S vs. χ²(8); (b) S vs. χ²(6); and (c) S vs. χ²(6).

Next, we turn our attention to the levels and the powers of each hypothesis test. We generated random samples of sizes for the first, second, and third groups; and censored the lifetimes using an exponential sample with the rate 1. Then we masked the causes of failure with different missing percentages of causes, 0% (no missing causes), 10%, 20%, 30%, and 40%. The simulation results are based on iterations. Table VI gives the observed levels of the test statistics , , using , , , respectively. Here, and denote the data sets including missing-cause lifetimes and excluding missing-cause lifetimes, respectively. The table shows that the observed levels of the test statistics and associated with the data set are, in general, closer to the nominal level of 0.05 than those associated with the data set . It is interesting to note that the observed levels of the test statistic are the same regardless of which data set is used. This is because it is shown that

Therefore, this statistic does not change when the lifetimes with missing causes are deleted. Table VII displays the observed powers of the test statistics , , using , , with , respectively. The table shows that deleting the data with missing causes results in a loss of power. To make the above observations graphically explicit, we can change the value of and plot the observed powers of the three data sets on the same graph: the complete data set (0% missing), the data set including missing causes, and the data set after deleting the data with missing causes. This graph is displayed in Fig. 5 over a fine grid of from 0.6 to 1.5.

TABLE VI
OBSERVED LEVELS OF THE TESTS

TABLE VII
OBSERVED POWERS OF THE TESTS

Fig. 5. The observed powers P( ), P( ), P( ) using the test statistics S , S , S , respectively. 3( ) is used for (a) and (b); 3( ) for (c) and (d); and 3( ) for (e) and (f).

ACKNOWLEDGMENT

Dr. Park is grateful to Dr. M. Leeds for his help and encouragement throughout this research. The authors thank the referees for their useful suggestions.

REFERENCES

[1] M. L. Moeschberger and H. A. David, “Life tests under competing causes of failure and the theory of competing risks,” Biometrics, vol. 27, pp. 909–933, 1971.
[2] D. R. Cox, “The analysis of exponentially distributed lifetimes with two types of failures,” J. Royal Stat. Soc. B, vol. 21, pp. 411–421, 1959.
[3] R. J. Herman and R. K. N. Patell, “Maximum likelihood estimation for multi-risk model,” Technometrics, vol. 13, pp. 385–396, 1971.
[4] M. Miyakawa, “Analysis of incomplete data in competing risks model,” IEEE Trans. Rel., vol. 33, pp. 293–296, 1984.
[5] D. Kundu and S. Basu, “Analysis of incomplete data in presence of competing risks,” J. Statistical Planning and Inference, vol. 87, pp. 221–239, 2000.
[6] M. G. Larson and G. E. Dinse, “A mixture model for the regression analysis of competing risks data,” Appl. Stat., vol. 34, pp. 201–211, 1985.
[7] R. A. Maller and X. Zhou, “Analysis of parametric models for competing risks,” Statistica Sinica, vol. 12, pp. 725–750, 2002.
[8] R. J. Gray, “A class of k-sample tests for comparing the cumulative incidence of a competing risk,” Annals of Statistics, vol. 16, pp. 1140–1154, 1988.
[9] E.-E. A. A. Aly, S. C. Kochar, and I. W. McKeague, “Some tests for comparing cumulative incidence functions and cause-specific hazard rates,” J. Amer. Stat. Association, vol. 89, pp. 994–999, 1994.
[10] Y. Sun and R. C. Tiwari, “Comparing cumulative incidence functions of a competing-risks model,” IEEE Trans. Rel., vol. 46, pp. 247–253, 1997.
[11] K. F. Lam, “A class of tests for the equality of k cause-specific hazard rates in a competing risks model,” Biometrika, vol. 85, pp. 179–188, 1998.
[12] S. B. Kulathinal and D. Gasbarra, “Testing equality of cause-specific hazard rates corresponding to m competing risks among k groups,” Lifetime Data Analysis, vol. 8, pp. 147–161, 2002.
[13] H. Lindkvist and Y. Belyaev, “A class of nonparametric tests in the competing risks model for comparing two samples,” Scandinavian Journal of Statistics, vol. 25, pp. 143–150, 1998.
[14] X. Luo and B. W. Turnbull, “Comparing two treatments with multiple competing risks endpoints,” Statistica Sinica, vol. 9, pp. 986–998, 1999.
[15] F. A. Graybill, Matrices With Applications in Statistics: Wadsworth, Inc., 1983.
[16] W. Nelson, “Hazard plotting methods for analysis of life data with different failure modes,” J. Qual. Technol., vol. 2, pp. 126–149, 1970.
[17] M. J. Crowder, Classical Competing Risks: Chapman and Hall, 2001.
[18] J. F. Lawless, Statistical Models and Methods for Lifetime Data, 2nd ed. New York: John Wiley & Sons, 2003.
[19] O. O. Aalen, “Nonparametric estimation of partial transition probabilities in multiple decrement models,” Annals of Statistics, vol. 6, pp. 534–545, 1978.
[20] J. D. Spurrier, “An overview of tests for exponentiality,” Communications in Statistics, Part A: Theory and Methods, vol. 13, pp. 1635–1654, 1984.
[21] M. G. Akritas, “Pearson-type goodness-of-fit tests: The univariate case,” J. Amer. Stat. Association, vol. 83, pp. 222–230, 1988.
[22] R. Ihaka and R. Gentleman, “R: a language for data analysis and graphics,” J. Comput. Graphical Stat., vol. 5, pp. 299–314, 1996.
[23] M. B. Wilk and R. Gnanadesikan, “Probability plotting methods for the analysis of data,” Biometrika, vol. 55, pp. 1–17, 1968.

Chanseok Park is an Assistant Professor of Mathematical Sciences at Clemson University, Clemson, SC. He received his B.S. in Mechanical Engineering from Seoul National University; his M.A. in Mathematics from the University of Texas at Austin; and his Ph.D. in Statistics in 2000 from the Pennsylvania State University. His research interests include survival analysis, competing risks, statistical inference using quadratic inference function, robust inference, and statistical computing and simulation.

K. B. Kulasekera is a Professor of Mathematical Sciences at Clemson University. He received his B.S. in 1979 from the University of Sri Lanka; his M.A. in Statistics from the University of New Brunswick; and his Ph.D. in Statistics from the University of Nebraska, Lincoln, NE. His research interests include survival analysis, nonparametric regression, and multivariate methods.
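
Illustrative sketch (not part of the paper). The estimation and testing ideas of Section III can be tried out numerically with the short Python example below. It is a sketch under stated assumptions, not the authors' code: the estimator is obtained here by maximizing the exponential log-likelihood for one group when masking is assumed not to depend on the true cause, and the likelihood ratio statistic covers only the null hypothesis of one common rate for all causes and groups. The function names, the labels 0 (censored) and None (unknown cause), and the integer cause labels 1, 2, ... are all hypothetical choices made for this illustration.

```python
import math
from collections import Counter

CENSORED = 0    # assumed label for a right-censored time
MISSING = None  # assumed label for a failure whose cause was not recorded


def mle_rates(times, causes):
    """Exponential MLE of the cause-specific rates for a single group."""
    total_time = sum(times)  # total time on test, censored units included
    known = Counter(c for c in causes if c is not MISSING and c != CENSORED)
    n_known = sum(known.values())
    n_missing = sum(1 for c in causes if c is MISSING)
    if n_known == 0 or total_time <= 0:
        raise ValueError("need positive total time and a failure with known cause")
    overall = (n_known + n_missing) / total_time  # MLE of the summed rate
    # Masked failures are shared among causes in proportion to the known counts.
    return {j: overall * d / n_known for j, d in known.items()}


def log_likelihood(rates, times, causes):
    """Exponential log-likelihood with censoring and masked causes."""
    total_rate = sum(rates.values())
    ll = 0.0
    for t, c in zip(times, causes):
        ll -= total_rate * t            # survival term, present for every unit
        if c is MISSING:
            ll += math.log(total_rate)  # masked failure: sum of sub-densities
        elif c != CENSORED:
            ll += math.log(rates[c])    # failure from a known cause
    return ll


def lr_statistic_no_effect(groups, n_causes):
    """Return (-2 log likelihood ratio, degrees of freedom) for the null
    hypothesis that every cause in every group shares one common rate.
    `groups` is a list of (times, causes) pairs, causes labelled 1..n_causes."""
    unrestricted = sum(log_likelihood(mle_rates(t, c), t, c) for t, c in groups)
    n_fail = sum(1 for _, cs in groups for c in cs
                 if c is MISSING or c != CENSORED)
    total_time = sum(sum(t) for t, _ in groups)
    common = n_fail / (n_causes * total_time)  # restricted MLE of the rate
    restricted_rates = {j: common for j in range(1, n_causes + 1)}
    restricted = sum(log_likelihood(restricted_rates, t, c) for t, c in groups)
    df = n_causes * len(groups) - 1
    return 2.0 * (unrestricted - restricted), df
```

For example, `mle_rates([2.0, 3.0, 5.0, 4.0, 6.0], [1, 2, None, 0, 1])` splits the overall rate estimate 4/20 between the two causes in the ratio 2:1 of their known failure counts. The statistic returned by `lr_statistic_no_effect` would be compared with a χ² quantile at the returned degrees of freedom.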
