# PARAMETRIC MODEL SELECTION TECHNIQUES

Gary L. Beck, Steve From, Ph.D.

Abstract. Many parametric statistical models are available for modeling lifetime data. Given a data set of lifetimes, which may or may not be censored, which parametric model should be used to conduct statistical tests? In only a few cases can analytical expressions be found to answer this question in some optimal fashion. Various measures of discrepancy and other functionals of the distribution function are considered for a finite number of competing parametric statistical models. Utilizing techniques developed by Linhart and Zucchini, survival data from pediatric patients who have received stem cell transplants are analyzed to determine whether models fit to random samples represent the actual model for the population.

Acknowledgment. The author would like to give special thanks to John Maloney, Ph.D., for his invaluable assistance in programming Maple.

1. Introduction

Probability models are useful for providing information about observations of seemingly random variables. In controlled settings, various parametric models may be chosen which appear to "fit" the data. However, it is highly unlikely that investigators will have a complete data set on which to base assumptions, and they are therefore resigned to using a battery of tests to confirm how well the chosen model fits the observed pattern of data [1]. Linhart and Zucchini [2] lay the groundwork for more accurate means of model selection. Under simple random sampling, observations may be regarded as independent and identically distributed random variables with a non-negative probability density function (pdf). This pdf, denoted f(x), may be regarded as the model which gave rise to the observations and is referred to as the operating model [3]. In practice, the operating model is used to make estimations


about f(x), even though it is based on only a sample of the population. It is only in exceptional cases that sufficient information is available to identify the operating model, and rarer still to have a complete data set with which to do so. It is therefore important to understand the information under investigation in order to circumscribe a family of models that best represents the pdf.

The size of the family of models is determined by the number of its independent parameters. These parameters are estimated from observations, with an accuracy that depends on the amount of data available relative to the number of parameters to be estimated; accuracy improves as either the sample size increases or the number of parameters decreases. Attempting to estimate too many parameters is referred to as overfitting, which leads to instability: repeated samples collected under comparable conditions lead to widely varying fitted models. Successful model fitting is thus a matter of ensuring a data set sufficient to achieve a desired level of accuracy.

Traditionally, histograms have been used to summarize a set of data graphically. The behavior of the histogram may be used to estimate the probability density function, whether as a final estimate or as an intermediate step in the search for a smoother model. However, the properties of the histogram as an estimator of the pdf depend strongly on the sample size and on the number of intervals. With smaller samples, the histogram tends to display greater variability across interval choices, and it is uncertain whether such graphical variation represents the population as a whole or merely sample characteristics.

A common approach to fitting a model is to select the simplest approximating family consistent with the data. This is a matter of selecting, from a catalog of models, those which match the general features apparent in the data, which may be recognized using histograms. It is assumed


that the family of models represents the true situation, which must then be tested by hypothesis tests to determine whether the model is consistent with specific aspects of the data. The advantage of this method is that relatively simple models may be selected to analyze the data; in so doing, one assumes that the chosen models are valid, and estimators are chosen and decisions made accordingly. The drawback is that assumptions about unknown parameters become the focus, when in reality the parameters are only of interest if they can be naturally and usefully interpreted in the context of the observations. Instead, Linhart and Zucchini suggest using estimators that are robust to deviations from the assumed family selected for fitting [2]. Their approach is to choose the family of models estimated to be the most appropriate under the circumstances, as determined by the background assumptions, the sample size, and the specific requirements of the user.

Discrepancies

Before the performances of competing models can be compared, a measure for assessing fit, or the lack thereof, must be chosen. This measure of lack of fit is referred to as a discrepancy, denoted Δ(f, g_θ). The discrepancy between the operating model and the best approximating model is called the discrepancy due to approximation; it constitutes a lower bound on the discrepancy for models in the approximating family and does not depend on the data, the sample size, or the method of estimation employed. The discrepancy between the fitted model and the best approximating model is called the discrepancy due to estimation; it does depend on the sample values and changes from sample to sample, so it is a random variable. Finally, the overall discrepancy is the discrepancy between the operating model and the fitted model, which takes both components into account. It is therefore necessary to consider both when comparing approximating families of different complexity. The best model in a more complex family is typically closer to the


operating model than the best model in simpler families. However, the fitted model in the complex family tends to be further from its best model than is the case in the simpler family. Thus, complex families have more potential but tend to perform below that potential. The overall discrepancy, the sum of its two component discrepancies, therefore allows an appropriate compromise between the two opposing properties [3].

All of this is possible when the actual operating model is known, which in practice is rarely the case; the discrepancies exist, but they cannot be calculated. Since the operating model, and hence the overall discrepancy, is unknown, one instead uses an estimator of the expected discrepancy E_F Δ(θ̂), called a criterion. For a histogram estimator on (0, 100] with I intervals, the expected discrepancy given by Linhart and Zucchini is

(1.1) \quad E_F\,\Delta(\hat\theta) = E_F \int_0^{100} \bigl(f(x) - \hat g(x)\bigr)^2\,dx, \qquad \hat g(x) = \frac{n_i I}{100\,n} \ \text{for}\ \frac{100(i-1)}{I} < x \le \frac{100i}{I},\ i = 1, 2, \ldots, I,

where n_i is the frequency of the i-th interval. Taking the expectation of this integral after sufficient subdividing yields

(1.2) \quad E_F\,\Delta(\hat\theta) = \int_0^{100} f(x)^2\,dx + \frac{I}{100\,n}\Bigl[1 - (n+1)\sum_{i=1}^{I}\pi_i^2\Bigr], \qquad \pi_i = \int_{100(i-1)/I}^{100i/I} f(x)\,dx.

The first term does not depend on the approximating family and can therefore be ignored. The second term is the essential one, and an unbiased estimator of it, a criterion, is

(1.3) \quad \frac{I}{100\,n}\Bigl[1 - \frac{n+1}{n-1}\Bigl(\sum_{i=1}^{I}\frac{n_i^2}{n} - 1\Bigr)\Bigr].

This procedure does not single out any one approximating family of models except on the basis of its criterion value. Some situations merit a simple model and others a complex one, depending on the behavior of the data set. In some instances it may also be possible to construct a test of a


hypothesis that a particular approximating family has a smaller expected discrepancy than another. In most cases such tests are difficult to construct, and one must instead rely on simple comparisons of estimated expected discrepancies against a selected level of significance.

Consider the random age distribution in Table 1, from the Federal Republic of Germany in 1974 (Statistisches Bundesamt, 1976, p. 58). Figures 1 and 2 display the distribution of ages in histogram format, grouped into I = 10 and I = 50 intervals. The size of the interval determines the optimal criterion. Figure 3 shows the criterion values, indicating that I = 6 gives approximately the smallest criterion value and is thereby the most appropriate choice.

Table 1. Random age distribution, n = 100

77 17 55 39 63  1  4 82 41 68
26  7 27 24 10  3 67 40 34 11
14 13  4 39 68 20 68 15 50 66
34 48 46 62 32 61 46 11 35 43
48 50 34  4 36 40 52 19 75  8
 9  3  7 63 10 23 70 12 56 34
79 74 54 67 65 64 66 67 28 42
59  3 48 11  2  7 39 15 18 10
 5 13  6  7 32 26 63 16 79 44
13 37 37 39 20 15  8 24 17 26
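As a concrete illustration, the criterion in equation 1.3 can be computed directly from grouped counts. The sketch below is not from the paper: the function and variable names are mine, and `ages` is only the first twenty values of Table 1 (assumed to lie in (0, 100]), used purely for demonstration.

```python
import math

def criterion(data, I, upper=100.0):
    """Equation 1.3: unbiased estimate of the essential term of the
    expected discrepancy for a histogram with I equal intervals."""
    n = len(data)
    counts = [0] * I
    for x in data:
        # interval i covers (upper*(i-1)/I, upper*i/I]
        counts[max(math.ceil(x * I / upper) - 1, 0)] += 1
    s = sum(c * c for c in counts) / n            # sum of n_i^2, divided by n
    return (I / (upper * n)) * (1.0 - (n + 1) / (n - 1) * (s - 1.0))

# First 20 ages from Table 1 (illustrative subset only)
ages = [77, 17, 55, 39, 63, 1, 4, 82, 41, 68,
        26, 7, 27, 24, 10, 3, 67, 40, 34, 11]

# Choose the number of intervals with the smallest criterion value
best_I = min(range(1, 51), key=lambda I: criterion(ages, I))
```

For the full n = 100 sample the paper reports a minimum near I = 6; on a subset the minimizer will generally differ.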

This leads to the primary issue of model selection: identifying an operating family of models and constructing a discrepancy. The choice of operating model is typically determined by the type of analysis one intends to complete, whether hypothesis testing, regression analysis, analysis of variance, or something else. The next step is to choose a discrepancy. For this, one must determine the use to which the fitted model will be put and which aspects of it must conform to the operating model.

[Figure 1. Age histogram, I = 10]

Important Discrepancies

Discrepancies should be selected to match the objectives of the analysis [3]. A natural estimator to pair with a particular discrepancy is the minimum discrepancy estimator, or minimum distance estimator, which minimizes the discrepancy between the approximating model and the proposed operating model. The method of maximum likelihood estimation (MLE), developed by R. A. Fisher, chooses the probability distribution that makes the observed data most likely [1].

[Figure 2. Age histogram, I = 50]

MLE is an important general-purpose method for calculating discrepancies. MLEs are asymptotically normally distributed, asymptotically "minimum variance," and asymptotically unbiased (as n approaches infinity) [1]. The Kullback-Leibler discrepancy is

\Delta_{K\text{-}L}(f, g_\theta) = -E_F \log(g_\theta(x)) = -\int \log(g_\theta(x))\, f(x)\,dx,

where g_θ is the pdf characterizing the approximating family of models. The minimum discrepancy estimator associated with this discrepancy is the maximum likelihood estimator. This discrepancy focuses on the expected log-likelihood when the approximating model is g_θ: the higher the expected log-likelihood, the better the model. Another possible discrepancy is the Cramér-von Mises discrepancy,

\Delta_{C\text{-}M}(\theta) = E_{G_\theta}\bigl(F(x) - G_\theta(x)\bigr)^2.
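As a quick numerical sketch (not from the paper), the Kullback-Leibler discrepancy −∫ log(g_θ(x)) f(x) dx can be approximated on a grid when f is known. The normal densities, grid limits, and step count below are arbitrary illustrative choices.

```python
import math

def normal_logpdf(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def kl_discrepancy(f_pdf, log_g, lo=-40.0, hi=40.0, steps=20000):
    # Delta(f, g_theta) = -integral of log(g_theta(x)) f(x) dx,
    # approximated with the midpoint rule on [lo, hi]
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total -= log_g(x) * f_pdf(x) * h
    return total

# Operating model f = N(0, 1); approximating model g_theta = N(0.5, 1)
f_pdf = lambda x: math.exp(normal_logpdf(x, 0.0, 1.0))
log_g = lambda x: normal_logpdf(x, 0.5, 1.0)
delta = kl_discrepancy(f_pdf, log_g)
```

For these two normal densities the exact value is ½ log(2π) + (1 + 0.25)/2 ≈ 1.544; a smaller value would indicate a better approximating model.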

[Figure 3. Graph of criterion values for the age data]

For discrete or grouped data sets, the Pearson chi-squared or Neyman chi-squared discrepancies are suitable. They are, respectively,

\Delta_P(\theta) = \sum_x \frac{\bigl(f(x) - g_\theta(x)\bigr)^2}{g_\theta(x)}, \quad g_\theta(x) \neq 0,

and

\Delta_N(\theta) = \sum_x \frac{\bigl(f(x) - g_\theta(x)\bigr)^2}{f(x)}, \quad f(x) \neq 0,

where f and g_θ are the pdf's characterizing the operating model and the approximating family. Discrepancies need not depend on every detail of the distribution; they may instead be based on some specific aspect of it. In regression analysis, for example, only certain expectations are of particular interest. Thus this method of model selection, albeit complex, is flexible enough for any aspect of data analysis.

Derivation of Criteria


Each operating model and discrepancy requires its own method of analysis. Having decided which approximating families are to be considered, which methods will be used to estimate the parameters, and which discrepancy will be used to assess the fit, a criterion must be found: an estimator of the expected discrepancy E_F Δ(θ̂), where the expectation is taken with respect to the operating model F. The derivation can be straightforward or exceptionally complex (the appendix of Linhart and Zucchini [2] details the derivation of these criteria).

When the expected discrepancy is too complex to derive exactly, asymptotic methods will sometimes work; i.e., one works with its limiting values as the sample size increases indefinitely. Approximating Δ(θ̂) by the first terms of its Taylor expansion about the point θ₀ and taking expectations, the expected discrepancy approaches, as the sample size increases,

(1.4) \quad E_F\,\Delta(\hat\theta) \approx \Delta(\theta_0) + \frac{K}{2n},

where K = tr(Ω⁻¹Σ), the trace of the product of two matrices.

Bootstrap methods provide a simple and effective means of circumventing the technical problems encountered in deriving expected discrepancies and their estimators. One generates repeated samples of size n from a fixed F_n, the empirical distribution function derived from the data. Each sample leads to a new estimate θ̂ for an approximating family of models, and the average over samples converges to the expected discrepancy.

Cross-validation is a technique for assessing the "validity" of a statistical analysis. The data are subdivided into a calibration sample (of size n − m) and a validation sample (of size m), and the procedure of fitting and validating is repeated m times, once for each subdivision. A difficulty is deciding how to select m without limiting the number of observations available to fit the model. One may use a small m and follow these steps for all possible calibration samples of size n − m: fit the model to the calibration


sample, then estimate the expected discrepancy for the fitted model using the validation sample. The cross-validation criterion is the average, over these repetitions, of the estimates obtained in the second step.

As shown in Figures 1 and 2, histogram densities are universally applicable. However, lower discrepancies may be achieved by fitting smoother approximating densities that depend on fewer parameters. Histograms serve primarily as a means of selecting approximating models; smoother models such as the normal, lognormal, and gamma provide more concise descriptions.

Unless certain distributional properties of the estimator are available, it is not possible to derive exact expressions for the expected discrepancy. Since finite-sample distributions are typically too difficult to derive, one must rely on asymptotic, bootstrap, or cross-validatory methods. This study concentrated on asymptotic methods applied to a complete data set acquired from the University of Nebraska Medical Center. To obtain asymptotic criteria it is necessary to obtain the trace term tr(Ω⁻¹Σ), which may be estimated from the data using estimators Ω_n and Σ_n; the criterion is then

\Delta_n(\hat\theta) + \frac{\operatorname{tr}(\Omega_n^{-1}\Sigma_n)}{n}.

Alternatively, if the operating model is a member of the approximating family, the trace term becomes significantly simpler: for a number of discrepancies it is simply a multiple of p, the number of free parameters in the approximating family. It should be noted that the approximations on which the derivation of the simpler criteria rests will be inaccurate whenever the discrepancy due to approximation is large [2].

The Kullback-Leibler discrepancy is one of the most important general-purpose discrepancies. It is an essential part of the expected log-likelihood ratio and is related to entropy, a fundamental quantity in information theory. This discrepancy and its asymptotic criteria give rise to a number


of discrepancies for standard distributions. The discrepancy is

\Delta(\theta) = \Delta(G_\theta, F) = -E_F \log(g_\theta(x)),

with empirical discrepancy

\Delta_n(\theta) = \Delta(G_\theta, F_n) = -\frac{1}{n}\sum_{i=1}^{n} \log(g_\theta(x_i)).

The asymptotic criterion is

\Delta_n(\hat\theta) + \frac{\operatorname{tr}(\Omega_n^{-1}\Sigma_n)}{n},

and the simpler criterion is

\Delta_n(\hat\theta) + \frac{p}{n}.

The minimum discrepancy estimator, which here is the maximum likelihood estimator, is

\hat\theta = \operatorname{argmin}\{\Delta_n(\theta) : \theta \in \Theta\}.

Given this, Linhart and Zucchini identify maximum likelihood estimators for a variety of probability distributions; this study used the normal, lognormal, and gamma distributions. Following the methodology of Linhart and Zucchini (Appendix, [2]), the Kullback-Leibler criterion for each model is given below. Sample moments are denoted m_h[·] and sample moments about the sample mean are denoted \bar m_h[·], h = 1, 2, …; for example, \bar m_2[\log(x)] is the second sample moment of log(x) about its mean.

For the normal distribution, the criterion is

(1.5) \quad \frac{1 + \log(2\pi \bar m_2)}{2} + \frac{\bar m_4 + \bar m_2^2}{2n\,\bar m_2^2},

with trace term \frac{\bar m_4 + \bar m_2^2}{2n\,\bar m_2^2}. The estimators for the normal distribution are

\hat\mu = m_1, \qquad \hat\lambda = \bar m_2.

It should be noted that λ̂ is the same as σ̂² for this distribution and for the lognormal distribution. For the lognormal distribution, the criterion is

(1.6) \quad m_1[\log(x)] + \frac{1 + \log(2\pi) + \log(\bar m_2[\log(x)])}{2} + \frac{\bar m_4[\log(x)] + \bar m_2[\log(x)]^2}{2n\,\bar m_2[\log(x)]^2},

with trace term \frac{\bar m_4[\log(x)] + \bar m_2[\log(x)]^2}{2n\,\bar m_2[\log(x)]^2}. The maximum likelihood estimators for the lognormal are

\hat\mu = m_1[\log(x)], \qquad \hat\lambda = \bar m_2[\log(x)].

Finally, the criterion for the gamma distribution is

(1.7) \quad \log\Gamma(\hat\nu) - \hat\nu\bigl(\log\hat\lambda - 1\bigr) - (\hat\nu - 1)\,m_1[\log(x)] + \frac{\operatorname{tr}(\Omega_n^{-1}\Sigma_n)}{n}.

The trace term for the gamma distribution is the last term in equation 1.7, where

\Omega_n = \begin{pmatrix} m_1^2/\hat\nu & -m_1/\hat\nu \\ -m_1/\hat\nu & \psi'(\hat\nu) \end{pmatrix}, \qquad \Sigma_n = \begin{pmatrix} \bar m_2 & -\bar m_{11}[x, \log(x)] \\ -\bar m_{11}[x, \log(x)] & \bar m_2[\log(x)] \end{pmatrix},

and \bar m_{11}[x, \log(x)] denotes the sample covariance of x and log(x). The estimators for the gamma distribution satisfy

\frac{\hat\nu}{\hat\lambda} = m_1 \quad\text{and}\quad \log(\hat\lambda) - \psi(\hat\nu) = -m_1[\log(x)].
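The normal and lognormal criteria (equations 1.5 and 1.6) involve only sample moments, so they are straightforward to compute; a sketch follows, with function names of my own choosing. The gamma criterion (1.7) is omitted here because its estimating equation, log(λ̂) − ψ(ν̂) = −m₁[log(x)], must be solved numerically for ν̂.

```python
import math

def central_moments(xs):
    """Return m1, mbar2, mbar4: the sample mean and the second and
    fourth sample moments about the mean."""
    n = len(xs)
    m1 = sum(xs) / n
    mbar2 = sum((x - m1) ** 2 for x in xs) / n
    mbar4 = sum((x - m1) ** 4 for x in xs) / n
    return m1, mbar2, mbar4

def normal_criterion(xs):
    # Equation 1.5: (1 + log(2*pi*mbar2))/2 + (mbar4 + mbar2^2)/(2*n*mbar2^2)
    n = len(xs)
    _, m2, m4 = central_moments(xs)
    return (1 + math.log(2 * math.pi * m2)) / 2 + (m4 + m2 ** 2) / (2 * n * m2 ** 2)

def lognormal_criterion(xs):
    # Equation 1.6: the same form applied to log(x), plus m1[log(x)]
    n = len(xs)
    m1, m2, m4 = central_moments([math.log(x) for x in xs])
    return m1 + (1 + math.log(2 * math.pi * m2)) / 2 + (m4 + m2 ** 2) / (2 * n * m2 ** 2)
```

For a given sample, the family with the smaller criterion value is preferred.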


2. Methods

Data Collection

Data for this study were obtained from the University of Nebraska Medical Center Clinical Cancer Trials office, which tracks pediatric patients who have undergone stem cell transplants. The date of diagnosis, date of treatment, and date of expiration are collected and stored in a Microsoft Access database. The Institutional Review Board granted this study an exemption because no subjects would be identifiable.

Analysis

The entire population of pediatric patients who have received stem cell transplants at the University of Nebraska Medical Center numbers 289. It is therefore possible to compare the operating model based on the entire population with repeated random samples of size 20. Using equation 1.3, the criterion for the data set was examined. Figures 4 and 5 show histograms of the survival data for I = 10 and I = 50; at I = 6 the criterion was smallest, making it the optimal number of intervals for this histogram. Using the Kullback-Leibler criterion for the normal, lognormal, and gamma distributions, the study examines how well the criterion "selects" the true operating model as determined by the mean square error. Since the population is known, this technique should select the "correct" best model from among the approximating models, where "correct" best is measured by the fraction of samples of size 20 in which certain functionals of the population's cumulative distribution function are most closely estimated.

Let Y_0 be a positive real number and let M = P(Y ≤ Y_0), where Y is a population value. For a given sample of n = 20 data points, each competing model (normal, lognormal, gamma) provides an estimate of M; denote these M̂_N (normal), M̂_L (lognormal), and M̂_G (gamma). Now let M̂_N(i) be the value of M̂_N for sample number i. This was computed


from the MLEs of the model parameters (a parametric model). Similar definitions hold for M̂_L(i) and M̂_G(i). Let R_n equal the number of random samples of size n = 20 generated. The absolute errors are

E_N(i) = |M - \hat M_N(i)|, \quad i = 1, 2, \ldots, R_n,
E_L(i) = |M - \hat M_L(i)|, \quad i = 1, 2, \ldots, R_n,
E_G(i) = |M - \hat M_G(i)|, \quad i = 1, 2, \ldots, R_n.

[Figure 4. Histogram of survival data, I = 10]

By generating 5,000 random samples of size n, approximate mean square errors (MSEs) are computed. The MSE is the mean of the squared deviations from the target [4]; i.e.,

MSE_N = \frac{1}{5000}\sum_{i=1}^{5000}\bigl(M - \hat M_N(i)\bigr)^2 = \frac{1}{5000}\sum_{i=1}^{5000} E_N(i)^2,
MSE_L = \frac{1}{5000}\sum_{i=1}^{5000}\bigl(M - \hat M_L(i)\bigr)^2 = \frac{1}{5000}\sum_{i=1}^{5000} E_L(i)^2,
MSE_G = \frac{1}{5000}\sum_{i=1}^{5000}\bigl(M - \hat M_G(i)\bigr)^2 = \frac{1}{5000}\sum_{i=1}^{5000} E_G(i)^2.

[Figure 5. Histogram of survival data, I = 50]

The actual "best" model is the one with the smallest MSE among the normal, lognormal, and gamma models. Upon completion of these computations, the Kullback-Leibler criterion was calculated for each model to ascertain whether this method matched the determination based on the actual operating model. For this, the asymptotic Kullback-Leibler criterion values for the three models, denoted A_N(i), A_L(i), and A_G(i), were computed using equations 1.5, 1.6, and 1.7; the chosen parametric model corresponds to the smallest of the three values. For each sample of size n = 20, the asymptotic criteria were computed to determine a concluded "best" model. It should be noted that this method of comparing asymptotic criteria with MSEs is valid here only because the complete population is known.
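The sampling experiment described above can be sketched as follows for the normal model alone; the population here is synthetic, the helper names are mine, and the estimate M̂_N is the normal CDF evaluated at the fitted (μ̂, λ̂). The lognormal and gamma versions differ only in the fitted CDF.

```python
import math
import random

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mse_normal(population, y0, n=20, reps=5000, seed=1):
    """Approximate MSE of the normal-model estimate of M = P(Y <= y0),
    over repeated random samples of size n drawn with replacement."""
    rng = random.Random(seed)
    M = sum(y <= y0 for y in population) / len(population)
    total = 0.0
    for _ in range(reps):
        sample = [rng.choice(population) for _ in range(n)]
        mu = sum(sample) / n
        lam = sum((x - mu) ** 2 for x in sample) / n or 1e-12   # lambda-hat = sigma-hat^2
        M_hat = normal_cdf((y0 - mu) / math.sqrt(lam))
        total += (M - M_hat) ** 2
    return total / reps
```

Running analogous lognormal and gamma versions on the same samples and keeping the smallest value reproduces the selection-by-smallest-MSE step.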


3. Discussion

The entire population of pediatric stem cell recipients, n = 289, was included in this data set. A co-author (SF) wrote a Fortran program to generate random samples of size 20 from the data set and to calculate mean square errors and asymptotic criteria. To ensure the accuracy of this methodology, 5,000 random samples were generated for each MSE and asymptotic criterion. Not all of these results can be presented in this report; Table 2 shows a representative sample of n = 20 from the data set.

Table 2. Random sample of survival data of stem cell recipients, n = 20

 14.57237  34.14474  50.95395  63.25658  49.83553
121.61184  20.32895  44.11184  39.14474  28.94737
119.17763  87.10526   9.86842  57.79605  25.88816
 23.28947 102.43421  51.25000 194.40789  83.65132

From this random sample, maximum likelihood estimates for the normal, lognormal, and gamma distributions were calculated, along with the Kullback-Leibler criterion, estimated standard deviation, and trace values for each model. These results are shown in Table 3.

Table 3. Random sample criterion results

| Model     | μ̂ (ν̂ for gamma) | λ̂       | K-L Criterion | Est. Std. Dev. | Trace  |
|-----------|-------------------|---------|---------------|----------------|--------|
| Normal    | 61.09             | 1984.70 | 5.356         | 0.0994         | 0.1406 |
| Lognormal | 3.854             | 0.5493  | 5.061         | 0.0617         | 0.0873 |
| Gamma     | 2.0896            | 0.03421 | 5.073         | 0.0635         | 0.0898 |
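The normal and lognormal estimates in Table 3 can be checked directly from the Table 2 sample, since μ̂ = m₁ and λ̂ = m̄₂ (applied to x for the normal model and to log(x) for the lognormal). A minimal check, with function names of my own choosing:

```python
import math

# The n = 20 survival times from Table 2
sample = [14.57237, 34.14474, 50.95395, 63.25658, 49.83553,
          121.61184, 20.32895, 44.11184, 39.14474, 28.94737,
          119.17763, 87.10526, 9.86842, 57.79605, 25.88816,
          23.28947, 102.43421, 51.25000, 194.40789, 83.65132]

def mle_normal(xs):
    n = len(xs)
    mu = sum(xs) / n                           # mu-hat = m1
    lam = sum((x - mu) ** 2 for x in xs) / n   # lambda-hat = mbar2
    return mu, lam

mu, lam = mle_normal(sample)                                 # ~ 61.09, 1984.70
mu_log, lam_log = mle_normal([math.log(x) for x in sample])  # ~ 3.854, 0.5493
```

These reproduce the μ̂ and λ̂ entries of the normal and lognormal rows of Table 3.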

From these results it is evident that the lognormal distribution has the smallest asymptotic criterion, followed closely by the gamma distribution. According to the Kullback-Leibler criterion, then, the lognormal would be the "best" model with which to analyze the population. Interestingly, for a sample of this size much of the medical literature would assume a normal distribution; yet by this method the normal distribution produced the largest criterion and would therefore be a less-than-optimal choice for this population.

The real data were used to obtain the actual value of M = P(Y ≤ Y_0), where Y_0 is a given positive real number and Y is a population value:

M = \frac{\#\{\text{population values} \le Y_0\}}{N},

where N = 289 is the population size. Based on the smallest MSE over 5,000 random samples of size n = 20, the "best" parametric model was selected. With Y_0 = 50.0, M = P(Y ≤ Y_0) = 0.5156, and the MSEs for each distribution were

Normal = 0.0236, \quad Lognormal = 0.0098, \quad Gamma = 0.0085.

By the MSE for the entire population, the gamma distribution appears to be the optimal choice for analyzing the population, though given the close lognormal and gamma MSEs it is only slightly better. By both asymptotic criterion and MSE, the normal distribution is clearly not an appropriate means of analysis.

Across all 5,000 random samples, the most appropriate model should have the smallest criterion value the majority of the time. In this simulation, the asymptotic criterion correctly picked the best model for Y_0 = 50.0 in 76.4% of the samples of size n = 20. Notably, in the middle third of the distribution the asymptotic criterion correctly picked the best model, while for small or large Y_0 it does not appear to be the optimal selection method.

[Figure 6. Y_0 versus percentage (Mean = 65.4667; Variance = 2787.8976; Median = 47.7960)]

4. Conclusion

The Kullback-Leibler criterion seems to work well for the middle of this population distribution (see Figure 6). The normal distribution fit the left tail better when Y_0 was small, and the lognormal distribution fit the right tail better for large Y_0; this implies that another criterion, one emphasizing fit in the tails, is needed. The small population size, and the fact that the gamma and lognormal MSEs were extremely close where the above percentages were small, are also contributing factors to this phenomenon.

Limitations:


The population size for this study was relatively small, n = 289. To better test the Kullback-Leibler criterion, a much larger population would be ideal, allowing a greater range of random samples from which to draw. A serious limitation of this study was also the author's (GLB) inability to write Fortran code; this type of intensive random sampling and calculation requires custom code not found in standard software packages. Even the assistance of an expert Maple programmer proved inadequate for generating the necessary results: for example, the criterion calculated using equation 1.3 tended toward I = 10 for both the example and real data sets, so additional manual computations were done for the samples in Table 1 to verify Maple's output.

Future Considerations:

Given the tail effects observed in the random sampling, it is evident that a single parametric model is insufficient to describe this population as a whole. Rather than take the traditional approach of partitioning the data for analysis with the three best-fitting models, a hybrid probability distribution may prove to be a better model for this analysis; it would accommodate both the smaller and larger values so that a single distribution could model the population. It may also be possible to obtain an even larger data set from the University of Nebraska Medical Center. A larger sample size might diminish some of the tail-effect phenomena and would allow a more accurate assessment of the comparison between MSEs and the Kullback-Leibler criterion.

References
[1] Myung IJ. Maximum Likelihood Estimation. http://quantrm2.psy.ohio-state.edu/injae/mle-pub.pdf (Submitted for publication 11/21/01)

[2] Linhart H, Zucchini W. Model Selection. Wiley Series in Probability and Mathematical Statistics. John Wiley and Sons, New York, 1986.
[3] Zucchini W. An Introduction to Model Selection. J Math Psych 2000; 44:41-61.
[4] Battaglia GJ. Mean Square Error. AMP J Tech 1996; 6:31-36.

Department of Pediatrics, University of Nebraska Medical Center, Omaha, NE, Department of Mathematics, University of Nebraska at Omaha, Omaha, NE
