IRES Working Paper Series
September 12, 2010
Model Stability and the Subprime Mortgage Crisis
Xudong An†, Yongheng Deng§, Eric Rosenblatt‡, Vincent W. Yao‡
We study the potential model instability problem with respect to mortgage default risk and
examine to what extent it helps explain the default shock during the recent crisis. We find that
econometric default risk models based on historical data can be unstable over time. Due to
temporal shifts in the parameters, default prediction of the 2006 vintage subprime loans based on
hazard and Logit models estimated with 2003 vintage loan data can generate over 40 percent
fewer defaults than the actual number, assuming perfect forecast of house price change. We also
find that the combined impact of parameter instability and bad forecast of HPI growth enlarges
the under-prediction of default rate but the marginal impact of parameter instability is larger than
that of bad HPI forecast. Our findings have important implications regarding model limitations
and risk, model improvements, economic capital, and regulatory reform.
Keywords: subprime mortgage, default risk, model stability, hazard model, Logit model
The authors are grateful to John Clapp, David Geltner, Richard Green, Michael Lea, David Ling, Tony
Sanders, and Brent Smith for helpful discussions and suggestions. We also thank participants in the
Maastricht-MIT-NUS 2009 Real Estate Finance and Investment Symposium, the 2010 Weimer School of
Advanced Studies in Real Estate and Land Economics, the Finance Seminar at San Diego State University for
Department of Finance, College of Business Administration, San Diego State University, 5500 Campanile
Dr., San Diego, CA 92182-8236; firstname.lastname@example.org, (619) 594-3027, (619) 594-3272(fax).
Institute of Real Estate Studies, National University of Singapore; 21 Heng Mui Keng Terrace, #04-02,
Singapore, 119613; email@example.com, (65) 6516-8291, (65) 6774-1003 (fax).
Fannie Mae, 3900 Wisconsin Avenue, Washington, DC 20016. E-mails: firstname.lastname@example.org and
The recent financial crisis was triggered by large, unexpected losses on mortgages and
mortgage-related securities [1]. The default shock has prompted an investigation into what went
wrong with the credit risk models. Some see the problem as coming from the data inputs. For
example, Satyajit Das, a former Citigroup banker, told Bloomberg reporters that:
“The models are fine. But they have an input problem. It becomes a number we pluck out
of the air. They could be wrong, and the ratings could be misleading.” [2]
Others, however, blame model instability. For example, Alan Greenspan (2008) suggested:
“The whole intellectual edifice, however, collapsed in the summer of last year because
the data inputted into the risk management models generally covered only the past two
decades – a period of euphoria.” [3]
In this paper, we investigate the potential model instability problem with respect to mortgage
default risk and examine to what extent model instability explains the default shock in the
recent crisis.
Our study concerns two econometric models that are well established in the academic
literature and widely adopted by the mortgage industry: the Logit model and the
Cox proportional hazard model. Although mortgage lenders and investors usually keep the
specifications of their models proprietary, so that we cannot evaluate the models of a particular
lender or investor, we hope to form insights about the general features of those econometric
models through this study.
[1] Major mortgage investors had to substantially write down their mortgage assets and rating agencies had to
adjust their ratings to reflect revised expectations of default losses during the crisis. Many mortgage lenders
went into bankruptcy due to unexpected losses.
[2] “CDO Boom Masks Subprime Losses, Abetted by S&P, Moody's, Fitch,” Bloomberg News, May 31, 2007.
[3] “The Financial Crisis and the Role of Federal Regulators,” the House Committee on Oversight and
Government Reform hearing on October 23, 2008.
We find that both the conventional Logit model and the hazard model with reasonable
specifications show inter-temporal instability. For example, based on Wald tests, subprime
mortgage loans originated in 2006 have significantly different default sensitivity to house price
appreciation (depreciation) than those originated in 2003. To assess how the model instability
explains the default shock during the subprime mortgage crisis, we use parameters estimated
with the 2003 vintage data to forecast default probabilities of the 2006 vintage loans and study
the aggregate prediction accuracy. Using the actual realization of the default risk factors such as
house price appreciation (HPI growth) and leverage of the 2006 vintage, we find that the hazard
model estimated with the 2003 vintage data predicts about 40% fewer defaults than the actual
results of the 2006 vintage while the Logit model predicts about 41% fewer defaults.
Alternatively, we take imperfect HPI growth forecasts into consideration. If one assumes the
same HPI growth during 2006-2009 as during 2003-2006, the under-prediction becomes
more severe: the predicted default rate is less than 50 percent of the actual rate. Stated
differently, the actual default rate of the 2006 vintage loans during the 2006-2009 period is more
than twice as high as that predicted by the 2003 vintage models. If bad forecasts of other
variables such as interest rates and unemployment are also taken into account, the ex post
default rate could be several times higher than predicted. This finding coincides with the
relation between expected and actual losses of the 2006 vintage loans: the 2006 vintage loans were
originated with spreads similar to those of the 2003 vintage, indicating that mortgage lenders
expected similar losses from the two vintages; however, the ex post default rate of the 2006
vintage is over 3 times that of the 2003 vintage. Meanwhile, comparing the impacts
of parameter instability and bad forecasts of HPI growth, we find that the marginal impact of a bad
HPI forecast is smaller than that of a bad model.
The model instability problem we examine in this paper is similar to the so-called “Lucas
critique” regarding econometric policy evaluations (Lucas, 1976). The unprecedented crisis in
the subprime mortgage market and the widely believed structural break in the mortgage and
financial markets provide us a unique opportunity to study this issue in the field of risk
management. Our findings in this paper have a number of implications regarding model
limitations and risk, model improvements, economic capital and regulatory reform.
The rest of the paper proceeds as follows: in the next section, we review the state of the art
econometric models of mortgage default risk and discuss the specifications of the hazard model
and Logit model that we are focusing on in this paper; we report our data and explain our
sampling technique in section 3; in section 4, we discuss estimates of the two econometric
models and parameter stability tests; we explain our prediction experiments and assess how
model instability explains the default shock in section 5; and we provide concluding remarks in a
final section.
2. Mortgage Default Risk Models
2.1. Econometric models of mortgage default risk
The past forty years have seen a growing literature on mortgage default risk based on ex post
loan performance [4], which helps lenders and investors understand the determinants of mortgage
default risk and lays the foundation for default risk prediction, pricing and management.
Econometric models for mortgage default risk have evolved from simple linear regressions to the
more sophisticated Logit and hazard models.
[4] There is also a literature that tries to understand the implied (ex ante) default risk through mortgage prices
(see Kau, Keenan and Kim 1994, Capozza, Kazarian and Thomson 1998 and many others).
Linear regression. von Furstenberg (1969, 1970a, 1970b) develops the first academic default risk
model, a linear regression based on aggregate data. The author regresses the aggregate default rate
of FHA/VA loans on loan characteristics such as loan-to-value (LTV) ratio and age of the loan,
and finds that home equity at loan origination is the most important predictor of default.
Subsequently, a number of studies apply that technique to different loan samples (e.g. loans
originated by S&Ls) and alternative mortgage instruments (ARMs and GPMs) [5]. Later studies
also experiment with more explanatory variables such as borrower income, payment-to-income
ratio, metro unemployment rate, and mark-to-market LTV. Those studies provide important
guidance for lenders’ underwriting practice and lead to major revisions in underwriting criteria [6].
Linear regressions with aggregate data are still used in recent studies of subprime mortgage
default (see, e.g. Mian and Sufi 2009).
Probit and Logit models. The literature on mortgage default proliferates as disaggregate
loan-level data become available. While some early studies such as Herzog and Earley (1970) still
apply linear regressions to disaggregate loan-level data (with a 0/1 dependent variable) [7], many
more studies apply probit and Logit models based on the economic theory of discrete choice.
Jackson and Kasserman (1980) are the first to use a probit model based on individual FHA loans.
Campbell and Dietrich (1983) and many others use Logit models. The use of probit/Logit models
allows researchers to explore more risk factors at the loan level, such as loan purpose and loan term.
Meanwhile, contemporaneous equity position replaces original equity position in newer
studies (e.g. Zorn and Lea 1989, Cunningham and Capone 1990). Transaction costs of default,
trigger events and related borrower characteristics appear more frequently in the models (see, e.g.
Vandell and Thibodeau 1985, Hendershott and Schultz 1993, Archer, Ling and McGill 1996,
1997, Capozza, Kazarian and Thomson 1997). As option pricing theory has been applied to
mortgage valuation, more and more studies test whether default is significantly
related to the put option being “in the money” (see, e.g. Quigley and Van Order 1995, Archer, Ling and
McGill 1996, Philips, Rosenblatt and Vanderhoff 1996). Recent applications of Logit models
create an event history for each loan and are thus better suited to studying the impacts of time-varying
variables (see, e.g. Clapp et al 2001, Ambrose and Sanders 2003, Clapp, Deng and An 2006, An,
Clapp and Deng 2010). For subprime mortgage default, Rajan, Seru and Vig (2010) and Keys,
Mukherjee, Seru and Vig (2010) have applied Logit models in their studies.
[5] See, for example, von Furstenberg and Green (1974), Follain and Struyk (1977), Vandell (1978), Jackson and
Kasserman (1980), Foster and Van Order (1984, 1985), Clauretie (1987), and Quigley and Van Order (1991).
[6] For example, lenders made important changes in response to pressure on revealed redlining practice and
Fannie Mae revised standards on ARMs based on academic studies (Vandell, 1993).
[7] Other examples include Williams, Beranek and Kenkel (1974) and Webb (1982).
Hazard model. Developed in the biostatistics literature and first used for mortgage prepayment
risk studies (e.g. Green and Shoven 1986, and Quigley 1987), the proportional hazard model has
prevailed in mortgage default risk studies over the past two decades (see Van Order 1990, Quigley
and Van Order 1991, Schwartz and Torous 1993, Deng, Quigley and Van Order 1996 and many
others). In contrast with the Logit model with event-history data, which assumes that the borrower’s
choices in each month are i.i.d. events, the hazard model is based on conditional default probability
and implicitly handles path dependency. It is thus more appealing for modeling borrower
behavior, which is usually path-dependent. Moreover, the Logit model is restricted by the assumption
of no correlation among competing risks via unobservable variables. By contrast, the hazard model
has the flexibility to allow correlated competing risks (Clapp, Deng and An 2006). Recently, a
number of papers have applied the Cox proportional hazard model to study subprime mortgage
default (e.g. Demyanyk and Hemert 2008, Gerardi, Shapiro and Willen 2008, Elul 2009,
Haughwout, Okah and Tracy 2009, Green, Rosenblatt and Yao 2010, An, Yao and Rosenblatt
2010) [8]. However, the proportional hazard model is not as convenient as the multinomial Logit model
in dealing with the competing risks of mortgage default and prepayment. To address that issue,
Deng, Quigley and Van Order (2000) apply the competing risks hazard model to mortgage
prepayment and default, and a number of subsequent studies have adopted that methodology.
Deng, Quigley and Van Order (2000) and Deng and Quigley (2002) also model unobserved
heterogeneity with a mass-point mixed competing risks hazard model. Alexander, Grimshaw,
McQueen and Slade (2002) and Pennington-Cross (2003) apply that technique to subprime
mortgage default. Clapp, Deng and An (2006) extend the unobserved heterogeneity concept to the
multinomial Logit model. Covariates included in a hazard model are usually similar to those in a
Logit model.
Although other models such as discriminant analysis, neural network analysis, and classification
tree analysis appear in the literature [9], the Logit model and the hazard model have been the dominant
econometric models used in the academic literature for mortgage default risk. They have also
become the standard tools of mortgage default risk analysis in the mortgage industry.
[8] Ciochetti et al (2003), Chen and Deng (2003), and An, Deng and Sanders (2009) apply the model to CMBS
[9] See Morton (1975), Episcopos, Pericli and Hu (1998), and Feldman and Gross (2005), respectively.
A Cox proportional hazard model assumes that the hazard rate of default of mortgage loan $i$ at
period $T$ since its origination takes the following form:
$$h_i\left(T; X_i(t)\right) = h_0(T) \exp\left(X_i(t)'\beta\right), \quad i = 1, \ldots, n. \tag{1}$$
Here $h_0(T)$ is the baseline hazard function, which depends only on the age (duration) of the loan,
$T$ [10]; $X_i(t)$ is a vector of proportional covariates for individual loan $i$ that are time-varying or
time-invariant risk factors; and $\beta$ is the corresponding coefficient vector.
In a Logit model, the default probability of a loan at age $T$ is:
$$\Pr_i\left(T; Z_i(t)\right) = \frac{\exp\left(Z_i(t,T)'\gamma\right)}{1 + \exp\left(Z_i(t,T)'\gamma\right)}. \tag{2}$$
Here $\gamma$ is the coefficient vector, and the dependence of default probability on loan age (duration)
is modeled by including loan duration dummy variables in the covariate set $Z_i(t,T)$.
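
The two functional forms in equations (1) and (2) can be illustrated with a minimal sketch. The covariates, coefficient values and baseline hazard below are hypothetical numbers chosen only for illustration, not estimates from this paper, and the function names are our own.

```python
import math

def cox_hazard(baseline_hazard, covariates, beta):
    """Equation (1): h_i(T) = h_0(T) * exp(X_i(t)' beta) for one loan-month."""
    linear_index = sum(x * b for x, b in zip(covariates, beta))
    return baseline_hazard * math.exp(linear_index)

def logit_default_probability(covariates, gamma):
    """Equation (2): exp(Z' gamma) / (1 + exp(Z' gamma)), in a numerically stable form."""
    linear_index = sum(z * g for z, g in zip(covariates, gamma))
    if linear_index >= 0:
        return 1.0 / (1.0 + math.exp(-linear_index))
    e = math.exp(linear_index)
    return e / (1.0 + e)

# Hypothetical covariates: [negative-equity ratio, FICO score / 100]
x = [0.10, 6.2]
beta = [2.0, -0.5]          # hypothetical coefficients
h = cox_hazard(0.002, x, beta)   # 0.002 is a hypothetical baseline hazard h_0(T)

# In the Logit model, loan age enters through duration dummies in Z;
# here a single hypothetical age dummy is appended to the covariates.
z = x + [1.0]
gamma = [2.0, -0.5, -3.0]   # hypothetical coefficients
p = logit_default_probability(z, gamma)
```

In the hazard form, doubling the baseline hazard doubles the hazard rate for every loan, whereas in the Logit form the age effect is just another term in the linear index.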
2.2. Model specification
Our model specification generally follows the existing literature. We include the following
covariates in our models:
Negative equity. von Furstenberg (1970a, 1970b), Williams, Beranek and Kenkel (1974) and
many others find that home equity at loan origination is the most important predictor of mortgage
default. Later studies use contemporaneous LTV, which takes house price changes and loan
amortization into consideration, and find it to be a significant risk factor (see, e.g. Foster and Van
Order 1984, Vandell and Thibodeau 1985, Deng 1997). Recent studies on subprime default risk
also find this variable to be a significant risk factor (e.g. Alexander, Grimshaw, McQueen and Slade
2002) [11]. In this paper, we calculate the borrower’s negative equity as the difference between
contemporaneous house value and market value of the loan. Home price index (HPI) and loan
amortization are incorporated in our calculation. Additionally, we acknowledge multiple loans
(liens) on some properties and thus use the combined loan amount in the negative equity calculation.
[10] Notice that the loan duration time T is different from the natural time t, which allows identification of the
[11] Alternatively, Demyanyk and Hemert 2008 and Elul 2009 use house price appreciation.
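
The negative equity calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used in the paper: the loan value is approximated by the amortized unpaid balance of a level-payment loan rather than its market value, the sign convention (combined balance minus house value, so that positive values mean the borrower is under water) is our choice, and all function names and numbers are hypothetical.

```python
def remaining_balance(original_balance, annual_rate, term_months, age_months):
    """Unpaid balance of a level-payment fixed-rate loan after age_months payments."""
    r = annual_rate / 12.0
    if r == 0:
        return original_balance * (1 - age_months / term_months)
    growth = (1 + r) ** term_months
    return original_balance * (growth - (1 + r) ** age_months) / (growth - 1)

def negative_equity(origination_value, hpi_at_origination, hpi_now, lien_balances):
    """Combined balance of all liens minus the HPI-updated house value."""
    current_value = origination_value * hpi_now / hpi_at_origination
    return sum(lien_balances) - current_value

# Hypothetical loan: $160,000 first lien at 7% for 30 years, observed at age 24 months,
# plus a $20,000 second lien; the house was worth $200,000 and the HPI fell from 100 to 90.
first_lien = remaining_balance(160000.0, 0.07, 360, 24)
neg_eq = negative_equity(200000.0, 100.0, 90.0, [first_lien, 20000.0])
```

Summing over `lien_balances` reflects the use of the combined loan amount when a property carries more than one lien.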
FICO score. The FICO score is a numerical summary of a borrower’s history of debt repayment.
Although prime mortgage lenders screen out low credit score borrowers, some researchers
still find that the level of the credit score matters to default risk among the loans originated (e.g. Clapp
et al 2001). For subprime mortgage loans, many recent studies have found it to be a significant
risk factor (see, e.g. Pennington-Cross 2003, Demyanyk and Hemert 2008, Elul 2009, Keys,
Mukherjee, Seru and Vig 2010). Since many mortgage loans have cosigners (husbands or
wives), we use the minimum of the two FICO scores in this study.
Backend ratio. The literature has long included the payment-to-income ratio as a default risk factor
(e.g. Herzog and Earley 1970, Archer, Ling and McGill 1996, Deng and Gabriel 2006). The
payment-to-income ratio (frontend ratio) and debt-to-income ratio (backend ratio) are indeed two
important mortgage underwriting variables besides FICO score. Again, for many subprime
mortgage loans those two variables far exceed the traditional underwriting thresholds, and recent
studies have found the debt-to-income ratio to be a significant default risk determinant (Demyanyk
and Hemert 2008, Green, Rosenblatt and Yao 2010).
Loan type. Fixed-rate mortgages (FRMs) behave very differently from adjustable-rate mortgages
(ARMs) (see, e.g. Cunningham and Capone 1990, Philips, Rosenblatt and Vanderhoff 1995,
Calhoun and Deng 2002). Some research has also found that 15-year FRMs are less risky than
30-year FRMs (e.g. Alexander, Grimshaw, McQueen and Slade 2002, Deng and Gabriel 2006).
In this study, we focus on FRMs but include a 15-year FRM indicator as an explanatory variable.
Property type. Recent research has found that condominium loans are less likely to default,
everything else equal (Agarwal, Ambrose and Sanders 2009). Therefore, we test whether
different property types, i.e. single unit, two-to-four unit and condominium, have different
default risk.
Loan purpose. Different loan purposes indicate that borrowers are at different stages of their
housing tenure as well as in different financial situations. While Clapp et al (2001) find that refinance
loans are more likely to default among prime mortgage loans, recent research such as Elul (2009)
finds that subprime refinance loans are less likely to default, everything else equal. A related
variable considered in the literature is whether the property is an existing or new unit (see, e.g. von
Furstenberg and Green 1974, Campbell and Dietrich 1983, Deng and Gabriel 2006). We consider
three different loan purposes in this study: home purchase, rate/term refinance, and cash out
refinance.
Documentation type. An important feature of subprime mortgage loans is that many loans do not
have full documentation of income, assets or employment. The low documentation may be
caused by borrowers’ difficulty in verifying their income, assets or employment, or in some
extreme situations borrowers simply state income they do not have (stated income loans).
Recent studies have found that low doc loans have higher default risk (e.g. Demyanyk and
Hemert 2008, Rajan, Seru and Vig 2009).
Occupancy type. Demyanyk and Hemert (2008) and Agarwal, Ambrose and Sanders (2009) find
that investor properties are more likely to default. In this paper, we consider the following three
occupancy types: owner-occupied, second/vacation home, and investor property.
Mortgage brokerage type. Many popular media outlets have ascribed the subprime crisis to mortgage
brokers. Green, Rosenblatt and Yao (2010) find that broker and correspondent loans have
higher default risk than retail loans. We therefore include brokerage type in our models.
Origination loan balance. The size of the loan is thought to be related to the transaction cost of
default (e.g. Clapp et al 2001, Deng and Gabriel 2006), and recent studies of subprime mortgage
default risk have found it to be significantly related to default (see, e.g. Demyanyk and Hemert
2008, Elul 2009).
Origination LTV. Some researchers believe that LTV at origination (or down payment) not
only affects the equity position of the borrower throughout the life of the loan, but also reveals the
borrower’s default propensity, indicates the borrower’s ability to save, or affects the borrower’s
default decision as a sunk cost (see Yezer, Phillips and Trost 1994, Deng, Quigley and Van
Order 1996, 2000, Kelly 2009, Green, Rosenblatt and Yao 2010). Additionally, lenders may apply
different levels of due diligence to high LTV and low LTV loans. Therefore, in addition to
considering combined LTV in the negative equity calculation, we include origination LTV.
Prepayment penalty. Most prime residential mortgage loans are free to prepay. By contrast,
many subprime loans have a prepayment penalty clause in the mortgage contract. Researchers
believe that prepayment penalties limit subprime borrowers’ ability to refinance into more
affordable loans and thus increase the chance of default (e.g. Demyanyk and Hemert 2008, Elul
2009, Agarwal, Ambrose and Sanders 2009).
Unemployment rate. One possible reason for mortgage default is that the borrower loses a job and
is thus unable to make the mortgage payment. Therefore, the mortgage default risk literature has
long included local area unemployment as a risk factor (see, e.g. Williams, Beranek and Kenkel
1974, Campbell and Dietrich 1983, Deng, Quigley and Van Order 2000). Recent research has
also found it to be a significant risk factor for subprime loans (e.g. Demyanyk and Hemert
2008, Elul 2009).
Excess premium. There is increasing evidence that mortgage lenders can possess private
information about borrower/loan quality that is not reflected in underwriting documents (see, e.g.
Elul 2009, Rajan, Seru and Vig 2009, An, Deng and Gabriel 2010, Keys, Mukherjee, Seru and
Vig 2010). Therefore, we include excess premium as a proxy for the lender’s private information
about loan quality. The variable is constructed as the residual of a mortgage spread regression
that includes all observable default risk factors on the right-hand side [12].
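
The construction of excess premium as a regression residual can be sketched as below. We assume an ordinary least squares regression with an intercept; the paper does not state the estimator or the exact regressor list, so the two regressors, the loan data and the function names here are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols_residuals(features, spreads):
    """Residuals from regressing spreads on an intercept plus the given features."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, spreads)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    return [y - f for y, f in zip(spreads, fitted)]

# Hypothetical loans: (combined LTV, FICO / 100) and mortgage spread in basis points.
features = [(0.80, 6.4), (0.95, 6.0), (0.70, 6.8), (0.90, 6.1), (0.85, 6.5)]
spreads = [330.0, 420.0, 280.0, 410.0, 350.0]
excess_premium = ols_residuals(features, spreads)
```

A positive residual means the loan was priced above what its observable risk factors imply, which is the sense in which the variable may proxy for the lender's private information.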
Other variables. We also consider some other variables such as jumbo loan status, growth of per
capita disposable income and growth of population in the metro area, corporate credit spread,
and HPI volatility. These are not included in the final model due to multicollinearity.
Ideally, we would include borrower characteristics such as age, gender, ethnicity, profession, and
number of dependents, and neighborhood variables such as whether the property is in the central city,
neighborhood homeownership rate, poverty level, crime rate, percent of homes foreclosed, etc.
However, we do not have data on those variables.
3.1 Data sources
Our data come mainly from First American CoreLogic LoanPerformance (hereafter LP). The LP
database contains loan-level data on over 80 percent of all securitized subprime mortgages,
which is also over half of all subprime mortgage loans originated in the US.
[12] The regression results are available upon request.
LP provides detailed information on each subprime mortgage loan, including note rate, original
loan balance, LTV, loan term (30 year, 15 year, etc.), loan type (fixed-rate, 5-1 ARM, etc.), loan
purpose (home purchase, rate/term refinance, cash out refinance), borrower credit score,
occupancy status, number of units, originator type (broker, retail lender, etc.), and prepayment
penalty type. LP also tracks the performance (default, prepayment, mature, or current) of each
loan in every month. Therefore, we construct the event-history of each loan, starting from its
origination to default, prepayment, mature, or our data collection point, whichever is the earliest.
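
The event-history construction described above can be sketched as follows, assuming monthly status records keyed by an integer month index. The status labels, the `TERMINAL` set and the function name are hypothetical and do not reflect LP's actual field names.

```python
TERMINAL = {"default", "prepaid", "matured"}

def build_event_history(origination_month, status_by_month, cutoff_month):
    """Return (month, loan_age, event) rows from origination up to the earliest of
    default, prepayment, maturity, or the data collection point (cutoff_month).

    status_by_month maps a month index to a status string; months that are
    missing from the mapping are treated as 'current'.
    """
    rows = []
    for month in range(origination_month, cutoff_month + 1):
        status = status_by_month.get(month, "current")
        age = month - origination_month
        if status in TERMINAL:
            rows.append((month, age, status))
            return rows
        rows.append((month, age, "current"))
    if rows:
        # The loan is still alive at the data collection point: mark it censored.
        last_month, last_age, _ = rows[-1]
        rows[-1] = (last_month, last_age, "censored")
    return rows

# Hypothetical loan originated in month 0 that defaults in month 5.
defaulted = build_event_history(0, {5: "default"}, 36)
# Hypothetical loan originated in month 10 and still current at the cutoff.
alive = build_event_history(10, {}, 12)
```

Each row is one loan-month observation, which is the unit of analysis for both the hazard and the event-history Logit models.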
We also merge other information such as HPI growth, interest rates, MSA-level unemployment
rate and income growth into our loan-level data. Treasury rates and interest rate swap rates are
matched into the data to calculate the mortgage spread. HPI is from Fannie Mae and is at the
zip code level. Treasury interest rates and corporate bond yields are from the Federal Reserve, and
MSA-level income growth and unemployment rate are from Moody’s Economy.com.
The LP database contains about 14 million subprime mortgage loans. For our study, we
focus on first-lien, fixed-rate mortgage loans, which are about 19 percent of all loans [13]. We
further apply a number of filters: we first exclude loans originated before 1995, since LP has
relatively less accurate information about those loans; seasoned loans are excluded, since
information such as loan balance and LTV of those loans is not measured at loan origination; we also
exclude loans with interest-only periods and loans not in metropolitan areas (MSAs); and loans
with missing or wrong information on property type, refinance indicator, occupancy status,
backend ratio, FICO score, documentation level or mortgage note rate are excluded.
[13] A large fraction of the subprime mortgage loans are ARMs, e.g. about 38 percent of the LP sample are 2/28
We further adopt a sampling technique for our study: we select a 10% random
sample of loans from three vintages, those originated in 2000, 2003 and 2006. The numbers of
subprime mortgage loans of those three vintages are 8,533, 31,836 and 26,876, respectively.
Then for each vintage, we look at a three-year window of loan performance after loan origination.
For example, for loans originated in 2000, we focus on their performance in 2000, 2001 and 2002.
In so doing, we have three non-overlapping samples.
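
The sampling scheme can be sketched as below. The loan records, the `orig_year` field, the seed and the function names are hypothetical; the sketch only illustrates drawing a 10% random sample per vintage and pairing each vintage with a three-calendar-year performance window.

```python
import random

VINTAGES = (2000, 2003, 2006)

def sample_vintages(loans, fraction=0.10, seed=0):
    """Draw a simple random sample of the given fraction from each vintage."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sampled = {}
    for year in VINTAGES:
        cohort = [loan for loan in loans if loan["orig_year"] == year]
        sampled[year] = rng.sample(cohort, round(len(cohort) * fraction))
    return sampled

def performance_window(orig_year, years=3):
    """Calendar years in which a vintage's performance is observed."""
    return list(range(orig_year, orig_year + years))

# Hypothetical loan records: 50 loans per origination year from 2000 to 2006.
loans = [{"loan_id": f"{year}-{i}", "orig_year": year}
         for year in range(2000, 2007) for i in range(50)]
samples = sample_vintages(loans)
windows = {year: performance_window(year) for year in VINTAGES}
```

Because the vintages are three years apart and each window spans three calendar years, the windows do not overlap.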
3.3 Descriptive statistics
In table 1, we report the performance of the three vintage subprime mortgage loans. Default is
defined as delinquency of over 90 days, and censor means that the loan is still alive at the end of the
three-year window. Default rate varies across the three vintages, e.g. 2003 vintage has a
cumulative default rate of about 7 percent over the three-year window, in contrast to the 16
percent of the 2000 vintage and the 22 percent of the 2006 vintage. Apparently the strong house
price appreciation the 2003 vintage experienced during 2003-2005 helped most of the 2003
vintage loans stay current, while the 2001-2002 economic downturn and the sharp house price
decline starting from 2006 contributed to the high default rates of the 2000 and 2006 vintages.
Overall, default rates of all the three vintages of subprime mortgage loans are much higher than
that of prime mortgage loans as reported in previous studies (see, e.g., Philips, Rosenblatt and
VanderHoff 1995, Deng, Quigley and Van Order 2000, Clapp, Deng and An 2006).
Figure 1 compares the cumulative default rates of the three vintages over the life of the loan. In
every quarter after loan origination in the three-year window, the 2003 vintage has lower default
rate than the 2000 and 2006 vintages. The default rate of the 2006 vintage starts lower than that of the
2000 vintage but surpasses it one year after loan origination. Over 20
percent of loans originated in 2006 default within two years of origination.
Table 2 compares the loan characteristics of the three vintages. The 2000 vintage has much lower
average loan amount but much higher average coupon rate. Mortgage spread is defined as the
difference between the mortgage coupon rate and the comparable maturity Treasury rate [14]. The 2000
vintage has an average mortgage spread of 505 bps, while the 2003 and 2006 vintages
have average mortgage spreads of 339 bps and 343 bps, respectively. The relative
magnitude of the mortgage spreads of the 2000 and 2003 vintages reflects the
aforementioned default rate differences between these two vintages. However, this risk-return
relationship does not hold when we compare the 2006 vintage with the 2003 vintage – while they
have similar average mortgage spreads, the 2006 vintage has a cumulative
default rate over 3 times that of the 2003 vintage over a three-year window. This finding concurs with the so-called
“default shock” – lenders and investors found default rates several times higher than expected
during the housing and subprime mortgage crisis.
Average FICO score improves over time. In fact, the average FICO scores of the 2003 and 2006
vintages both exceed 620, the traditional FICO score cutoff for prime mortgage loans. This
pattern is consistent with much anecdotal evidence that subprime lending was increasingly driven by
non-credit reasons as the market evolved. This observation is also supported by the increase in
borrowers with low/no documentation. In 2000, only 19 percent of subprime loans have
low/no doc, while the percentage of low/no doc loans increased to 32 and 28 percent, respectively, in
2003 and 2006.
[14] 10-year Treasury rate for FRM 30 and 7-year Treasury rate for FRM 15.
Combined LTV also increases monotonically from 2000 to 2006. In fact, table 3 shows that the
2006 vintage has a substantially higher proportion of high LTV loans. Nearly 18 percent of loans
originated in 2006 have LTV higher than 97 percent, while that number is less than 3 percent for
the 2000 vintage. The proportion of less risky 15-year loans decreases over time. In 2000, 19 percent
of subprime FRMs are 15-year loans, while in 2006 this number is only 5 percent. Loan
purpose composition also varies over time. The share of rate/term refinance loans is 10
percent in 2000; it increases to 18 percent in 2003 and then falls back to 10 percent in 2006. We
also notice that a large proportion of the 2003 vintage loans were originated by mortgage brokers or
correspondent lenders. Prepayment penalties are prevalent in all three vintages.
Table 4 further presents a comparison of the time-varying covariates of the three vintage loans.
The most significant difference comes from HPI growth. The 2000 and 2003 vintages
experienced an average HPI growth of 7 percent and 14 percent, respectively. In contrast, the
2006 vintage had an average HPI decline of 4 percent. Correspondingly, the average negative
equity of the 2006 vintage is much higher than those of the 2000 and 2003 vintages. Both the
2000 and 2006 vintage loans experienced an average 1 percentage point increase in the
unemployment rate, while the 2003 vintage had a decline in the unemployment rate (improvement in
4. Model Estimation and Tests of Model Stability
4.1. Model estimation
Both the hazard model and the Logit model are estimated using the maximum likelihood
estimation methods as discussed in Clapp, Deng and An (2006).
Table 5 reports our hazard model estimates on the three separate samples, which are constructed
based on the event-history data of the three vintage loans. The first column of the coefficients is
for the 2000 vintage sample. Most of the estimates conform to our expectations. For
example, default probability decreases with FICO score. The higher the original loan balance, the
lower the likelihood that the loan will default post-origination, everything else equal. Low/no
doc loans, investment property loans, and loans with prepayment penalty all have higher default
risk than their reference groups, respectively. 15-year FRM and condo loans have lower default
risk. Backend ratio is marginally significant with the expected coefficient sign. Loans
with original LTV higher than 80 percent do not show significantly different default risk from
those with LTV lower than or equal to 80 percent. We do see a significant positive relationship
between negative equity and default probability – the larger the negative equity, the more likely
the loan will default. Interestingly, Excess premium is significantly related to default probability,
which supports the notion that loan originators do possess valuable private information regarding
loan default risk and they incorporate that information in loan pricing.
The 2003 vintage estimates show more significant default risk factors. For example, backend
ratio is now highly significant with the expected sign of coefficient. Loans with higher than 80
percent original LTV are riskier than those with original LTV less than or equal to 80 percent,
everything else equal. 2- to 4-unit property loans have higher risk than 1-unit loans. Both
rate/term refinance and cash out refinance loans are shown to be less risky than home purchase
loans. In addition, broker/correspondent loans tend to be riskier. Change in unemployment rate
becomes a significant risk factor with the expected impact. FICO score, log loan balance, low/no
doc, 15-year FRM, condo loan, investment property, Excess premium and negative equity
continue to be significant, with the same coefficient signs as in the 2000 vintage.
The significant default risk factors of the 2006 vintage are similar to those of the 2003 vintage
except that LTV greater than 80 percent, condo loan, and broker/correspondent loan are no
longer significant at the 5 percent level (table 5). However, we notice that the magnitudes of
many coefficients are very different from those of the 2003 vintage estimates.
In table 6, we present our estimates of the Logit model. Here we concentrate on default
probability, so prepayment and censored observations are counted as non-default and a binary
Logit model is estimated. First, we notice that the estimates of all three vintage models are
similar to the hazard model estimates reported in table 5. Second, comparing the
estimates across the three vintage samples, the patterns are also similar to those discussed above.
4.2. Tests of parameter stability
To formally assess whether the parameters estimated with the three separate samples are statistically
different, we conduct Wald tests as discussed in Andrews and Fair (1988). Denote by $\theta$
and $\theta^{*}$ the true parameters of any two models (based on two different vintage loans), and by
$\hat{\theta}$ and $\hat{\theta}^{*}$ their estimates. We test the following hypothesis:
$$H_0: \theta = \theta^{*}$$
The Wald statistic is:
$$W = (\hat{\theta} - \hat{\theta}^{*})' \left[ \mathrm{var}(\hat{\theta}) + \mathrm{var}(\hat{\theta}^{*}) \right]^{-1} (\hat{\theta} - \hat{\theta}^{*}) \qquad (4)$$
Under the null hypothesis, the Wald test statistic is $\chi^{2}$ distributed with degrees of
freedom equal to the number of parameters in the model (the number of elements in the first or third
vector in equation (4)).
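As a rough check on how the statistic behaves for a single coefficient, the sketch below evaluates the scalar case of the Wald statistic using two rows of the hazard estimates in table 5. It assumes the two vintage samples are independent, so that the variance of the difference is the sum of the two squared standard errors, and it uses one degree of freedom per coefficient. The function names are illustrative and do not come from the paper, and the published statistics were presumably computed from unrounded estimates, so small differences are to be expected.

```python
import math

def wald_scalar(b1, se1, b2, se2):
    """Wald statistic for H0: b1 == b2, assuming independent samples."""
    return (b1 - b2) ** 2 / (se1 ** 2 + se2 ** 2)

def chi2_sf_1df(w):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(w / 2.0))

# 2003 versus 2006 hazard model estimates (coefficient, S.E.) from table 5
w_negative_equity = wald_scalar(0.247, 0.042, 0.780, 0.045)
w_low_no_doc = wald_scalar(0.451, 0.048, 0.448, 0.029)

print(f"negative equity: W = {w_negative_equity:.2f}, p = {chi2_sf_1df(w_negative_equity):.3g}")
print(f"low/no doc:      W = {w_low_no_doc:.2f}, p = {chi2_sf_1df(w_low_no_doc):.3g}")
```

With the rounded inputs, the negative equity statistic comes out close to the value reported in table 5, while the Low/No doc statistic is essentially zero, in line with the table.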
Wald test results for the hazard model are reported in table 5 alongside the estimates. Moving
from the 2000 vintage model to the 2003 vintage model, a number of parameters are statistically
different. The default probability of the 2003 vintage is more sensitive than that of the 2000 vintage
to FICO score and log loan balance, as the magnitudes of those two coefficients are significantly
larger in the 2003 vintage model than in the 2000 vintage model. Interestingly, loans with LTV higher
than 80 percent have significantly higher default risk in the 2003 vintage model but not in the
2000 vintage model. This may be because, when more subprime loans are available,
higher risk borrowers self-select into high LTV loans. Rate/term refinance and cash out refinance
also become significant in the 2003 vintage model, which could be due to the relatively worse
performance of the home purchase loans in the 2003 vintage. However, three of the prepayment
penalty dummy variables become insignificant. Finally, the sensitivity of default probability to
Excess premium declines significantly, which is consistent with findings in An, Yao and
Rosenblatt (2010) that Excess premium becomes less predictive of default, possibly due to
subprime lenders’ decreasing effort to collect soft information when originate-to-distribute
Comparing the parameters of the 2003 and 2006 models, we again see significant parameter
instability. The sensitivity of default probability to FICO score and change in unemployment rate
becomes smaller in the 2006 vintage model, while LTV greater than 80 percent, condo loan, and
broker/correspondent loan become insignificant in the 2006 vintage model. The most remarkable
changes come from log loan balance and negative equity. Log loan balance is negatively
associated with default probability in the 2003 vintage model but becomes positively related to
default probability in the 2006 vintage model. The negative equity coefficient does not change sign,
but its magnitude in the 2006 vintage model is more than three times that in the
2003 vintage model. In other words, the 2006 vintage subprime borrowers are much more
sensitive to negative equity in their default decisions. This is in fact quite intuitive: some
borrowers might choose not to default when house prices are rising even if they have some
negative equity in their houses, but many borrowers might choose to default when house
prices are falling even if they have only small negative equity in their houses.
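To put the change in the negative equity coefficient in perspective, the following sketch compares the two hazard model estimates from table 5. It assumes that the coefficient enters the log hazard linearly, as in a proportional hazards specification, and that the covariate is scaled identically in both samples; neither assumption is verified here, so the hazard ratios are illustrative only.

```python
import math

# Negative equity coefficients from the hazard model (table 5)
beta_2003 = 0.247
beta_2006 = 0.780

# Relative size of the two coefficients
coef_ratio = beta_2006 / beta_2003

# Multiplicative change in the hazard for a one-unit rise in the covariate,
# under the assumed log-linear (proportional hazards) form
hazard_ratio_2003 = math.exp(beta_2003)
hazard_ratio_2006 = math.exp(beta_2006)

print(f"coefficient ratio: {coef_ratio:.2f}")
print(f"hazard ratio, 2003 vintage: {hazard_ratio_2003:.2f}")
print(f"hazard ratio, 2006 vintage: {hazard_ratio_2006:.2f}")
```

Under these assumptions the same one-unit increase in negative equity raises the hazard by roughly a quarter with the 2003 estimate and more than doubles it with the 2006 estimate.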
Wald test results for the Logit models are reported in table 6. They are very similar to the
aforementioned results for the hazard models. A number of parameters are unstable over time, but
the greatest instability comes from the negative equity variable. Borrowers became much more
sensitive to negative equity (decline in house price) in their default decisions during the crisis.
When house prices dropped dramatically during the crisis, this increased sensitivity made things
worse, because it multiplied with the increased negative equity to cause more defaults.
5. Default Shock and the Subprime Mortgage Crisis
Econometric default risk models rely heavily on historical data. Mortgage lenders and investors
typically use mortgage loan performance observed in previous periods to estimate how certain
default risk factors, such as house price appreciation, affect mortgage default probability and loss
severity. Such models are then used to predict future default losses under simulated paths of
house price appreciation.15 One can imagine that if models are unstable over time, even with the
most accurate predictions of the risk factor dynamics, default probability (loss) predictions will
be significantly off target.
The subprime mortgage crisis is characterized by an unusually large fraction of subprime
mortgage loans originated during 2005-2007 going into default during 2007-2009. This
wave of defaults came as a shock to many lenders, investors and rating agencies. As shown in
the previous analysis, the 2006 vintage subprime mortgage loans were originated with mortgage
spreads very similar to those of the 2003 vintage; however, the ex post default rate of the
2006 vintage is over three times that of the 2003 vintage.
In this section, we conduct a simple econometric experiment to decompose this “default shock”,
which is to see how much of the default rate “surprise” is due to the unprecedented house price
drop (HPI input error) and how much of the surprise is due to the changing sensitivity of the
parameters (parameter instability).
Notice that the subprime mortgage market started to explode in 2003, while default rates of
subprime loans took off in 2006. Therefore, using the 2003 vintage model to predict the
2006 vintage data is an interesting experiment regarding model instability. We obtain the
parameter estimates from the 2003 vintage sample and use them as default risk factor loadings.16
In the first experiment, we use the actual subsequent values of the default risk factors for the
2006 vintage loans, together with parameters estimated from the 2003 vintage sample, to
predict the default rate of the 2006 vintage. This experiment tells us how accurately we can
predict default probability if we have perfect foresight of the default risk factors. Notice that
this is not a completely feasible forward-looking prediction, but it separates the parameter
instability problem from the model input error problem.
15. Those predictions, together with scenario analysis and sensitivity analysis, are then used to assist mortgage
underwriting, pricing and risk management.
16. We set the insignificant parameters to zero because they are not statistically different from zero.
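The mechanics of the first experiment can be sketched as follows for the Logit case. The coefficients and standard errors are a subset of the 2003 vintage column of table 6; everything else is an assumption made for illustration: the 1.96 cutoff used to decide which parameters count as insignificant, the intercept (the baseline estimates are not reported), the covariate values, and the cumulation formula, which ignores prepayment as a competing exit.

```python
import math

# (coefficient, standard error) pairs for a few covariates,
# taken from the 2003 vintage Logit column of table 6
LOGIT_2003 = {
    "fico_score": (-0.733, 0.026),
    "negative_equity": (0.240, 0.042),
    "second_vacation_home": (-0.283, 0.292),
    "prep_penalty_2yr": (-0.103, 0.119),
}

def zero_insignificant(estimates, z_crit=1.96):
    """Keep a coefficient only if |coef / s.e.| reaches z_crit; otherwise use 0."""
    return {name: (b if abs(b / se) >= z_crit else 0.0)
            for name, (b, se) in estimates.items()}

def default_probability(intercept, loadings, covariates):
    """Per-quarter Logit default probability for one loan."""
    index = intercept + sum(loadings[k] * covariates[k] for k in loadings)
    return 1.0 / (1.0 + math.exp(-index))

def cumulative_default(quarterly_probs):
    """Probability of defaulting in at least one quarter, ignoring prepayment."""
    survival = 1.0
    for p in quarterly_probs:
        survival *= 1.0 - p
    return 1.0 - survival

loadings = zero_insignificant(LOGIT_2003)

# Hypothetical intercept and covariate paths (not from the paper)
HYPOTHETICAL_INTERCEPT = -4.0
loan_quarters = [
    {"fico_score": -0.5, "negative_equity": 0.0,
     "second_vacation_home": 0.0, "prep_penalty_2yr": 1.0},
    {"fico_score": -0.5, "negative_equity": 1.0,
     "second_vacation_home": 0.0, "prep_penalty_2yr": 1.0},
]
probs = [default_probability(HYPOTHETICAL_INTERCEPT, loadings, x) for x in loan_quarters]
print(loadings)
print(cumulative_default(probs))
```

Feeding the realized 2006 vintage covariates through loadings fixed at their 2003 values is what isolates parameter instability from input error; the hazard model version would replace the Logit link with the estimated hazard function.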
With both the hazard model and the Logit model, we make quarter-by-quarter predictions.
Figure 2 plots the predicted cumulative default rates by the two models in contrast with the
actual cumulative default rate. We see that while the two models have very similar predictions,
both under-predict defaults remarkably. Table 7 simply presents the aggregate results. Again,
both the hazard model and the Logit model under-predict the default probability of the 2006
vintage loans. While the actual cumulative default rate of the 2006 vintage loans is 22.2 percent
in a three-year window, our hazard model prediction is only 13.3 percent and our Logit model
prediction is only 13.0 percent. Normalized by the actual default rate, we can see from figure 3
that the hazard model predicts about 40 percent fewer defaults than the actual results, while the
Logit model predicts about 41 percent fewer defaults.
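The normalized figures quoted above follow directly from the counts in table 7, as the short calculation below shows. The helper name is illustrative.

```python
def shortfall(predicted, actual):
    """Share of actual defaults that the model fails to predict."""
    return 1.0 - predicted / actual

ACTUAL_DEFAULTS = 5969          # 2006 vintage, table 7
hazard_shortfall = shortfall(3579, ACTUAL_DEFAULTS)
logit_shortfall = shortfall(3507, ACTUAL_DEFAULTS)

print(f"hazard model: {hazard_shortfall:.1%} fewer defaults than actual")
print(f"Logit model:  {logit_shortfall:.1%} fewer defaults than actual")
```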
Prior to the crisis, the predicted future house price path was probably much higher than the actual
subsequent price path. Therefore in our second experiment, in addition to using the 2003 vintage
model estimates to predict default of the 2006 vintage, we assume a naïve house price model:
one that predicts that the HPI growth rate in each zip code during 2006-2008 remains the same
as that of 2003-2005. In so doing, we are able to see the combined impact of parameter
instability and bad HPI forecast.
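One way to read the naïve house price model is sketched below: each zip code's quarterly growth path from 2003-2005 is reused, quarter by quarter, as the forecast for 2006-2008 and then compounded into growth since origination. Whether the paper reuses the full quarterly path or only the total growth is not stated, so this is an assumption, and the zip codes and growth rates shown are made up for illustration.

```python
def naive_hpi_forecast(growth_2003_2005):
    """Reuse each zip code's 2003-2005 quarterly HPI growth path for 2006-2008."""
    return {zip_code: list(path) for zip_code, path in growth_2003_2005.items()}

def growth_since_origination(quarterly_growth):
    """Compound quarterly growth rates into cumulative growth since origination."""
    level, cumulative = 1.0, []
    for g in quarterly_growth:
        level *= 1.0 + g
        cumulative.append(level - 1.0)
    return cumulative

# Made-up quarterly growth rates for two hypothetical zip codes
history = {"00001": [0.03, 0.02, 0.04], "00002": [0.01, 0.00, 0.02]}
forecast = naive_hpi_forecast(history)
path = growth_since_origination(forecast["00001"])
print(path)
```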
Again, table 8 presents the cumulative predicted defaults in a three-year window and compares
them to the actual figures. Both models predict a default rate of less than 11 percent, while the
actual default rate is about 22 percent. Therefore, the combined impact of parameter instability and bad
HPI forecast is larger than the sole impact of parameter instability: it leads to over 50 percent
fewer predicted defaults than the actual results. Stated differently, the actual default rate is more
than twice the predicted default rate. However, an interesting observation from a comparison of table
7 and table 8 is that the marginal impact of the bad HPI forecast is much smaller than that of the
parameter instability (prediction accuracy comes down from 60 percent to 48 percent, in contrast
to from 100 percent to 60 percent). This may help explain why the default wave came as a
“surprise” even though many lenders and investors conducted scenario analysis and some of
them might have already predicted much lower HPI growth for the 2006-2008 period: using a
wrong model is more detrimental than applying an unrealistic HPI growth.
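The comparison of tables 7 and 8 amounts to a sequential decomposition of the prediction shortfall, which can be reproduced from the reported counts as in the sketch below. The ordering (parameter instability first, HPI forecast error second) follows the text; a different ordering could attribute the shares differently, and the function and key names are illustrative.

```python
ACTUAL = 5969  # actual defaults of the 2006 vintage (tables 7 and 8)
PREDICTED = {
    "hazard": {"actual_hpi": 3579, "naive_hpi": 2852},
    "logit": {"actual_hpi": 3507, "naive_hpi": 2931},
}

def decompose(pred_actual_hpi, pred_naive_hpi, actual):
    """Split the shortfall into a parameter part and an HPI forecast part."""
    accuracy_actual_hpi = pred_actual_hpi / actual
    accuracy_naive_hpi = pred_naive_hpi / actual
    return {
        "parameter_instability": 1.0 - accuracy_actual_hpi,
        "hpi_forecast": accuracy_actual_hpi - accuracy_naive_hpi,
        "total_shortfall": 1.0 - accuracy_naive_hpi,
    }

results = {model: decompose(v["actual_hpi"], v["naive_hpi"], ACTUAL)
           for model, v in PREDICTED.items()}
for model, parts in results.items():
    print(model, {name: round(value, 3) for name, value in parts.items()})
```

For both models the parameter instability component is several times the size of the HPI forecast component, which is the pattern described in the text.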
6. Conclusions and Discussions
The subprime mortgage market experienced explosive growth in the early and mid-2000s
and then collapsed in 2007. During the past three years, massive defaults of subprime
mortgage loans have caused catastrophic losses in the financial markets. Much of the default loss
came as a shock to the investment community, as evidenced either by the non-proportionate
mortgage spreads charged by lenders at loan origination or by the large write-downs that
mortgage lenders and investors took on their mortgage assets during the crisis. This has
spurred retrospection on what went wrong with the risk management models. In this
spirit, we investigate the stability of econometric default risk models and conduct econometric
experiments to examine to what extent model instability explains the default shock.
Estimating separate hazard and Logit models for three vintage loans, all with a three-year
observation window, we find that the prevailing econometric mortgage default probability
models can be highly unstable over time. Not only do default risk factors such as
HPI growth differ significantly across the three vintages, but the coefficients of a number of
variables, especially the negative equity variable, also differ significantly across the three
vintage models. Comparing the 2003 vintage loans with the 2006 vintage loans, the 2003 vintage
experienced the highest house price run-up in history within three years of
origination, while loans originated in 2006 were exposed to an unprecedented house price
decline during 2006-2008. Meanwhile, the default probability of the 2006 vintage loans is over
three times as sensitive to house price change as that of the 2003 vintage.
Our simulation suggests that both the hazard model and the Logit model estimated with 2003
vintage data under-predict the default probability of the 2006 vintage loans. Assuming a perfect
forecast of HPI and other default risk factors, the hazard model predicts about 40 percent fewer
defaults than the actual results while the Logit model predicts about 41 percent fewer defaults.
When house price forecasting is not accurate, we see a more severe under-prediction. Assuming
a naïve house price prediction, the two econometric models under-predict defaults by over 50 percent.
The findings in this paper have a number of implications. First, we have to exercise extra caution
in explaining and applying empirical results based on historical data, especially
non-representative data. The house price run-up during 2003-2006 was atypical. If we were to
use only data from that atypical period in default risk forecasting, we would obtain exceptional results,
as we show in this paper. It is definitely not an easy task to identify non-representative data
ex ante. Remedies to that problem include using larger samples and longer histories, and
scrutinizing all the data we analyze. Second, judged by the aggregate post-sample prediction
accuracy, we need improvements in default risk models as well as house price forecasting
models. Certainly, the current paper does not explore the optimal specification within the current
hazard or Logit model framework, and we do believe improvements can be made in that regard.
However, models with a more “structural framework” may be more promising. For example, as
many people believe that we have had regime shifts in the mortgage and housing markets, models
that can capture those regime shifts may help improve our ability to forecast mortgage default
risk. Third, default risk models can be misleading if used inappropriately, and model risk has to
be understood in risk management operations. Model limitations may be masked by other factors
during normal times, but when a structural change leads to a different data generating
mechanism, model risk can become significant and costly. Fourth, from a regulatory
perspective, the Basel II regulatory framework should be reformed to address credit cycles
and avoid the pro-cyclicality of the usual risk assessment models. Finally, economic capital is
important to mortgage bankers and to the investment community. In that regard, a
technical problem again will be how to get around the pro-cyclicality of the usual risk management models.
References
Agarwal, Sumit, Brent W. Ambrose, Souphala Chomsisengphet and Anthony B. Sanders. 2009. The
Neighbor’s Mortgage: Does Living in a Subprime Neighborhood Impact your Probability of Default?
SSRN working paper.
Alexander, William P., Scott D. Grimshaw, Grant R. McQueen and Barrett A. Slade. 2002. Some Loans Are
More Equal than Others: Third-Party Originations and Defaults in the Subprime Mortgage Industry. Real
Estate Economics 30(4), 667-697.
Ambrose, B. W. and A. B. Sanders. 2003. Commercial Mortgage Backed Securities: Prepayment and Default.
Journal of Real Estate Finance and Economics, 26(2/3): 179-196.
An, Xudong, John C. Clapp and Yongheng Deng. 2010. Omitted Mobility Characteristics and Property
Market Dynamics: Application to Mortgage Termination. Journal of Real Estate Finance and Economics
An, Xudong, Yongheng Deng and Stuart A. Gabriel. 2010. Asymmetric Information, Adverse Selection, and
the Pricing of CMBS. Journal of Financial Economics, forthcoming.
An, Xudong, Yongheng Deng and Anthony B. Sanders. 2009. Default Risk of CMBS Loans: What Explains
the Regional Variations? National University of Singapore, IRES Working Paper 2009-009.
Andrews, Donald W. and Ray C. Fair. 1988. Inference in Nonlinear Econometric Models with Structural
Change. Review of Economic Studies 55: 615-640.
Archer, W. R., D. C. Ling and G. A. McGill. 1996. The Effect of Income and Collateral Constraints on
Residential Mortgage Terminations. Regional Science and Urban Economics, 26: 235-261.
Archer, W. R., D. C. Ling and G. A. McGill. 1997. Demographic Versus Option-Driven Mortgage
Terminations. Journal of Housing Economics, 6(2): 137-163.
Calhoun, Charles, and Yongheng Deng. 2002. A Dynamic Analysis of Fixed- and Adjustable-Rate Mortgage
Terminations. Journal of Real Estate Finance and Economics, 24: 9-33.
Campbell, T. and J. K. Dietrich. 1983. The Determinants of Default on Conventional Residential Mortgages.
Journal of Finance, 48(5): 1569-1581.
Capozza, D. R., D. Kazarian, and T.A. Thomson. 1997. Mortgage Default in Local Markets. Real Estate
Economics, 25(4): 631-655.
Capozza, D. R., D. Kazarian, and T. A. Thomson. 1998. The Conditional Probability of Mortgage Default.
Real Estate Economics. 26(3): 359-390.
Chen, Jun and Yongheng Deng. 2003. Commercial Mortgage Workout Strategy and Conditional Default
Probability: Evidence from Special Serviced CMBS Loans. USC Lusk Center for Real Estate Working
Ciochetti, Brian A., Yongheng Deng, Gale Lee, James Shilling and Rui Yao. 2003. A Proportional
Hazards Model of Commercial Mortgage Default with Originator Bias. Journal of Real Estate Finance
and Economics 27(1), 5-23.
Clapp, John M., Yongheng Deng and Xudong An, 2006, Unobserved Heterogeneity in Models of Competing
Mortgage Termination Risks, Real Estate Economics 34(2), 243-273.
Clapp, J. C., G. M. Goldberg, J. P. Harding and M. LaCour-Little. 2001. Movers and Shuckers:
Interdependent Prepayment Decisions. Real Estate Economics, 29(3): 411-450.
Clauretie, T. M. 1987. The Impact of Interstate Foreclosure Cost Differences and the Value of Mortgages on
Default Rates. Journal of the American Real Estate and Urban Economics Association, 15(3): 152-67.
Cunningham, D. F. and C. A. Capone, Jr. 1990. The Relative Termination Experience of Adjustable to Fixed-
Rate Mortgages. Journal of Finance, 45(5): 1687-1703.
Deng, Yongheng, 1997. Mortgage Termination: An Empirical Hazard Model with Stochastic Term
Structure. Journal of Real Estate Finance and Economics, 14 (3), 309-331.
Deng, Yongheng, John M. Quigley and Robert Van Order. 1996. Mortgage Default and Low Down-payment
Loans: The Cost of Public Subsidy. Regional Science and Urban Economics, 26: 263-285.
Deng, Yongheng, John M. Quigley and Robert Van Order. 2000. Mortgage Terminations, Heterogeneity and
the Exercise of Mortgage Options. Econometrica, 68(2): 275-307.
Deng, Yongheng and John M. Quigley. 2002. Woodhead Behavior and the Pricing of Residential Mortgages.
Lusk Center for Real Estate Working Paper, No. 2001-1005.
Deng, Yongheng, and Stuart A. Gabriel. 2006. Risk-Based Pricing and the Enhancement of Mortgage Credit
Availability among Underserved and Higher Credit-Risk Populations. Journal of Money, Credit and
Banking, 38 (6), 1431-1460.
Demyanyk, Yuliya S. and Van Hemert, Otto. 2009. Understanding the Subprime Mortgage Crisis. Review of
Financial Studies, forthcoming.
Elul, Ronel. 2009. Securitization and Mortgage Default: Reputation vs. Adverse Selection. SSRN working
Episcopos, A., A. Pericli, and J. Hu. 1998. Commercial Mortgage Default: A Comparison of Logit with
Radial Basis Function Networks. Journal of Real Estate Finance and Economics, 17(2):163-178.
Feldman, David and Shulamith Gross. 2005. Mortgage Default: Classification Trees Analysis. Journal of
Real Estate Finance and Economics 30(4), 369-396.
Follain, J. and R. Struyk. 1977. Homeownership Effects of alternative Mortgage Instruments. Journal of the
American Real Estate and Urban Economics Association, 5(1): 1-43.
Foster, C. and R. Van Order. 1984. An Option-Based Model of Mortgage Default. Housing Finance Review,
Foster, C., and R. Van Order. 1985. FHA Terminations: A Prelude to Rational Mortgage Pricing. Journal of
the American Real Estate and Urban Economics Association, 13:292-316.
Gerardi, Kristopher, Adam Hale Shapiro and Paul S. Willen. 2008. Subprime Outcomes: Risky Mortgages,
Homeownership Experiences, and Foreclosures. Federal Reserve Bank of Boston working paper.
Green, J. and J. B. Shoven. 1986. The Effect of Interest Rates on Mortgage Prepayment. Journal of Money,
Credit and Banking 18, 41-50.
Green, Richard K., Eric Rosenblatt and Vincent Yao. 2010. Sunk Costs and Mortgage Default. SSRN
Haughwout, Andrew, Ebiere Okah and Joseph Tracy. 2009. Second Chances: Subprime Mortgage
Modification and Re-Default. Federal Reserve Bank of New York Staff Report.
Hendershott, P. H., and W. R. Schultz. 1993. Equity and Nonequity Determinants of FHA Single-Family
Mortgage Foreclosures in 1980s. Journal of American Real Estate and Urban Economics Association.
Herzog, J. and J. Earley. 1970. Home Mortgage Delinquency and Foreclosure. New York: National Bureau
of Economic Research.
Jackson, J. and D. Kaserman. 1980. Default Risk on Home Mortgage Loans: A Test of Competing
Hypotheses. Journal of Risk and Insurance, 4: 678-690.
Kau, J. B., D. C. Keenan and T. Kim. 1994. Default Probabilities for Mortgages. Journal of Urban
Economics, 35: 278-296.
Kelly, Austin. 2009. Skin in the Game: Zero Down Payment Mortgage Default. Journal of Housing Research,
17 (2), 75-99.
Keys, Benjamin, Tanmoy Mukherjee, Amit Seru and Vikrant Vig. 2010. Did Securitization Lead to Lax
Screening? Evidence from Subprime Loans. Quarterly Journal of Economics, 125 (1), 307-362.
Lucas, Robert. 1976. Econometric Policy Evaluation: A Critique. In Brunner, K. and A. Meltzer, The Phillips
Curve and Labor Markets, Carnegie-Rochester Conference Series on Public Policy 1: 19-46. New York:
Mian, Atif and Amir Sufi. 2009. The Consequences of Mortgage Credit Expansion: Evidence from the U.S.
Mortgage Default Crisis. Quarterly Journal of Economics, 124 (4), 1449-1496.
Morton, T. G. 1975. A Discriminant Function Analysis of Residential Mortgage Delinquency and
Foreclosure. Journal of the American Real Estate and Urban Economics Association, 3(1): 73-90.
Pennington-Cross, Anthony. 2003. Credit History and the Performance of Prime and Nonprime Mortgages.
Journal of Real Estate Finance and Economics 27(3), 279-301.
Philips, R.A., E. Rosenblatt and J.H. VanderHoff. 1996. The Probability of Fixed and Adjustable Rate
Mortgage Termination. Journal of Real Estate Finance and Economics 13(2): 95–104.
Philips, R. A. and J. H. VanderHoff. 2004. The Conditional Probability of Foreclosure: An Empirical
Analysis of Conventional Mortgage Loan Defaults. Real Estate Economics, 32(4): 571-587.
Quigley, John M. 1987. Interest Rate Variations, Mortgage Prepayments and Household Mobility. Review of
Economics and Statistics 69(4), 636-643.
Quigley, John M. and Robert Van Order. 1991. Defaults on Mortgage Obligations and Capital Requirements
for U.S. Savings Institutions: A Policy Perspective. Journal of Public Economics 44(3): 353-370.
Quigley, John M. and Robert Van Order. 1995. Explicit Tests of Contingent Claims Models of Mortgage
Default. Journal of Real Estate Finance and Economics, 1(2): 99–117.
Rajan, Uday, Amit Seru and Vikrant Vig. 2010. Statistical Default Models and Incentives. American
Economic Association Papers and Proceedings, 100 (2), 1-5.
Schwartz, Eduardo S. and Walter N. Torous. 1993. Mortgage Prepayment and Default Decisions: A Poisson
Regression Approach. Journal of the American Real Estate and Urban Economics Association, 21(4): 431-
Vandell, K. D. 1978. Default Risk under Alternative Mortgage Instruments. Journal of Finance, 33(5): 1279–
Vandell, Kerry D. and T. Thibodeau. 1985. Estimation of Mortgage Defaults Using Disaggregate Loan History
Data. Journal of the American Real Estate and Urban Economics Association. 13(3): 292-316.
Vandell, Kerry D. 1993. Handing Over the Keys: A Perspective on Mortgage Default Research. Journal of
the American Real Estate and Urban Economics Association. 21, 211-246.
Van Order, Robert. 1990. The Hazards of Default. Secondary Mortgage Markets. 1990 (fall): 29-31.
von Furstenberg, G. 1969. Default Risk on FHA-Insured Home Mortgage as a Function of the Term of
Financing: A Quantitative Analysis. Journal of Finance, 24(2): 459-77.
von Furstenberg, G. 1970a. Interstate Differences in Mortgage Renting Risks: An Analysis of Causes.
Journal of Financial and Quantitative Analysis, 5: 229-42.
von Furstenberg, G. 1970b. The Investment Quality of Home Mortgages. Journal of Risk and Insurance, 37
von Furstenberg, G. and R.J. Green. 1974. Home Mortgages Delinquency: A Cohort Analysis. Journal of
Finance, 29(4): 1545-48.
Webb, B.G. 1982. Borrower Risk under Alternative Mortgage Instruments. Journal of Finance, 37 (1): 169-
Williams, A. O., W. Beranek and J. Kenkel. 1974. Default Risk in Urban Mortgages: A Pittsburgh Prototype
Analysis. Journal of the American Real Estate and Urban Economics Association, 2(2): 101-2.
Yezer, Anthony M. J., Robert F. Phillips and Robert P. Trost. 1994. Bias in Estimates of Discrimination and
Default in Mortgage Lending: The Effects of Simultaneity and Self-Selection. Journal of Real Estate
Finance and Economics 9, 197-215.
Zorn, Peter and Michael Lea. 1989. Mortgage Borrower Repayment Behavior: A Microeconomic Analysis
with Canadian Adjustable Rate Mortgage Data. Journal of the American Real Estate and Urban Economics
Association, 17(1): 118-136.
Figure 1: Cumulative default rates of the three vintage loans (x-axis: loan age in quarters)
Figure 2: Predicted cumulative default rates of the 2006 vintage loans (x-axis: loan age in quarters; series: actual cumulative default rate, hazard model prediction, Logit model prediction)
Figure 3: Model predicted defaults as a percentage of actual defaults, 2006 vintage loan (for each of the hazard model and the Logit model: actual default, prediction with actual HPI, prediction with naïve HPI)
Table 1: Performances of the three vintage loans
                 2000 vintage          2003 vintage          2006 vintage
              Number  Percentage    Number  Percentage    Number  Percentage
Default        1,396       16.36     2,255        7.08     5,969       22.21
Prepayment     2,571       30.13    14,440       45.36     4,868       18.11
Censor         4,566       53.51    15,141       47.56    16,039       59.68
Total          8,533      100.00    31,836      100.00    26,876      100.00
Note: Default is defined as over 90-day delinquency. Censor means that the loan is alive at the data cutoff
point, which is 2002Q4, 2005Q4 and 2008Q4 for the three vintages, respectively.
Table 2: Comparison of the loan characteristics of the three vintages
Mean Standard Deviation
2000 2003 2006 2000 2003 2006
Original loan amount ($) 90,568 161,410 170,762 74,401 105,792 119,415
Coupon rate 0.11 0.07 0.08 0.02 0.01 0.01
Mortgage spread (%) 5.05 3.39 3.43 1.58 1.21 1.29
FICO score 602 638 626 63.88 64.45 61.62
Backend ratio 0.38 0.38 0.39 0.11 0.10 0.10
Combined LTV 76.28 78.93 79.38 15.55 15.24 17.42
LTV>80% 0.37 0.42 0.36 0.48 0.49 0.48
Low/No doc 0.19 0.32 0.28 0.39 0.47 0.45
Jumbo size loan 0.03 0.07 0.04 0.17 0.26 0.20
30-year FRM 0.81 0.89 0.95 0.39 0.31 0.23
15-year FRM 0.19 0.11 0.05 0.39 0.31 0.23
1-unit property 0.90 0.89 0.92 0.30 0.31 0.27
2- to 4-unit property 0.07 0.06 0.04 0.25 0.24 0.19
Condo 0.03 0.05 0.04 0.18 0.21 0.20
Rate/term refinance 0.10 0.15 0.10 0.30 0.36 0.30
Cash out refinance 0.65 0.65 0.67 0.48 0.48 0.47
Home purchase 0.25 0.20 0.22 0.44 0.40 0.42
Owner-occupied home 0.88 0.89 0.92 0.33 0.31 0.27
Second/vacation home 0.01 0.01 0.01 0.10 0.09 0.10
Investment property 0.11 0.10 0.07 0.32 0.30 0.26
Broker/correspondent loan 0.04 0.19 0.06 0.20 0.39 0.23
Retail loan 0.03 0.10 0.02 0.17 0.30 0.13
Prep penalty 1-year 0.05 0.08 0.05 0.21 0.26 0.23
Prep penalty 2-year 0.02 0.04 0.03 0.12 0.19 0.18
Prep penalty 3-year 0.21 0.50 0.54 0.41 0.50 0.50
Prep penalty over 3-year 0.26 0.08 0.10 0.44 0.28 0.30
Number of loans 8,533 31,836 26,876
Table 3: Combined LTV distributions of the three vintages
Combined LTV (%)
Vintage  (0,60)  [60,70)  [70,75)  [75,80)  [80,85)  [85,90)  [90,95)  [95,97)  [97,100)  [100,~)   Total
2000      16.45    13.08    11.16    23.39    13.08    14.33     5.06     1.00      1.89     0.56  100.00
2003      12.30    12.85     8.90    21.46    10.74    16.92     8.61     0.34      7.65     0.23  100.00
2006      12.44    11.10     7.64    18.11    10.09    14.62     8.06     0.31     17.51     0.12  100.00
Table 4: Comparison of the time varying covariates of the three vintages
Mean Std Dev Minimum Maximum
2000 2003 2006 2000 2003 2006 2000 2003 2006 2000 2003 2006
HPI growth since
0.07 0.14 -0.04 0.07 0.14 0.13 -0.15 -0.16 -1.07 0.66 0.79 0.37
-0.48 -0.62 -0.32 1.16 0.73 0.74 -58.91 -27.16 -56.15 0.21 0.13 0.65
0.01 0.02 0.01 0.03 0.04 0.02 -0.16 -1.52 -0.10 0.21 0.40 0.16
Change in MSA-level
unemployment rate 0.01 -0.01 0.01 0.01 0.01 0.01 -0.12 -0.11 -0.10 0.08 0.15 0.14
N 62,025 213,701 192,940
Note: Negative equity is calculated with the contemporaneous house value (based on zip code level HPI) and the market value of the mortgage
loan outstanding. See Clapp, Deng and An (2006) for more details.
Table 5: Hazard model parameter estimates and Wald test results of the three vintage loans
Coefficient (S.E.) Wald Statistics
2000 2003 2006 2000-2003 2003-2006
FICO score -0.548*** -0.719*** -0.627*** 17.49*** 9.37**
(0.032) (0.026) (0.016)
Backend ratio 0.052 0.079*** 0.068*** 0.62 0.21
(0.027) (0.022) (0.014)
Log of original loan balance -0.074* -0.213*** 0.031* 12.13*** 72.19***
(0.031) (0.025) (0.015)
LTV>80% -0.096 0.321*** 0.042 24.96*** 23.03***
(0.066) (0.05) (0.029)
Low/No doc 0.324*** 0.451*** 0.448*** 2.22 0.00
(0.071) (0.048) (0.029)
15-year FRM -0.532*** -0.511*** -0.327*** 0.03 2.38
(0.095) (0.083) (0.086)
2- to 4-unit property 0.196 0.203* 0.138* 0.00 0.32
(0.106) (0.091) (0.069)
Condo -0.399* -0.579*** -0.098 0.55 9.12**
(0.197) (0.145) (0.066)
Rate/term refinance 0.009 -0.362*** -0.485*** 9.14** 1.97
(0.1) (0.071) (0.05)
Cash refinance -0.113 -0.37*** -0.427*** 8.95** 0.83
(0.067) (0.054) (0.032)
Second/vacation home -0.178 -0.281 0.185 0.06 2.17
(0.306) (0.29) (0.124)
Investment property 0.446*** 0.277*** 0.413*** 2.26 2.24
(0.085) (0.074) (0.052)
Broker/correspondent loan 0.118 0.115* -0.097 0.00 7.39**
(0.126) (0.055) (0.056)
Prep penalty 1-year 0.207 0.192* 0.275*** 0.01 0.55
(0.128) (0.094) (0.06)
Prep penalty 2-year 0.447* -0.098 0.079 5.75* 1.7
(0.195) (0.117) (0.069)
Prep penalty 3-year 0.177* -0.049 -0.007 7.01** 0.53
(0.071) (0.049) (0.033)
Prep penalty over 3-year 0.154* -0.082 0.076 4.89* 2.71
(0.067) (0.083) (0.048)
Excess premium 0.423*** 0.3*** 0.233*** 14.32*** 9.65**
(0.027) (0.018) (0.013)
Negative equity 0.281** 0.247*** 0.78*** 0.13 75.12***
(0.087) (0.042) (0.045)
Change in unemployment rate 0.041 0.194*** 0.138*** 17.97*** 10.4**
(0.034) (0.011) (0.014)
N 62,025 213,701 192,940
-2LogL 23,535 42,941 113,793
Note: *, ** and *** indicate significance at the 0.05, 0.01 and 0.001 levels, respectively. The baseline estimates
are not shown in this table.
Table 6: Logit model parameter estimates and Wald test results of the three vintage loans
Coefficient (S.E.s) Wald Statistics
2000 2003 2006 2000-2003 2003-2006
FICO score -0.561*** -0.733*** -0.649*** 16.91*** 7.44**
(0.033) (0.026) (0.017)
Backend ratio 0.053 0.082*** 0.07*** 0.65 0.2
(0.028) (0.022) (0.014)
Log of original loan balance -0.077* -0.218*** 0.032* 12.06*** 72.95***
(0.032) (0.025) (0.015)
LTV>80% -0.096 0.332*** 0.043 25.53*** 23.8***
(0.068) (0.051) (0.03)
Low/No doc 0.333*** 0.46*** 0.462*** 2.14 0.00
(0.072) (0.048) (0.03)
15-year FRM -0.541*** -0.528*** -0.332*** 0.01 2.65
(0.097) (0.084) (0.087)
2- to 4-unit property 0.201 0.2* 0.142* 0.00 0.25
(0.109) (0.092) (0.072)
Condo -0.405* -0.582*** -0.098 0.51 8.99**
(0.199) (0.146) (0.068)
Rate/term refinance 0.01 -0.361*** -0.503*** 8.78** 2.55
(0.102) (0.073) (0.052)
Cash refinance -0.115 -0.375*** -0.445*** 8.87** 1.17
(0.068) (0.055) (0.033)
Second/vacation home -0.183 -0.283 0.201 0.05 2.29
(0.31) (0.292) (0.129)
Investment property 0.46*** 0.278*** 0.429*** 2.51 2.64
(0.087) (0.076) (0.053)
Broker/correspondent loan 0.119 0.117* -0.1 0.00 7.37**
(0.129) (0.056) (0.058)
Prep penalty 1-year 0.207 0.202* 0.285*** 0.00 0.53
(0.13) (0.096) (0.063)
Prep penalty 2-year 0.453* -0.103 0.093 5.72* 2.01
(0.2) (0.119) (0.072)
Prep penalty 3-year 0.177* -0.055 -0.007 7.05** 0.65
(0.072) (0.049) (0.034)
Prep penalty over 3-year 0.154* -0.081 0.079 4.68* 2.68
(0.069) (0.084) (0.05)
Excess premium 0.433*** 0.31*** 0.242*** 13.93*** 9.18**
(0.028) (0.018) (0.013)
Negative equity 0.285** 0.24*** 0.792*** 0.21 78.43***
(0.088) (0.042) (0.046)
Change in unemployment rate 0.041 0.221*** 0.147*** 22.8*** 13.93***
(0.035) (0.014) (0.014)
N 62,025 213,701 192,940
-2LogL 12,470 22,609 48,759
Note: *, ** and *** indicate significance at the 0.05, 0.01 and 0.001 levels, respectively. The baseline estimates
are not shown in this table.
Table 7: Impact of parameter instability on default prediction
                    Hazard model prediction    Logit model prediction
                     Number   Percentage        Number   Percentage
Predicted default     3,579        13.32         3,507        13.05
Actual default        5,969        22.21         5,969        22.21
Sample size          26,876       100.00        26,876       100.00
Note: These are cumulative (predicted and actual) defaults of the 2006 vintage loan. The prediction is
based on the model estimated with 2003 vintage data and the actual realization of the 2006 vintage
Table 8: Combined impact of parameter instability and HPI input error on default prediction
                    Hazard model prediction    Logit model prediction
                     Number   Percentage        Number   Percentage
Predicted default     2,852        10.61         2,931        10.91
Actual default        5,969        22.21         5,969        22.21
Sample size          26,876       100.00        26,876       100.00
Note: These are cumulative (predicted and actual) defaults of the 2006 vintage loan. The prediction is
based on the model estimated with 2003 vintage data and assumes the 2006 vintage loan has the same zip
code-level HPI growth during 2006-2008 as that of the 2003 vintage loan during 2003-2005.