Building Credit Scoring Models
with SAS Enterprise Miner

Although there are other ways to build credit scoring models, this paper from SAS still provides a useful pattern to follow.

 Introduction
Over the past 30 years, growing demand, stronger competition and advances in computer technology have meant that the traditional methods of making credit decisions that relied mostly on human judgment have been replaced by methods that employ statistical models. Statistical models today are used not only for deciding whether or not to accept an applicant (application scoring), but also for predicting the likelihood of defaults among customers who have already been accepted (behavioral scoring) and for predicting the likely amount of debt that the lender can expect to recover (collection scoring).

The term credit scoring can be defined on several conceptual levels. Most fundamentally, credit scoring means applying a statistical model to assign a risk score to a credit application or to an existing credit account. On a higher level, credit scoring also means the process of developing such a statistical model from historical data. On yet a higher level, the term also refers to monitoring the accuracy of one or many such statistical models and monitoring the effect that score-based decisions have on key business performance indicators.

Credit scoring is performed because it provides a number of important business benefits, all of them based on the ability to quickly and efficiently obtain fact-based and accurate predictions of the credit risk of individual applicants or customers. For example, in application scoring, credit scores are used for optimizing the approval rate of credit applications.

Application scores enable an organization to choose an optimal cut-off score for acceptance, such that market share can be gained while retaining maximum profitability. The approval process and the marketing of credit products can be streamlined based on credit scores. For example, high-risk applications can be given to more experienced staff, or pre-approved credit products can be offered to select low-risk customers via various channels, including direct marketing and the Web.

Credit scores, both of prospects and existing customers, are essential in the customization of credit products. They are used to determine customer credit limits, down payments, deposits and interest rates.

Behavioral credit scores of existing customers are used in the early detection of high-risk accounts, and they enable organizations to perform targeted interventions, such as proactively offering debt restructuring. Behavioral credit scores also form the basis for more accurate calculations of the total consumer credit risk exposure, which can result in a reduction of bad debt provision.

Other benefits of credit scoring include an improved targeting of audits at high-risk accounts, thereby optimizing the workload of the auditing staff. Resources spent on debt collection can be optimized by targeting collection activities at accounts with a high collection score. Collection scores are also used for determining the accurate value of a debt book before it is sold to a collection agency. Finally, credit scores serve to assess the quality of portfolios intended for acquisition and to compare the quality of business from different channels, regions and suppliers.

 Building credit models in-house
While under certain circumstances it is appropriate to buy "ready-made" generic credit models from outside vendors or to have credit models developed by outside consultants for a specific purpose, maintaining a practice for building credit models in-house offers several advantages.

Most directly, it enables the lending organization to profit from economies of scale when many models need to be built. It also enables lenders to afford a greater number of segment-specific models for a greater variety of purposes.

Building a solid, internal skill base of its own also makes it easier for the organization to remain consistent in the interpretation of model results and reports and to use a consistent modeling methodology across the whole range of customer-related scores. This results in a reduced turnaround time for the integration of new models, thereby freeing resources to respond more swiftly to new business questions with creative new models and strategies.

Finally, in-house modeling competency is needed to verify the accuracy and to analyze the strengths and weaknesses of acquired credit models, to reduce outsider access to strategic information and to retain competitive advantage by building up company-specific best practices.

 Building credit models with SAS Enterprise Miner
SAS Enterprise Miner software is SAS' solution for data mining. It is used across many industries to answer a variety of business questions and has been extended with specific functionality for credit scoring that is described in more detail in the case study section.

Building credit models with SAS Enterprise Miner offers a number of benefits. It enables the analyst to access a comprehensive collection of data mining tools through a graphical user interface and to create process flow diagrams that structure and document the flow of analytical activities. The various nodes that make up the process flow are designed such that the analyst can interact with data and models to bring in fully the domain expertise — i.e., use the software as a steering wheel and not as an autopilot. SAS Enterprise Miner is ideal for testing new ideas and experimenting with new modeling approaches in an efficient and controlled manner. This includes the creation and comparison of various scorecard, decision tree and neural network models, to name just a few.

 SAS Enterprise Miner process flow templates
SAS Enterprise Miner process flow diagrams can serve as templates for implementing industry or company standards and best practices. Such templates not only reduce the development time for new models, but also ensure consistency and an efficient transfer of ability to new employees.

The process flow that is used in the case study in this paper is available from SAS and can serve as a basic credit scoring template. It enables the analyst to build a scorecard model that assigns score points to customer attributes, to use the Interactive Grouping node to class and select characteristics automatically and/or interactively using Weights of Evidence and Information Value measures, and to normalize score points to conform to company or industry standards. As an alternative model type, the template builds a decision tree.

 The larger credit scoring process
Modeling is the process of creating a scoring rule from a set of examples. In order for modeling to be effective, it has to be integrated into a larger process. Let's look at application scoring. On the input side, before the modeling step, the set of example applications must be prepared. On the output side, after the modeling, the scoring rule has to be executed on a set of new applications, so that credit granting decisions can be made. The collection of performance data comes both at the beginning and at the end of the credit scoring process. Before a set of example applications can be prepared, performance data has to be collected so that applications can be tagged as "good" or "bad." After new applications have been scored and decided upon, the performance of the accepted accounts again must be tracked and reports created. In this way, the scoring rules can be validated and possibly substituted, the acceptance policy fine-tuned and the current risk exposure calculated.

SAS' power to access and transform data on a huge variety of systems ensures that modeling with SAS Enterprise Miner integrates smoothly into the larger credit scoring process. SAS software is the ideal tool for building a risk data warehouse. This is a subject-oriented, integrated, time-variant and nonvolatile repository of information that serves as the integration hub for all risk management-related decision-support processes, including scorecard monitoring reports and risk exposure calculations.

SAS Enterprise Miner creates portable scoring code that can be executed on a large variety of host systems. For example, the scoring code can be used for scoring a large customer segment centrally in batches, or it can be integrated into applications that score individual applicants in branch offices.

 Choosing the right model
With SAS Enterprise Miner, it is possible to create a variety of model types, such as scorecards, decision trees or neural networks. When you evaluate which model type is best suited for achieving your goals, you may want to consider criteria such as the ease of applying the model, the ease of understanding it and the ease of justifying it.

At the same time, for each particular model of whatever type, it is important to assess its predictive performance, such as the accuracy of the scores that the model assigns to the applications. A variety of business-relevant quality measures are used for this. The best model will be determined both by the purpose for which the model will be used and by the structure of the data set on which it is validated.

 Scorecards
The traditional form of a credit scoring model is a scorecard. This is a table that contains a number of questions that an applicant is asked (called characteristics) and, for each question, a list of possible answers (called attributes). For example, one characteristic may be the age of the applicant, and the attributes for this characteristic are then a number of age ranges into which an applicant can fall. For each answer, the applicant receives a certain number of points — more if the attribute is one of low risk, fewer if the risk is higher. If the application's total score exceeds a specified cut-off number of points, it is recommended for acceptance.
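To make the mechanics concrete, here is a minimal sketch of how such a scorecard is applied. The characteristics, attribute ranges, point values and cut-off are illustrative assumptions only, not values from the paper.

    # Minimal sketch of applying a scorecard (illustrative characteristics,
    # attribute ranges, point values and cut-off; not the paper's scorecard).
    SCORECARD = {
        "age": [((18, 25), 11), ((25, 40), 25), ((40, 200), 38)],
        "time_on_job_months": [((0, 12), 5), ((12, 60), 20), ((60, 1200), 33)],
    }
    CUTOFF = 50  # hypothetical cut-off score

    def score_applicant(applicant):
        """Sum the points of the attribute that each answer falls into."""
        total = 0
        for characteristic, bands in SCORECARD.items():
            value = applicant[characteristic]
            for (low, high), points in bands:
                if low <= value < high:
                    total += points
                    break
        return total

    applicant = {"age": 31, "time_on_job_months": 18}
    total = score_applicant(applicant)
    print(total, "accept" if total >= CUTOFF else "decline/refer")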

Such a scorecard model, apart from being a long-established method in the industry, still has several advantages when compared with more recent "data mining" types of models, such as decision trees or neural networks. To begin with, a scorecard is easy to apply.

If needed, the scorecard can be evaluated on a sheet of paper in the presence of the applicant. The scorecard is also easy to understand. The number of points for one answer doesn't depend on any of the other answers, and across the range of possible answers for one question, the number of points usually increases in a simple way (often monotonically, or even linearly). Therefore, it is often easy to justify to the applicant a decision that is made on the basis of a scorecard. It is possible to disclose groups of characteristics where the applicant has a potential for improving the score and to do so in broad enough terms not to risk manipulated future applications.

 Decision trees
On the other hand, a decision tree may outperform a scorecard in terms of predictive accuracy, because unlike the scorecard, it detects and exploits interactions between characteristics. In a decision tree model, each answer that an applicant gives determines what question is asked next. If the age of an applicant is, for example, greater than 50, the model may suggest granting credit without any further questions, because the average bad rate of that segment of applications is sufficiently low. If, at the other extreme, the age of the applicant is below 25, the model may suggest asking about time on the job next. Then, credit might be granted only to those who have exceeded 24 months of employment, because only in that sub-segment of younger adults is the average bad rate sufficiently low.
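Written out as explicit rules, the example segments above look roughly like the sketch below; the thresholds come from the text, but the outcomes attached to each branch are illustrative, not the actual tree from the case study.

    # The example tree segments above as plain if/then/else rules
    # (thresholds from the text; outcomes are illustrative).
    def tree_decision(age, time_on_job_months=None):
        if age > 50:
            return "grant credit"                  # segment with a low bad rate
        if age < 25:
            if time_on_job_months is None:
                return "ask about time on the job" # next question in this branch
            if time_on_job_months > 24:
                return "grant credit"              # low bad rate sub-segment
            return "decline or ask for additional securities"
        return "continue with further questions"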

Thus, a decision tree model consists of a set of "if … then … else" rules that are still quite straightforward to apply. The decision rules are also easy to understand, perhaps even more so than a decision rule that is based on a total score that is made up of many components. However, a decision rule from a tree model, while easy to apply and understand, may be hard to justify for applications that lie on the border between two segments. There will be cases where an applicant will, for example, say: "If I had only been two months older, I would have received credit without further questions, but now I am asked for additional securities. That is unfair." That applicant may also be tempted to make a false statement about his or her age in the next application, or simply go elsewhere for financial services.

Even if a decision tree is not used directly for scoring, this model type still adds value in a number of ways. The identification of clearly defined segments of applicants with a particularly high or low risk can give dramatic new insight into the risk structure of the entire customer population. Decision trees are also used in scorecard monitoring, where they identify segments of applications where the scorecard underperforms.

 Neural networks
With the decision tree, we saw that there is such a thing as a decision rule that is too easy to understand and thereby invites fraud. Ironically, there is no danger of this happening with a neural network. Neural networks are extremely flexible models that combine characteristics in a variety of ways. Their predictive accuracy can be far superior to scorecards, and they don't suffer from sharp "splits" as decision trees sometimes do.

However, it is virtually impossible to explain or understand the score that is produced for a particular application in any simple way. It can be difficult to justify a decision that is made on the basis of a neural network model. In some countries, it may even be a legal requirement to be able to explain a decision, and such a justification must then be produced with additional methods. A neural network of superior predictive power is therefore best suited for certain behavioral or collection scoring purposes, where the average accuracy of the prediction is more important than the insight into the score for each particular case. Neural network models cannot be applied manually like scorecards or simple decision trees, but require software to score the application. However, their use is just as simple as that of the other model types.

Case study

 Scenario
An international financial services organization entered the consumer credit market in a large western European country two years ago. So far, it has been operating with the use of a generic scorecard for application scoring, but now has collected enough performance data to create its own custom scorecard. The company has been offering various types of consumer loans via various channels, and the first custom scorecard will be applicable to applicants from all channels. Channel-specific scorecards may later be created as required.

 SAS Enterprise Miner process flow
SAS Enterprise Miner software is used for building the scorecard. SAS Enterprise Miner enables the analyst to access a comprehensive collection of analytical tools through a graphical user interface. It provides a workspace onto which nodes (tool icons) are dropped from a tools palette. Nodes are then connected to form process flow diagrams (PFDs) that structure and document the flow of analytical activities that are carried out. The SEMMA concept (Sample, Explore, Modify, Model and Assess) serves as a guideline for creating process flows, and nodes are grouped accordingly in the tools palette.

Figure 1 shows the process flow for modeling on the accepts data. All components of the flow are discussed in more detail in the sections below. The flow begins with reading in the development sample. After using the Data Partition node to split off part of the sample for later validation, the flow divides into a scorecard branch consisting of the Interactive Grouping node and Scorecard node and a decision tree branch consisting of the Decision Tree node. The quality of the scorecard and the tree are then compared on the validation data with the Model Comparison node.




Figure 1: Process flow diagram – "accepts" data.

 Development sample
The development sample (input data set) is a balanced sample consisting of 1,500 good and 1,500 bad accepted applicants. "Bad" has been defined as having been 90 days past due once. Everyone not "bad" is "good," so there are no "indeterminates." A separate data set contains the data on rejects.

The modeling process, especially when the validation charts are involved, requires information about the actual good/bad proportion in the accept population. Sampling weights are used here for simulating that proportion. A weight of 30 is assigned to a good application and a weight of 1 to a bad one. Thereafter, all nodes in the process flow diagram treat the sample as if it consisted of 45,000 good applications and 1,500 bad applications. Figure 3 shows the distribution of good/bad after the application of sampling weights. The bad rate is 3.23 percent.
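The weighted counts and the quoted bad rate follow directly from the 30-to-1 weighting:

    \[
    1500 \times 30 = 45\,000 \ \text{weighted goods}, \qquad
    \text{bad rate} = \frac{1500}{45\,000 + 1500} \approx 3.23\%.
    \]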

A Data Partition node then splits a 30 percent validation data set away from the development sample. Models will later be compared based on this validation data set.




Figure 2: Variable list – development sample.

 Classing
Classing is the process of automatically and/or interactively binning and grouping interval, nominal or ordinal input variables in order to:

  •   Manage the number of attributes per characteristic.
  •   Improve the predictive power of the characteristic.
  •   Select predictive characteristics.
  •   Make the Weights of Evidence — and thereby the number of points in the scorecard — vary smoothly or even linearly across the attributes.

The number of points that an attribute is worth in a scorecard is determined by two factors:

  •   The risk of the attribute relative to the other attributes of the same characteristic.
  •   The relative contribution of the characteristic to the overall score.

The relative risk of the attribute is determine d by its ―We ight of Ev idence.‖ The contribut ion of the characteristic is determined by its
coefficient in a logist ic re gression (see se ction Logistic Regression be low).

The Weight of Evide nce of an attribute is defined as the logarithm of the rat io of the proport io n of goods in the attribute over the
proportion of bads in the attribute. High negat ive values therefore correspond to high risk, high pos itive values correspond to low
risk. See Equation 1 and the middle right of Figure 3.




Equation 1: Weight of evidence.
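The equation image is not reproduced here; the standard definition, consistent with the description above, is:

    \[
    \mathrm{WoE}_{\text{attribute}}
    = \ln\!\left(\frac{p_{\text{good}}^{\text{attribute}}}{p_{\text{bad}}^{\text{attribute}}}\right)
    = \ln\!\left(\frac{n_{\text{good}}^{\text{attribute}} / n_{\text{good}}^{\text{total}}}
                      {n_{\text{bad}}^{\text{attribute}} / n_{\text{bad}}^{\text{total}}}\right),
    \]

where p_good^attribute is the proportion of all goods that fall into the attribute, and analogously for the bads.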

Since an attribute's number of points in the scorecard is proportional to its Weight of Evidence (see section Score Points Scaling below), the classing process determines how many points an attribute is worth relative to the other attributes of the same characteristic.

After classing has defined the attributes of a characteristic, the characteristic's predictive power (i.e., its ability to separate high risks from low risks) can be assessed with the so-called Information Value measure. This will aid the selection of characteristics for inclusion in the scorecard. The Information Value is the weighted sum of the Weights of Evidence of the characteristic's attributes. The sum is weighted by the difference between the proportion of goods and the proportion of bads in the respective attribute.




Equation 2: Information value.
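Again the equation image is not reproduced; the usual formulation matching the description above is:

    \[
    \mathrm{IV} = \sum_{i \in \text{attributes}}
    \left(p_{\text{good}}^{\,i} - p_{\text{bad}}^{\,i}\right)\,\mathrm{WoE}_i .
    \]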

The Information Value should be greater than 0.02 for a characteristic to be considered for inclusion in the scorecard. Information Values lower than 0.1 can be considered weak, values between 0.1 and 0.3 medium, and values between 0.3 and 0.5 strong. If the Information Value is greater than 0.5, the characteristic may be over-predicting, meaning that it is in some form trivially related to the good/bad information.

Classing in SAS Enterprise Miner takes place in the Interactive Grouping node. This node has been specifically developed for credit scoring applications. Figure 3 shows a screenshot for the grouping of the interval-scaled input variable "age." The chart on the top left shows the ranges that define the grouping overlaid on a histogram of the distribution. When the variable "age" is selected, a splitting algorithm automatically suggests a grouping, which can then be modified manually in a variety of ways. Whenever a change is made, the statistics that describe the current grouping are updated.

Those statistics include the distribution of the attributes (bottom left) and the Weight of Evidence and bad rate per attribute (middle right). These numbers can also be read from a table (bottom right). The Information Value is also updated as the grouping is modified. The grouping of nominal and ordinal variables is performed similarly, respecting the specific differences that are implied by the measurement levels. For example, a group of ordinal values can only be merged with a neighboring group, whereas nominal values can be freely moved between groups.
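As a rough illustration of what is computed per characteristic, the sketch below derives Weights of Evidence and an Information Value for an already-binned variable. It assumes pandas/numpy and illustrative column names; it is not the Interactive Grouping node's actual implementation.

    # Sketch: Weight of Evidence and Information Value for one grouped
    # characteristic (assumes a 0/1 "bad" flag and an already-binned column).
    import numpy as np
    import pandas as pd

    def woe_iv(df, group_col, bad_col):
        """Return per-attribute WoE and the characteristic's Information Value."""
        tab = df.groupby(group_col)[bad_col].agg(bads="sum", total="count")
        tab["goods"] = tab["total"] - tab["bads"]
        p_good = tab["goods"] / tab["goods"].sum()   # share of all goods per attribute
        p_bad = tab["bads"] / tab["bads"].sum()      # share of all bads per attribute
        tab["woe"] = np.log(p_good / p_bad)          # Equation 1
        iv = ((p_good - p_bad) * tab["woe"]).sum()   # Equation 2
        return tab[["goods", "bads", "woe"]], iv

    # Illustrative data with an already-binned "age" characteristic:
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"age_group": ["<25", "25-40", "40+"] * 200,
                       "bad": rng.binomial(1, [0.08, 0.04, 0.02] * 200)})
    table, iv = woe_iv(df, "age_group", "bad")
    print(table, "\nInformation Value:", round(iv, 3))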

There is no single criterion that indicates when a grouping can be considered satisfactory. A linear or at least a monotone increase or decrease of the Weights of Evidence is often what is desired in order for the scorecard to appear plausible. Some analysts will always include only those characteristics where a sensible regrouping can achieve this. Others may consider a smooth variation sufficiently plausible and would include a non-monotone characteristic such as "income," where risk is high for both high and low incomes but low for medium incomes, provided the Information Value is high enough.

In our case, we chose the characteristics "Age," "Time on the Job," "EC Card Holder," "Customer Status," "Income" and "Number of Persons in the Household." All of these have an Information Value greater than 0.1. For some of the variables, the suggested groupings were manually modified to smooth the Weights of Evidence charts. The "Income" characteristic was intentionally included as a non-monotone characteristic.




Figure 3: Interactive grouping node – grouping an interval variable.

 Logistic regression

After the relative risk across attributes of the same characteristic has been quantified, a logistic regression analysis now determines how to weigh the characteristics against each other. The Scorecard node receives one input variable for each characteristic. This variable contains as values the Weights of Evidence of the characteristic's attributes (see Table 1 for an example of Weight of Evidence coding). Note that Weight of Evidence coding is different from dummy variable coding, in that single attributes are not weighted against each other independently; instead, whole characteristics are weighted, thereby preserving the relative risk structure of the attributes as determined in the classing stage.




Table 1: Weight of evidence coding.

A variety of further selection methods (forward, backward, stepwise) can be used in the Scorecard node to eliminate redundant characteristics. In our case, we use a simple regression. Table 2 shows the values of the regression coefficients. In the following step, these values are multiplied with the Weights of Evidence of the attributes to form the basis for the score points in the scorecard.
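Outside of SAS Enterprise Miner, the regression step can be sketched as follows. It assumes a data set in which each characteristic has already been replaced by the WoE of the attribute the applicant falls into, plus the 30/1 sampling weights; the data frame and column names are illustrative assumptions.

    # Sketch of the regression on WoE-coded characteristics (scikit-learn
    # stands in for the Scorecard node; data frame/column names are assumed).
    from sklearn.linear_model import LogisticRegression

    X = df_woe[["age_woe", "time_on_job_woe", "income_woe"]]  # WoE-coded inputs
    y = df_woe["bad"]                                          # 1 = bad, 0 = good
    w = df_woe["sample_weight"]                                # 30 for goods, 1 for bads

    model = LogisticRegression().fit(X, y, sample_weight=w)

    # One coefficient per characteristic; multiplied by each attribute's WoE,
    # these form the basis for the score points (see the scaling step below).
    print(dict(zip(X.columns, model.coef_[0])), model.intercept_[0])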




Table 2: Parameter estimates.

 Score points scaling
For each attribute, its Weight of Evidence and the regression coefficient of its characteristic could now be multiplied to give the score points of the attribute. An applicant's total score would then be proportional to the logarithm of the predicted bad/good odds of that applicant.

However, score points are commonly scaled linearly to take more friendly (integer) values and to conform to industry or company standards. We scale the points such that a total score of 600 points corresponds to good/bad odds of 50 to 1 and that an increase of the score of 20 points corresponds to a doubling of the good/bad odds. For the derivation of the scaling rule that transforms the score points of each attribute, see Equations 3 and 4. The scaling rule is implemented in the Scorecard node (see Figure 1), where it can be easily parameterized. The resulting scorecard is output as a table and is shown in Table 3.

Note how the score points of the various characteristics cover different ranges. The score points develop smoothly and — with the exception of the "Income" variable — also monotonically across the attributes.




Equation 3: Score points scaling.
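The equation image is not reproduced here. A standard formulation of the scaling (an assumption consistent with the description above, with signs arranged so that higher scores mean lower risk, α the regression intercept, β_j the coefficient of characteristic j, and n the number of characteristics) is:

    \[
    \text{score} = \text{offset} + \text{factor}\cdot\ln(\text{odds}_{\text{good/bad}}),
    \qquad
    \text{points}_{ij} = \left(\mathrm{WoE}_{ij}\,\beta_j + \frac{\alpha}{n}\right)\text{factor} + \frac{\text{offset}}{n}.
    \]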




Equation 4: Calculation of scaling parameters.
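With the targets stated above (600 points at good/bad odds of 50 to 1, and 20 points to double the odds), the scaling parameters follow as:

    \[
    \text{factor} = \frac{20}{\ln 2} \approx 28.85, \qquad
    \text{offset} = 600 - \text{factor}\cdot\ln 50 \approx 487.12.
    \]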




Table 3: Scorecard.

 Scorecard assessment
The Scorecard node produces various charts and measures that help assess the quality of the scorecard. As a first insight into the usefulness of the scorecard, a score distribution chart shows the range of the score, which score bands are most frequent, whether the distribution is approximately normal and whether outliers exist.

Various measures and charts are then used to evaluate the discriminatory power of the scorecard. These charts analyze the scorecard's ability to separate the good from the bad cases by their scores. Measures include the Kolmogorov-Smirnov (KS) statistic, the Gini coefficient and the area under the ROC chart (AUROC). Corresponding to these measures, the KS chart, the Captured Bad chart and the ROC chart are inspected (see Figures 4 – 6).
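For reference, these measures can be computed outside the Scorecard node roughly as follows; "scores" and "bad_flag" are assumed numpy arrays of validation scores and 0/1 bad indicators, and scipy/scikit-learn are used purely for illustration.

    # Sketch: KS statistic, AUROC and Gini on validation data
    # (assumes numpy arrays "scores" and "bad_flag"; 1 = bad).
    from scipy.stats import ks_2samp
    from sklearn.metrics import roc_auc_score

    ks = ks_2samp(scores[bad_flag == 0], scores[bad_flag == 1]).statistic
    auroc = roc_auc_score(bad_flag, -scores)   # low score = high predicted risk
    gini = 2 * auroc - 1                       # Gini coefficient from AUROC
    print(f"KS={ks:.3f}  AUROC={auroc:.3f}  Gini={gini:.3f}")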




Figure 4: The KS chart shows the difference between the cumulative distributions of "goods" and "bads." The maximum value is the well-known KS statistic.




Figure 5: The ROC chart shows how well the model is able to be specific (catch only "bads") and sensitive (catch all "bads") simultaneously. Sensitivity and 1-Specificity are displayed for various cut-off values. The more the chart bends to the top left, the better.




Figure 6: The Captured Bad chart shows the proportion of all "bads" concentrated in each decile of applicants, ordered by score from worst to best. The more the chart bends to the top left, the better. The area under the chart corresponds to the Gini statistic.

In application scoring, trade-off charts are used that show how the approval rate and the bad rate in the accepts depend on the cut-off score. Good scorecards enable the choice of a cut-off that corresponds to a relatively high approval rate and a relatively low bad rate.




Figure 7: The Trade-off chart displays approval rate and bad rate against cut-off score for the current model. Horizontal lines show values of the previously used model and cut-off and are used to show the expected benefit from updating the model.

Finally, an empirical odds plot is used to evaluate the calibration of the scorecard.




Figure 8: Actual against predicted odds.

The chart plots the actual odds values as they are found in the validation data against score band. This chart is overlaid with a chart of the values that are predicted by the scorecard. The chart thus determines those score bands where the scorecard is or is not sufficiently accurate.

 Decision tree
After having gone through the process of building a scorecard model, let's take a step back and build a decision tree model instead. As previously discussed, this model type can often have superior performance, because it exploits interactions between variables. It defines segments of extremely high or extremely low bad rates and can thus give surprising new insights into the risk structure of the population.

The decision tree algorithm is called from the Tree node. It begins by splitting the input variables into groups in much the same way as the Interactive Grouping node does.

However, it then goes on to choose the most predictive variable and to group the applicants into segments according to the split of that variable. It then continues recursively in each segment of applicants to split the variables, choose the most predictive one and group the applicants. This process continues until no further partitioning seems useful. Finally, some of the terminal sub-segments (leaves) are merged back together again (pruned) in order to optimize the tree.




Figure 9: Decision tree.

Figure 9 shows the resulting tree structure for our case. The most important variable is age, with applicants younger than 28 having an elevated credit risk. Furthermore, among the young, those with no credit cards and no vehicle of their own have the highest risk of all applicants. For those with a car, the time on the job becomes important. In the group of older applicants, customer status delivers important information.

 Model comparison
After building both a scorecard and a decision tree model, we now want to compare the quality of the two models. The Model Comparison node is used for that purpose. Figure 10 shows the ROC charts of both models on the validation data.




Figure 10: ROC chart.

According to this chart, the difference in quality between the two models is minimal. This is also confirmed by comparing fit statistics, such as KS, Gini and AUROC. The greater flexibility of the decision tree model has not proven to be important in this case. We therefore prefer the scorecard as our model of choice, as it is the model type used in practice today and because the tabular format is particularly easy to understand and explain.




 Reject inference
The application scoring models we have built so far, even though we have done everything correctly, still suffer from a fundamental bias. They have been built based on a population that is structurally different from the population to which they are supposed to be applied. All of the example applications in the development sample are applications that have been accepted by the old generic scorecard that has been in place during the past two years.

This is so because only for those accepted applications is it possible to evaluate their performance and to define a good/bad variable. However, the through-the-door population that is supposed to be scored is composed of all applicants: those that would have been accepted and those that would have been rejected by the old scorecard. Note that this is only a problem for application scoring, not for behavioral scoring.

As a partial remedy to this fundamental bias, it is common practice to go through a process of reject inference. The idea of this approach is to score the data that is retained on the rejected applications with the model that is built on the accepted applications. Next, rejects are classified as inferred goods or inferred bads and are added to the accepts data set that contains the actual goods and bads. This augmented data set then serves as the input data set of a second modeling run. In the case of a scorecard model, this involves the readjustment of the classing and the recalculation of the regression coefficients.

Instead of using a hard cut-off score for classifying the rejects, one can also add the same reject twice, once as an inferred good and once as an inferred bad, but adjust the corresponding sampling weight by multiplying it with the predicted probability of being good or bad, respectively. The rationale behind this approach is that a hard cut-off for reject inference would be an arbitrary decision that biases the augmented data set. Since the cut-off for eventually accepting or rejecting an application follows from cost-revenue trade-off considerations that are based on the final scorecard, choosing a cut-off based on a preliminary scorecard seems unfounded. Figure 11 shows a modeling flow that includes a reject inference process.
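The weighted ("fuzzy") variant described above can be sketched as follows; the data frames, the fitted accepts model and the column names are assumptions for illustration, not the actual flow in Figure 11.

    # Sketch of fuzzy reject inference: each reject is added twice, with its
    # sampling weight split by the predicted probability of being good
    # (assumes pandas data frames "accepts"/"rejects" and a fitted model).
    import pandas as pd

    p_good = accepts_model.predict_proba(rejects[features])[:, 0]  # class 0 = good

    inferred_good = rejects.copy()
    inferred_good["bad"] = 0
    inferred_good["sample_weight"] = rejects["sample_weight"] * p_good

    inferred_bad = rejects.copy()
    inferred_bad["bad"] = 1
    inferred_bad["sample_weight"] = rejects["sample_weight"] * (1 - p_good)

    # Augmented development sample for the second modeling run
    augmented = pd.concat([accepts, inferred_good, inferred_bad], ignore_index=True)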




Figure 11: Process flow diagram – "rejects" data.

 Summary
This paper has illustrated how SAS Enterprise Miner software is used to build credit scoring models. In the introduction, it discussed the benefits of performing credit scoring and the advantages of building credit scoring models in-house and with SAS Enterprise Miner. It went on to discuss the advantages and disadvantages of three important model types: the scorecard, the decision tree and the neural network. Finally, it presented a case study where an application scoring model is built with SAS Enterprise Miner, starting with reading the development sample, through classing and selecting characteristics, fitting a regression model, calculating score points, assessing scorecard quality (in comparison to a decision tree model built on the same sample) and going through a reject inference process to arrive at a model for scoring the new customer applicant population.

The study has been presented in the hope that the reader will come to appreciate the efficiency gains that SAS Enterprise Miner software brings to the modeling process. Efficiency is gained by automating the modeling process, but even more so by providing the analyst with a graphical user interface that structures, connects and documents the flow of activities that are carried out.

Thus, if changes or variations need to be introduced, the overall process is already defined, and one doesn't have to start from scratch. (For example, if a similar analysis needs to be carried out on different data, for a different purpose and by a new analyst, the process is still easy to apply.) Process flows enable the company to implement its traditional way of working but also to experiment with new approaches and to compare the results. The environment is flexible and open enough to enable the analyst to interact with the data and the models and, therefore, to bring in fully his or her domain expertise.

In summary, one should think of SAS Enterprise Miner more as a steering wheel than as an autopilot. Because SAS Enterprise Miner is only one part of the whole SAS system for information delivery, there will be no bottleneck when deploying individual models or when automating the whole score creation and application process in an enterprise-wide solution.



