        Free and Cheap Sources of External Data
CAS 2007 Predictive Modeling Seminar



    Louise Francis, FCAS, MAAA
       Francis Analytics and Actuarial Data Mining, Inc.

           Louise_francis@msn.com
             www.data-mines.com
              Objectives
• Information sharing
• Introduce some useful sources of data to
  augment company internal databases
• Show examples of applications using
  external data
       Why Augment Data?
• For small companies or new lines of
  business, internal data may not be
  sufficient
• Add variables (e.g., demographic and
  economic) that are not in the internal data
  Some Kinds of External Data
• Demographic
• Geographic
• Economic
  – Unemployment rate, avg wage, etc
  – Financial Market
• Insurance data
• Occupational
• Weather
        Zip Code Level Data
• The Census Bureau web site, www.census.gov,
  has a wealth of information
• May require some processing effort to put
  into a useful format for analysis
• For a small fee, there are vendors who
  preprocess some of the useful data
• One of them is zip-codes.com
Zip-codes.com
        Some Useful Variables
•   Average Income
•   Population
•   Average house value
•   # people per house
•   Latitude, longitude
    – Use to compute distances
• City, county
Distance formula
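The formula shown on this slide did not survive in the text. As a stand-in, here is a minimal sketch of one common way to turn two latitude/longitude pairs into a distance: the haversine great-circle formula. The function name, the choice of miles, and the Earth radius constant are assumptions for illustration and are not taken from the presentation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Illustrative coordinates only (not taken from the ZIP code file).
d = haversine_miles(34.05, -118.25, 34.06, -118.30)
```

With ZIP code centroids as inputs, this gives an approximate distance between, say, a claimant's ZIP and a provider's ZIP.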
                           The Data


ZipCode   PrimaryRecord ZipCodePopulation HouseholdsPerZipcode WhitePopulation
90071     P                              6              0.0000            3.00
90010     P                         1,943             996.0000          780.00
90014     P                         3,518           2,587.0000          868.00
91608     P                              0              0.0000            0.00
90015     P                        15,134           5,339.0000        4,485.00
     California Auto Data by ZIP
•   BI Exposures
•   BI Losses
•   BI Claims
•   PD Exposures
•   PD Losses
•   PD Claims
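With exposures, losses, and claims available by ZIP, the usual ratios (frequency, severity, pure premium) can be computed and attached to the ZIP-level demographic file. The sketch below assumes hypothetical field names (`BIExposures`, `BILosses`, `BIClaims`) and made-up numbers; only `ZipCode` and `ZipCodePopulation` come from the data layout shown earlier.

```python
def bi_rates(exposures, losses, claims):
    """Return (frequency, severity, pure premium); None where the denominator is zero."""
    frequency = claims / exposures if exposures else None
    severity = losses / claims if claims else None
    pure_premium = losses / exposures if exposures else None
    return frequency, severity, pure_premium


def join_by_zip(auto_rows, demo_rows):
    """Left-join auto data to demographic data on ZipCode and add the BI ratios."""
    demo_by_zip = {row["ZipCode"]: row for row in demo_rows}
    joined = []
    for row in auto_rows:
        merged = dict(row)
        merged.update(demo_by_zip.get(row["ZipCode"], {}))
        freq, sev, pp = bi_rates(row["BIExposures"], row["BILosses"], row["BIClaims"])
        merged.update({"BIFrequency": freq, "BISeverity": sev, "PPBI": pp})
        joined.append(merged)
    return joined


# Made-up values for illustration; ZIP codes are kept as strings so that
# leading zeros (e.g. Massachusetts ZIPs) are not lost.
auto = [{"ZipCode": "90010", "BIExposures": 1000.0, "BILosses": 50000.0, "BIClaims": 10}]
demo = [{"ZipCode": "90010", "ZipCodePopulation": 1943}]
rows = join_by_zip(auto, demo)
```

ZIP codes with no match in the demographic file keep their auto fields and simply lack the demographic ones, which makes unmatched records easy to spot.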
               CAARP Data
•   CAARP data
•   California Auto Assigned Risk Plan
•   Collected by state
•   Aggregated data
•   Request from Statistical Analysis Division
    of department
California Proposed Changes to Territory Rating
Effect of Change by County
Effect of Change by Pure Premium Group

Mean Pct Change by Percentile Group of PPBI

    Percentile Group    Instruction Set 1   Instruction Set 2   Instruction Set 3
    of PPBI                Pct Change          Pct Change          Pct Change
    1                         7.889%              9.516%             18.600%
    2                         5.143%              6.416%             11.250%
    3                         2.381%              3.050%              5.507%
    4                         -.124%              -.225%              -.125%
    5                        -4.240%             -5.448%             -8.777%
    Total                     2.136%              2.575%              5.110%
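A table like the one above can be built by ranking ZIP codes on a variable, cutting the ranks into five equal-sized groups, and averaging the percent change within each group. The sketch below is one way to do that; the equal-count grouping rule and the unweighted mean are assumptions, since the slides do not say how the groups were formed or whether the means were exposure-weighted.

```python
def percentile_groups(values, n_groups=5):
    """Assign each value a group 1..n_groups by rank (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(values) + 1
    return groups


def mean_by_group(groups, changes):
    """Unweighted mean of `changes` within each group, plus an overall 'Total'."""
    sums, counts = {}, {}
    for g, c in zip(groups, changes):
        sums[g] = sums.get(g, 0.0) + c
        counts[g] = counts.get(g, 0) + 1
    result = {g: sums[g] / counts[g] for g in sorted(sums)}
    result["Total"] = sum(changes) / len(changes)
    return result


# Made-up pure premiums and percent changes for ten hypothetical ZIP codes.
pure_premium = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
pct_change = [0.08, 0.06, 0.04, 0.04, 0.02, 0.02, 0.0, 0.0, -0.04, -0.06]
table = mean_by_group(percentile_groups(pure_premium), pct_change)
```

The same two functions apply unchanged when the grouping variable is average house value or income per household.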
Effect of Change by Average House Value
Mean Pct Change by Percentile Group of Average House Value

    Percentile Group of   Instruction Set 1   Instruction Set 2   Instruction Set 3
    AverageHouseValue        Pct Change          Pct Change          Pct Change
    1                           3.308%              4.101%              8.126%
    2                           2.117%              2.986%              5.336%
    3                           2.393%              3.121%              5.478%
    4                           2.936%              3.603%              6.100%
    5                           2.369%              2.945%              4.598%
    Total                       2.739%              3.498%              6.411%
Effect of Change by Average Income

Mean Pct Change by Percentile Group of Income Per Household

    Percentile Group of   Instruction Set 1   Instruction Set 2   Instruction Set 3
    IncomePerHousehold       Pct Change          Pct Change          Pct Change
    1                           3.450%              4.203%              8.755%
    2                           3.001%              4.046%              7.119%
    3                           2.298%              2.973%              5.276%
    4                           1.615%              2.241%              3.384%
    5                           2.518%              3.080%              4.278%
    Total                       2.739%              3.498%              6.411%
The Data Used for Fraud Model
Described in “Distinguishing the Forest From
the Trees”, Derrig and Francis, 2005 CAS
Winter Forum
The Fraud Surrogates Used as Dependent Variables
• Independent Medical Exam (IME)
  requested
• Special Investigation Unit (SIU) referral
  – (IME successful)
  – (SIU successful)

• Data: Detailed Auto Injury Claim Database
  for Massachusetts
• Accident Years (1995-1997)
            Predictor Variables
• Claim file variables
   – Provider bill, Provider type
   – Injury
• Derived from claim file variables
   – Attorneys per zip code
   – Docs per zip code
• Using external data
   – Average household income
   – Households per zip
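A derived variable such as attorneys per zip code can be built directly from the claim file. The slides do not give the exact definition, so the sketch below assumes it is the number of distinct attorneys appearing on claims in each ZIP code; the field names `zip` and `attorney_id` and the sample records are hypothetical.

```python
from collections import defaultdict


def distinct_per_zip(claims, id_field):
    """Count distinct non-missing values of `id_field` among claims in each ZIP code."""
    seen = defaultdict(set)
    for claim in claims:
        value = claim.get(id_field)
        if value is not None:
            seen[claim["zip"]].add(value)
    return {z: len(ids) for z, ids in seen.items()}


# Hypothetical claim records for illustration.
claims = [
    {"zip": "02101", "attorney_id": "A1"},
    {"zip": "02101", "attorney_id": "A1"},
    {"zip": "02101", "attorney_id": "A2"},
    {"zip": "02139", "attorney_id": None},
]
attorneys_per_zip = distinct_per_zip(claims, "attorney_id")
```

ZIP codes whose claims have no attorney recorded are absent from the result, so a join back to the claim file should treat a missing entry as zero. Passing a provider identifier instead yields a docs-per-zip count in the same way.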
Neural Network Ranking of Variables
  Variable              Rank   Sensitivity Statistic   Importance
  Health Insurance      1                   1.01335    100%
  Provider 2 Bill       2                   1.00987    74%
  Provider 1 Bill       3                   1.00681    51%
  Territory             4                   1.00652    49%
  Attorneys/Zip         5                   1.00507    38%
  Injury Type           6                   1.00396    30%
  Report Lag            7                   1.00303    23%
  Provider 2 Type       8                   1.00272    20%
  Provider 1 Type       9                   1.00210    16%
  Treatment Lag         10                  1.00198    15%
  Households/Zip        11                  1.00156    12%
  Attorney              12                  1.00051    4%
  Emergency Treatment   13                  1.00034    3%
  Claimants per City    14                  1.00025    2%
  Providers/Zip         15                  1.00024    2%
  Age                   16                  1.00018    1%
  Providers per City    17                  1.00016    1%
  Distance              18                  1.00010    1%
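The slides do not say how the Importance column is derived, but the printed values appear consistent with scaling each sensitivity statistic's excess over 1.0 by the largest excess and rounding to a whole percent. The sketch below applies that assumed rule to a few rows of the table; the function name is hypothetical.

```python
def relative_importance(sensitivity):
    """Scale each statistic's excess over 1.0 by the largest excess, as a whole percent."""
    top = max(sensitivity.values()) - 1.0
    return {name: round(100 * (s - 1.0) / top) for name, s in sensitivity.items()}


# A few rows copied from the table above.
sensitivity = {
    "Health Insurance": 1.01335,
    "Provider 2 Bill": 1.00987,
    "Territory": 1.00652,
    "Distance": 1.00010,
}
importance = relative_importance(sensitivity)
```

Read this way, a sensitivity statistic of exactly 1.0 would correspond to a variable with no measurable influence on the fitted network.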
Variable Importance for IME Requested for 3 Methods
Rank   Treenet            MARS                  S-Plus Neural
  1    Provider 2 Bill    Health Insurance      Health Insurance
  2    Attorneys/Zip      Provider 2 Bill       Provider 2 Bill
  3    Territory          Injury Type           Provider 1 Bill
  4    Health Insurance   Report Lag            Territory
  5    Injury Type        Provider 1 Bill       Attorneys/Zip
  6    Provider 1 Bill    Treatment Lag         Injury Type
  7    Provider 1 Type    Providers per City    Report Lag
  8    Report Lag         Avg Household Price   Provider 2 Type
  9    Attorney           Territory             Provider 1 Type
 10    Age                Attorney              Treatment Lag
 11    Provider 2 Type    Providers/Zip         Households/Zip
Variable Importance (IME) Based on Average of Methods
                Important Variable Summarizations for IME
                   Tree Models, Other Models and Total

   Variable                      Variable    Total   Total Score   Tree Score   Other Score
                                 Type        Score      Rank          Rank         Rank
   Health Insurance              F           16529        1             2            1
   Provider 2 Bill               F           12514        2             1            3
   Injury Type                   F           10311        3             3            2
   Territory                     F            5180        4             4            7
   Provider 2 Type               F            4911        5             6            4
   Provider 1 Bill               F            4711        6             5            5
   Attorneys Per Zip             DV           2731        7             7           14
   Report Lag                    DV           2650        8            10            8
   Treatment Lag                 DV           2638        9            13            6
   Claimant per City             DV           2383       10            12            9
   Provider 1 Type               F            1794       11             9           13
   Providers per City            DV           1708       12            11           11
   Attorney                      F            1642       13             8           16
   Distance MP1 Zip to Clt Zip   DV           1134       14            18           10
   AGE                           F            1048       15            17           12
   Avg. Household Price/Zip      DM            907       16            16           15
   Emergency Treatment           F             660       17            14           18
   Income Household/Zip          DM            329       18            15           20
   Providers/Zip                 DV            288       19            20           17
   Household/Zip                 DM            242       20            19           19
   Policy Type                   F               4       21            21           21
 Trends Using External Information
• People still rely on Masterson’s indices and other indices based on
  the CPI
• Shortcomings
    –   Hedonic adjustment
    –   Substitution
    –   Imputed rental cost
    –   Geometric chaining
    –   See www.shadowstats.com or “Getting Prices Right” by Dean Baker
        (Economic Policy Institute)
• Insurance inflation has typically been much higher than these
  indications
• Many need reliable trend indications on smaller segments of their
  data
• Trend is another weak link in the modeling process
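For a segment of data with only a few years of history, one simple way to get a trend indication is a least-squares fit of log(value) against year. The slides do not prescribe a method; this exponential fit, the function name, and the sample severities are assumptions for illustration.

```python
import math


def exponential_trend(years, values):
    """Fit log(value) = a + b * year by least squares; return the annual trend exp(b) - 1."""
    n = len(years)
    logs = [math.log(v) for v in values]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return math.exp(slope) - 1.0


# Made-up average severities growing at exactly 5% per year.
years = [2001, 2002, 2003, 2004, 2005]
severities = [1000 * 1.05 ** k for k in range(5)]
annual_trend = exponential_trend(years, severities)
```

Comparing such an internally fitted trend with a CPI-based index over the same years is one way to see how far the two diverge for a given segment.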
Questions?
